Manual sorting of numerals in an inflective language for language modelling

VSE knjižnice (vzajemna bibliografsko-kataložna baza podatkov COBIB.SI)

PDF

Manual sorting of numerals in an inflective language for language modelling

Donaj, Gregor ; Kačič, Zdravko

In speech recognition systems language models are used to estimate the probabilities of word sequences. In this paper special emphasis is given to numerals%words that express numbers. One reason for ... this is the fact that in a practical application a falsely recognized numeral can change important content information inside the sentence more than other types of errors. Standard TeX -gram language models can sometimes assign very different probabilities to different numerals, according to their relative frequencies in training corpus. Based on the assumption that some different numbers are more equally likely to occur, than what a standard TeX -gram language model estimates, this paper proposes several methods for sorting numerals into classes in an inflective language and language models based on these sorting techniques. We treat these classes as basic vocabulary units for the language model. We also expose the differences between the proposed language models and well known class-based language models. The presented approach is also transferable to other classes of words with similar properties, e.g. proper nouns. Results of experiments show that significant improvements are obtained on numeral-rich domains. Although numerals represent only a small portion of words in the test set, a relative reduction in word error rate of 1.4 % was achieved. Statistical significance tests were performed, which showed that these improvements are statistically significant. We also show that depending on the amount of numerals in a target domain the improvement in performance can grow up to 16 % relative.

Vir: International journal of speech technology. - ISSN 1381-2416 (Vol. 17, no. 3, 2014, str. 281-289)

Vrsta gradiva - članek, sestavni del

Leto - 2014

Jezik - angleški

COBISS.SI-ID - 17698070

DOI

Išči dalje

Zaloga po knjižnicah

vir: International journal of speech technology. - ISSN 1381-2416 (Vol. 17, no. 3, 2014, str. 281-289)

Dostop do baze podatkov JCR je dovoljen samo uporabnikom iz Slovenije. Vaš trenutni IP-naslov ni na seznamu dovoljenih za dostop, zato je potrebna avtentikacija z ustreznim računom AAI.

Leto	Faktor vpliva		Izdaja		Kategorija		Razvrstitev
Leto	JCR	SNIP	JCR	SNIP	JCR	SNIP	JCR	SNIP

Povezave do osebnih bibliografij avtorjev	Povezave do podatkov o raziskovalcih v sistemu SICRIS
Donaj, Gregor	33286
Kačič, Zdravko	06821

Vir: Osebne bibliografije in: SICRIS

Gradivo iz matične enote je brezplačno. Če je gradivo na mesto prevzema dostavljeno iz drugih enot, lahko knjižnica to storitev zaračuna.

Mesto prevzema	Status gradiva	Rezervacija

Naloži sliko

Vnos na polico

Dodajanje gradiva na polico je uspelo.

Dodajanje gradiva na polico je spodletelo.

Dodajanje gradiva na polico ni bilo potrebno.

Trajna povezava

E-pošta

Faktor vpliva

Izberite knjižnično izkaznico:

Baze podatkov, v katerih je revija indeksirana

Izberite prevzemno mesto:

Prevzem gradiva po pošti

Obvestilo

Citiranje

Gesla v Splošnem geslovniku COBISS

Izbira mesta prevzema

Rezervacija je uspela.

Rezervacija ni uspela.

Rezervacija...

Bibliografski podatki

Število izposoj

Izposoja uspešna

Izposoja ni uspela

Izposoja uspešna

Izposoja ni uspela

Izposoja uspešna

Izposoja ni uspela

Izposoja uspešna

Izposoja ni uspela

Tema