ALL libraries (COBIB.SI union bibliographic/catalogue database)
  • Manual sorting of numerals in an inflective language for language modelling
    Donaj, Gregor ; Kačič, Zdravko
    In speech recognition systems, language models are used to estimate the probabilities of word sequences. In this paper, special emphasis is given to numerals, i.e. words that express numbers. One reason for this is that in a practical application a falsely recognized numeral can change important content information inside the sentence more than other types of errors. Standard n-gram language models can sometimes assign very different probabilities to different numerals, according to their relative frequencies in the training corpus. Based on the assumption that some different numbers are more equally likely to occur than a standard n-gram language model estimates, this paper proposes several methods for sorting numerals into classes in an inflective language, and language models based on these sorting techniques. We treat these classes as basic vocabulary units for the language model. We also expose the differences between the proposed language models and well-known class-based language models. The presented approach is also transferable to other classes of words with similar properties, e.g. proper nouns. Results of experiments show that significant improvements are obtained on numeral-rich domains. Although numerals represent only a small portion of words in the test set, a relative reduction in word error rate of 1.4 % was achieved. Statistical significance tests showed that these improvements are statistically significant. We also show that, depending on the amount of numerals in a target domain, the improvement in performance can grow up to 16 % relative.
    Source: International journal of speech technology. - ISSN 1381-2416 (Vol. 17, no. 3, 2014, pp. 281-289)
    Material type - article, component part
    Year - 2014
    Language - English
    COBISS.SI-ID - 17698070