(UM)
  • Large vocabulary speech recognition of Slovenian language using morphological models
    Sepesy Maučec, Mirjam ...
    This paper concerns the development of an automatic speech recognition system for the Slovenian language. The large number of unique words in inflected languages is identified as the primary reason for ... performance degradation. The article focuses on statistical language models and examines a novel variation on n-gram modelling in which the modelling units are stems and endings instead of words. Only data-driven algorithms are used to decompose words into stems and endings automatically. Modelling Slovenian with stems and endings yields a significant reduction in the out-of-vocabulary (OOV) rate. We also discuss corpus-based topic-adapted language models. Language models are most often used in topic-homogeneous environments. Topic detection in a highly inflected language is complicated by the appearance of several word forms derived from the same lemma; this problem is solved by using data-driven algorithms to group words of the same lemma into classes.
    Type of material - conference contribution; adult, serious
    Publish date - 2003
    Language - English
    COBISS.SI-ID - 8249110