ALL libraries (COBIB.SI union bibliographic/catalogue database)
  • Verjetnostno modeliranje slovenskega jezika [Probabilistic modelling of the Slovenian language]
    Sepesy Maučec, Mirjam ; Kačič, Zdravko
    This paper presents a new framework for constructing language models of Slovenian. The main differences between language modelling for Slovenian and for English are pointed out. The effects ... of high inflectionality in Slovenian are examined. Two important difficulties are discussed: the high out-of-vocabulary rate of standard word-based models, and the problem of topic detection as part of language model adaptation. We define different basic units at different stages of language model construction. Basic language models use smaller, morpheme-like lexical units. In contrast, topic detection requires larger units that carry semantic information, for which we propose the use of lemma-like classes. The techniques for selecting basic units are language independent. They can be applied to other languages in which words are formed by many different inflectional affixations. Experiments on a Slovenian newspaper news corpus show significant improvements of the proposed models over standard word-based models.
    Material type - conference contribution
    Year - 2001
    Language - Slovenian
    COBISS.SI-ID - 6546454