ALL libraries (COBIB.SI union bibliographic/catalogue database)
  • Statistical language modeling based on automatic classification of words
    Sepesy Maučec, Mirjam
    In statistical language modeling, the model's parameters are extracted from large amounts of text. Models of this kind can be built for any language without requiring any linguistic knowledge. Bigram and trigram language models will be discussed. Statistical modeling always faces the problem of sparse data. We will compare two proposed solutions: the smoothing method proposed by Katz and the automatic word clustering proposed by Ney. In the first case, some probability mass is redistributed to bigrams (trigrams) that never occurred in the text. In the second case, the words are mapped into classes in such a way that the perplexity of the model is minimized. Comparing word-based and class-based models shows that the use of clustered words leads to a significant improvement, as measured by perplexity.
    Material type - conference paper
    Year - 1998
    Language - English
    COBISS.SI-ID - 3943702
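
The abstract's comparison of word-based and class-based bigram models can be illustrated with a minimal sketch. It assumes the usual class-based factorisation p(w | v) = p(w | c(w)) * p(c(w) | c(v)) with maximum-likelihood estimates; the function name `bigram_perplexity`, the toy sentence, and the hand-made class map are hypothetical and not taken from the paper. No smoothing or automatic clustering is performed, and perplexity is measured on the training text itself, so the sketch does not reproduce the reported improvement, which concerns sparse data.

```python
import math
from collections import Counter


def bigram_perplexity(tokens, word_to_class=None):
    """Perplexity of a maximum-likelihood bigram model on its own training text.

    With word_to_class=None every word is its own class, which gives the
    ordinary word-based bigram model. Otherwise the class-based factorisation
    p(w | v) = p(w | c(w)) * p(c(w) | c(v)) is used.
    """
    if word_to_class is None:
        word_to_class = {w: w for w in tokens}

    pairs = list(zip(tokens, tokens[1:]))          # (history, predicted) pairs
    classes = [word_to_class[w] for w in tokens]

    word_count = Counter(w for _, w in pairs)      # N(w) at predicted positions
    class_count = Counter(classes[1:])             # N(c) at predicted positions
    history_count = Counter(classes[:-1])          # N(c) at history positions
    class_bigram = Counter(zip(classes, classes[1:]))

    log_prob = 0.0
    for v, w in pairs:
        cv, cw = word_to_class[v], word_to_class[w]
        p_word_given_class = word_count[w] / class_count[cw]
        p_class_given_class = class_bigram[(cv, cw)] / history_count[cv]
        log_prob += math.log(p_word_given_class * p_class_given_class)

    return math.exp(-log_prob / len(pairs))


# Toy data (hypothetical): a hand-made class map stands in for automatic clustering.
tokens = "the cat sat on the mat the dog sat on the rug".split()
toy_classes = {
    "the": "DET", "cat": "NOUN", "dog": "NOUN", "mat": "NOUN",
    "rug": "NOUN", "sat": "VERB", "on": "PREP",
}

pp_word = bigram_perplexity(tokens)
pp_class = bigram_perplexity(tokens, toy_classes)
print(f"word-based: {pp_word:.3f}  class-based: {pp_class:.3f}")
```

On training text the word-based model can never have higher perplexity than a class-based one, because it has more free parameters. A clustering procedure would search over class maps to minimize perplexity, and the benefit of classes is expected to appear on held-out text, where many word bigrams are unseen.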