(UM)
  • Morphology in statistical machine translation
    Sepesy Maučec, Mirjam ; Brest, Janez
    In this paper we discuss statistical machine translation from a more inflected language to a less inflected one, using translation from Slovenian to English as an example. ... The focus is on morphological variation in the source language that is not reflected in the target language but increases data sparsity. Morphological variation in the source language is expressed using a lemma-tag representation of words, where the tag contains the morpho-syntactic description of a word. The idea is to keep only the tags relevant for translation; eliminating the rest reduces data sparsity. Determining the set of relevant tags normally requires expert knowledge. We try to avoid this by using a global optimization algorithm, choosing the population-based Differential Evolution algorithm. The experiments were carried out using the freely available parallel English-Slovenian SVEZ-IJS corpus, which is lemmatised and annotated with morpho-syntactic description tags.
    Type of material - conference paper
    Year - 2008
    Language - English
    COBISS.SI-ID - 12474134
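
The tag-reduction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokens, the positional tag strings, the `KEEP` masks, and the function name `reduce_token` are all assumptions made for illustration. The abstract proposes finding the set of relevant tags with Differential Evolution; here the selection is simply fixed by hand to show how dropping tag positions shrinks the source vocabulary.

```python
# Illustrative lemma-tag tokens (hypothetical, not taken from the SVEZ-IJS corpus).
# Each tag is a positional code, e.g. noun: category, type, gender, number, case.
TOKENS = [
    ("hiša", "Ncfsn"),
    ("hiša", "Ncfsg"),
    ("hiša", "Ncfpn"),
    ("lep", "Agpfsn"),
    ("lep", "Agpfsg"),
]

# Hand-picked masks (assumption): True keeps a tag position, False drops it.
# An optimizer such as Differential Evolution could search over these masks.
KEEP = {
    "N": [True, True, False, True, False],          # drop gender and case
    "A": [True, True, True, False, False, False],   # drop gender, number, case
}


def reduce_token(lemma, tag, keep):
    """Return 'lemma|reduced_tag', keeping only the masked-in tag positions."""
    mask = keep.get(tag[0], [True] * len(tag))  # unknown categories stay intact
    kept = "".join(ch for ch, flag in zip(tag, mask) if flag)
    return f"{lemma}|{kept}"


full_vocab = {f"{lemma}|{tag}" for lemma, tag in TOKENS}
reduced_vocab = {reduce_token(lemma, tag, KEEP) for lemma, tag in TOKENS}

print(len(full_vocab), "distinct source tokens before reduction")
print(len(reduced_vocab), "distinct source tokens after reduction")
```

Fewer distinct source tokens means each one is observed more often in the parallel data, which is the sparsity reduction the abstract refers to. Which positions are safe to drop is exactly what the optimization is meant to decide.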