(UL)
  • Adapting a state-of-the-art tagger for South Slavic languages to non-standard text [Electronic resource]
    Erjavec, Tomaž, 1960- ; Ljubešić, Nikola, 1979- ; Fišer, Darja, 1978-
    In this paper we present adaptations of a state-of-the-art tagger for South Slavic languages to non-standard texts, using Slovene as an example. We investigate the impact of introducing ... in-domain training data as well as additional supervision through external resources or tools such as word clusters and word normalization. We remove more than half of the error of the standard tagger when applied to non-standard texts by training it on a combination of standard and non-standard training data, while enriching the data representation with external resources removes an additional 11 percent of the error. The final configuration achieves a tagging accuracy of 87.41% on the full morphosyntactic description, which is, nevertheless, still quite far from the accuracy of 94.27% achieved on standard text.
    Type of material - conference contribution
    Publication date - 2017
    Language - English
    COBISS.SI-ID - 64001634