ALL libraries (COBIB.SI union bibliographic/catalogue database)
  • Unsupervised learning of multiword units from part-of-speech tagged corpora : does quantity mean quality?
    Dias, Gaël ; Vintar, Špela
    This paper describes an original hybrid system that extracts multiword unit candidates from part-of-speech tagged corpora. While classical hybrid systems manually define local part-of-speech patterns ... that lead to the identification of well-known multiword units (mainly compound nouns), we automatically identify relevant syntactical patterns from the corpus. Word statistics are then combined with the endogenously acquired linguistic information in order to extract the most relevant sequences of words. As a result, (1) human intervention is avoided, providing total flexibility of use of the system and (2) different multiword units like phrasal verbs, adverbial locutions and prepositional locutions may be identified. Finally, we propose an exhaustive evaluation of our architecture based on the multi-domain, bilingual Slovene-English IJS-ELAN corpus where surprising results are evidenced. To our knowledge, this challenge has never been attempted before.
    Material type - conference contribution ; adult, non-fiction
    Year - 2005
    Language - English
    COBISS.SI-ID - 31062626