ALL libraries (COBIB.SI union bibliographic/catalogue database)
  • Identifying false friends between closely related languages [Electronic resource]
    Ljubešić, Nikola, 1979- ; Fišer, Darja, 1978-
    In this paper we present a corpus-based approach to automatic identification of false friends for Slovene and Croatian, a pair of closely related languages. By taking advantage of the lexical overlap ... between the two languages, we focus on measuring the difference in meaning between identically spelled words by using frequency and distributional information. We analyze the impact of corpora of different origin and size together with different association and similarity measures and compare them to a simple frequency-based baseline. With the best performing setting we obtain very good average precision of 0.973 and 0.883 on different gold standards. The presented approach works on non-parallel datasets, is knowledge-lean and language-independent, which makes it attractive for natural language processing tasks that often lack the lexical resources and cannot afford to build them by hand.
    Material type - conference contribution
    Year - 2013
    Language - English
    COBISS.SI-ID - 52673634