FF, Osrednja humanistična knjižnica, Ljubljana (FFLJ)
  • Bilingual lexicon extraction from comparable corpora [Electronic resource] : a comparative study
    Ljubešić, Nikola ...
    This paper presents a comparative study of the impact of the key parameters for bilingual lexicon extraction for nouns from comparable corpora. The parameters we analyzed are: corpus size and ... comparability, dictionary size and type, feature selection for context vectors and window size, and association and similarity measures. Evaluation against the gold standard shows that a window size of 7 with encoded position yields the best results. The consistently best-performing combination of association and similarity measures is log-likelihood with Jensen-Shannon divergence. We have shown that very good results can be achieved with small but purpose-built seed lexicons, and that problems arising from dissimilarities between the source and the target corpus can be compensated for by making the corpora sufficiently large.
    Material type - conference paper
    Year - 2011
    Language - English
    COBISS.SI-ID - 46846050
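
To make the abstract's terminology concrete, the following is a minimal sketch of a position-encoded context vector and of Jensen-Shannon divergence between two such vectors. It is not the authors' implementation. The function names, the `window` parameter (read here as tokens on each side of the target, which may not match the paper's definition of "window size of 7"), and the use of raw co-occurrence counts are all assumptions. The paper weights features with log-likelihood and maps them across languages through a seed lexicon; both steps are omitted here.

```python
import math
from collections import Counter


def context_vector(tokens, target, window=7, encode_position=True):
    """Count context features of `target` within +/- `window` tokens.

    Hypothetical helper: with encode_position=True each feature is a
    (word, relative offset) pair, otherwise just the word.
    """
    features = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            key = (tokens[j], j - i) if encode_position else tokens[j]
            features[key] += 1
    return features


def jensen_shannon(p_counts, q_counts):
    """Jensen-Shannon divergence (log base 2) between two count vectors.

    Counts are normalized to probability distributions first.
    The result lies in [0, 1]; lower means more similar contexts.
    """
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        raise ValueError("both vectors must be non-empty")
    jsd = 0.0
    for key in set(p_counts) | set(q_counts):
        p = p_counts.get(key, 0) / p_total
        q = q_counts.get(key, 0) / q_total
        m = 0.5 * (p + q)  # mixture distribution
        if p > 0:
            jsd += 0.5 * p * math.log2(p / m)
        if q > 0:
            jsd += 0.5 * q * math.log2(q / m)
    return jsd
```

In a full pipeline under these assumptions, candidate translations for a source noun would be ranked by ascending divergence between its (translated) context vector and the vectors of target-language nouns.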