DIKUL (UL)
  • Harvesting multi-word expressions from parallel corpora
    Vintar, Špela ; Fišer, Darja, 1978-
    The paper presents a set of approaches to extend the automatically created Slovene wordnet with nominal multi-word expressions. In the first approach, multi-word expressions from Princeton WordNet are ... translated with a technique based on word alignment and lexico-syntactic patterns. This is followed by extracting new terms from a monolingual corpus using keywordness ranking and contextual patterns. Finally, the multi-word expressions are assigned a hypernym and added to our wordnet. Manual evaluation and comparison of the results show that the translation approach is the most straightforward and accurate. However, it is successfully complemented by the two monolingual approaches, which are able to identify more term candidates in the corpus that would otherwise go unnoticed. Some weaknesses of the proposed wordnet extension techniques are also addressed.
    Type of material - conference contribution ; adult, serious
    Publish date - 2008
    Language - English
    COBISS.SI-ID - 37174626