ALL libraries (COBIB.SI union bibliographic/catalogue database)
  • Multi-word discourse markers and their corpus-driven identification : the case of MWDM extraction from the reference corpus of spoken Slovene
    Dobrovoljc, Kaja
    With expanding evidence on the formulaic nature of human communication, there is a growing need to extend discourse marker research to functionally analogous multi-word expressions. In contrast to the ... common qualitative approaches to discourse marker identification in corpora, this paper presents a corpus-driven semi-automatic approach to the identification of multi-word discourse markers (MWDMs) in the reference corpus of spoken Slovene. Using eight statistical measures, we identified 173 structurally fixed discourse-marking MWEs, distinguished by a high number of tokens, a large proportion of grammatical words and semantic heterogeneity. This is a significantly longer list than would have been gained by manual inspection of smaller corpus samples. Although frequency-based methods produced satisfactory results, the best precision in MWDM identification was achieved using the t-score association measure, while the overall poor performance of mutual information suggests its inadequacy for the extraction of MWDMs and other MWEs with similar lexical and distributional features.
    Source: International journal of corpus linguistics. - ISSN 1384-6655 (Vol. 22, issue 4, 2017, pp. 551-582)
    Type of material - article, component part ; adult, serious
    Publish date - 2017
    Language - English
    COBISS.SI-ID - 66144610