ALL libraries (union bibliographic/catalogue database COBIB.SI)
  • Syntactic annotation of Slovene CMC [Electronic resource] : first steps
    Arhar Holdt, Špela ...
    This paper presents the first steps towards the syntactic annotation of Slovene CMC, namely the annotation of 200 Slovene tweets with the JOS dependency model. After a presentation of the dataset we ... present the selected annotation model, the annotation procedure, and results. The focus of the paper is on the decisions regarding the annotation of CMC-specific elements that required special treatment: Twitter-specific features, foreign language elements, ellipsis and fragments, non-standard use of punctuation, and other non-standard language features. The dataset, together with the CMC-adapted annotation guidelines, can be used for further annotation of language data (from Twitter or other CMC genres), and in the second step to train a parser for the selected CMC domain(s). The large-scale corpus-based research of non-standard Slovene syntax, which will be facilitated by the described activities, will help disprove the myths surrounding CMC that are still present in the field of Slovene studies.
    Material type - conference contribution
    Year - 2016
    Language - English
    COBISS.SI-ID - 62120546