ALL libraries (COBIB.SI union bibliographic/catalogue database)
  • Annotated news corpora and a lexicon for sentiment analysis in Slovene
    Bučar, Jože, 1985- ; Žnidaršič, Martin, 1978- ; Povh, Janez, 1973-
    In this study, we introduce Slovene web-crawled news corpora with sentiment annotation on three levels of granularity: sentence, paragraph and document levels. We describe the methodology and ... tools that were required for their construction. The corpora contain more than 250,000 documents with political, business, economic and financial content from five Slovene media resources on the web. More than 10,000 of them were manually annotated as negative, neutral or positive. All corpora are publicly available under a Creative Commons copyright license. We used the annotated documents to construct a Slovene sentiment lexicon, which is the first of its kind for Slovene, and to assess the sentiment classification approaches used. The constructed corpora were also utilised to monitor within-the-document sentiment dynamics, its changes over time and relations with news topics. We show that sentiment is, on average, more explicit at the beginning of documents, and it loses sharpness towards the end of documents.
    Source: Language resources and evaluation. - ISSN 1574-020X (Vol. 52, iss. 3, 2018, pp. 895-919)
    Type of material - article, component part ; adult, serious
    Publish date - 2018
    Language - English
    COBISS.SI-ID - 15875867