University of Primorska University Library (UPUK)
  • Dataset of sentiment tagged language resources for Bosnian language [Electronic resource]
    Jahić, Sead ; Vičič, Jernej
    The Bosnian language holds significant importance as a member of the West-South Slavic subgroup within the Slavic branch of the Indo-European linguistic family. With approximately 2.5 million ... speakers in Europe, including 1.87 million individuals in Bosnia and Herzegovina alone, the Bosnian language constitutes the mother tongue for a considerable portion of the population. In Natural Language Processing (NLP) tasks related to the Bosnian language, besides removing stop words, it is important to consider the influence of other linguistic elements. Bosnian text contains words derived from diminishers, relative intensifiers, minimizers, maximizers, boosters, and approximators. These words contribute to the overall meaning and sentiment of the text. By including these elements in NLP models and algorithms, researchers can achieve more accurate and nuanced analysis of Bosnian language data, enhancing the effectiveness of NLP applications. Two lists of sentiment-annotated words that form the core of the Bosnian sentiment-annotated lexicon, a list of stopwords, and a list of Affirmative and non-Affirmative words (AnAwords) composed mostly of intensifiers and diminishers were used to construct a dataset that forms the basis for sentiment analysis in the Bosnian language.
    Source: Data in brief [Electronic resource]. - ISSN 2352-3409 (Vol. 53, art. 110247, Apr. 2024, pp. 1-12)
    Type of material - e-article ; adult, serious
    Publish date - 2024
    Language - English
    COBISS.SI-ID - 189615619