Normalisation, tokenisation and sentence segmentation of Slovene tweets [Electronic resource]
Čibej, Jaka, prevodoslovje, računalništvo ; Fišer, Darja, 1978- ; Erjavec, Tomaž, 1960-

Online user-generated content, such as posts on social media, blogs, and forums, is becoming an increasingly important source of information, as shown by numerous rapidly growing NLP fields such as ... sentiment analysis and data mining. However, user-generated content is well known to contain a significant degree of noise, e.g. abbreviations, missing spaces, and non-standard spelling, lexis, and punctuation. This noise reduces the effectiveness of NLP tools that process such data, so data normalisation is required. In this paper, we present a training set that will be used to improve the tokenisation, normalisation, and sentence segmentation of Slovene tweets. We describe some of the most Twitter-specific aspects of our annotation guidelines as well as the workflow of our annotation campaign, the goal of which was to create a manually annotated gold-standard dataset of 4,000 tweets extracted from the JANES corpus of Internet Slovene.

Source: Normalisation and analysis of social media texts (NormSoMe) [Electronic resource] : [workshop proceedings] (pp. 5-10)
Type of material - conference contribution
Publish date - 2016
Language - English
COBISS.SI-ID - 60917346
Author
Čibej, Jaka, prevodoslovje, računalništvo |
Fišer, Darja, 1978- |
Erjavec, Tomaž, 1960-
Topics
computer-mediated communication (računalniško posredovana komunikacija) |
non-standard language (nestandardni jezik) |
normalisation (normalizacija) |
tokenisation (tokenizacija) |
sentence segmentation (stavčna segmentacija) |
tweets |
Slovene (slovenščina)
Links to authors' personal bibliographies | Links to information on researchers in the SICRIS system |
---|---|
Čibej, Jaka, prevodoslovje, računalništvo | 36914 |
Fišer, Darja, 1978- | 26294 |
Erjavec, Tomaž, 1960- | 05023 |