Razvoj učne množice za izboljšano označevanje spletnih besedil (Development of a training set for improved annotation of internet texts) [Electronic resource]

Čibej, Jaka, prevodoslovje, računalništvo ...

In many ways, the language of internet communication differs from standard language. Existing tools, which have predominantly been trained on standard language, find internet texts more difficult to process. To improve their accuracy on internet texts, both the tools and the annotation methodologies need to be upgraded. In this paper, we present the compilation of a training corpus of Slovene internet communication, intended as a training set for improving the automatic annotation of Slovene internet texts. The training corpus was sampled from the JANES corpus of Internet Slovene, then automatically annotated and manually corrected on five annotation levels: tokenisation, sentence segmentation, normalisation, lemmatisation, and morphosyntax.

Type of material - conference contribution
Publish date - 2016
Language - Slovenian
COBISS.SI-ID - 62529890
Author
Čibej, Jaka, prevodoslovje, računalništvo |
Arhar Holdt, Špela |
Erjavec, Tomaž, 1960- |
Fišer, Darja, 1978-
Topics
računalniško jezikoslovje |
spletna komunikacija |
slovenščina |
tviti |
označevanje |
tokenizacija |
lematizacija |
stavčna segmentacija |
normalizacija |
oblikoskladenjske oznake |
computational linguistics |
Slovene language |
web communication |
tweets |
annotation |
tokenization |
lemmatization |
sentence segmentation |
normalization |
morphosyntactic annotations
Links to authors' personal bibliographies | Links to information on researchers in the SICRIS system |
---|---|
Čibej, Jaka, prevodoslovje, računalništvo | 36914 |
Arhar Holdt, Špela | 27674 |
Erjavec, Tomaž, 1960- | 05023 |
Fišer, Darja, 1978- | 26294 |