Predicting the level of text standardness in user-generated content [Electronic resource]
Ljubešić, Nikola, 1979- ...
Non-standard language as it appears in user-generated content has recently attracted much attention. This paper proposes that non-standardness comes in two basic varieties, technical and linguistic, and develops a machine-learning method to discriminate between standard and non-standard texts in these two dimensions. We describe the manual annotation of a dataset of Slovene user-generated content and the features used to build our regression models. We evaluate and discuss the results, where the mean absolute error of the best performing method on a three-point scale is 0.38 for technical and 0.42 for linguistic standardness prediction. Even when using no language-dependent information sources, our predictor still outperforms an OOV-ratio baseline by a wide margin. In addition, we show that very little manually annotated training data is required to perform good prediction. Predicting standardness can help decide when to attempt to normalise the data to achieve better annotation results with standard tools, and provide linguists who are interested in non-standard language with a simple way of selecting only such texts for their research.
Source: Proceedings [Electronic resource] (pp. 371-378)
Material type - conference paper; adult non-fiction
Year - 2015
Language - English
COBISS.SI-ID - 58338402
Authors
Ljubešić, Nikola, 1979- |
Fišer, Darja, 1978- |
Erjavec, Tomaž, 1960- |
Čibej, Jaka, translation studies, computer science |
Marko, Dafne |
Pollak, Senja, 1980- |
Škrjanec, Iza
Topics
non-standard language |
user-generated content |
corpora |
automatic language standardness measure |
supervised machine learning
Links to authors' personal bibliographies | Links to researcher data in the SICRIS system |
---|---|
Ljubešić, Nikola, 1979- | 36871 |
Fišer, Darja, 1978- | 26294 |
Erjavec, Tomaž, 1960- | 05023 |
Čibej, Jaka, translation studies, computer science | 36914 |
Marko, Dafne | |
Pollak, Senja, 1980- | 31844 |
Škrjanec, Iza |