Search results

8 results
1.
  • Spremljevalni korpus Trendi in avtomatska kategorizacija (The Trendi monitor corpus and automatic categorization)
    Kosem, Iztok; Čibej, Jaka; Dobrovoljc, Kaja ... Slovenščina 2.0, 09/2023, Vol. 11, No. 1
    Journal Article
    Peer reviewed
    Open access

    The paper presents the construction of the Trendi corpus, the first monitor corpus for Slovene. The current version, Trendi 2023-02, covers texts from January 2019 to the end of February 2023 and contains ...
2.
  • Automatic genre identification: a survey
    Kuzman, Taja; Ljubešić, Nikola. Language Resources and Evaluation, 11/2023
    Journal Article
    Peer reviewed
    Open access

    Automatic genre identification (AGI) is a text classification task focused on genres, i.e., text categories defined by the author’s purpose, common function of the text, and the text’s ...
3.
  • CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation
    Ljubešić, Nikola; Kuzman, Taja. arXiv.org, 03/2024
    Paper, Journal Article
    Open access

    This paper presents a collection of highly comparable web corpora of Slovenian, Croatian, Bosnian, Montenegrin, Serbian, Macedonian, and Bulgarian, covering thereby the whole spectrum of official ...
4.
5.
  • ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification
    Kuzman, Taja; Mozetič, Igor; Ljubešić, Nikola. arXiv.org, 03/2023
    Paper, Journal Article
    Open access

    ChatGPT has shown strong capabilities in natural language generation tasks, which naturally leads researchers to explore where its abilities end. In this paper, we examine whether ChatGPT can be used ...
6.
  • The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild
    Kuzman, Taja; Rupnik, Peter; Ljubešić, Nikola. arXiv (Cornell University), 01/2022
    Paper, Journal Article
    Open access

    This paper presents a new training dataset for automatic genre identification GINCO, which is based on 1,125 crawled Slovenian web documents that consist of 650 thousand words. Each document was ...
7.
  • Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining
    Ljubešić, Nikola; Suchomel, Vít; Rupnik, Peter ... arXiv.org, 04/2024
    Paper, Journal Article
    Open access

    The world of language models is going through turbulent times, better and ever larger models are coming out at an unprecedented speed. However, we argue that, especially for the scientific community, ...
8.
  • Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages
    van Noord, Rik; Kuzman, Taja; Rupnik, Peter ... arXiv.org, 03/2024
    Paper, Journal Article
    Open access

    Large, curated, web-crawled corpora play a vital role in training language models (LMs). They form the lion's share of the training data in virtually all recent LMs, such as the well-known GPT, LLaMA ...