Search results

Results: 8
1.
  • Automatic Genre Identification for Robust Enrichment of Massive Text Collections: Investigation of Classification Methods in the Era of Large Language Models
    Kuzman, Taja; Mozetič, Igor; Ljubešić, Nikola
    Machine learning and knowledge extraction, 09/2023, Volume: 5, Issue: 3
    Journal Article
    Peer reviewed
    Open access

    Massive text collections are the backbone of large language models, the main ingredient of the current significant progress in artificial intelligence. However, as these collections are mostly ...
Full text
Available to: UL
2.
  • Spremljevalni korpus Trendi in avtomatska kategorizacija [The Trendi monitor corpus and automatic categorization]
    Kosem, Iztok; Čibej, Jaka; Dobrovoljc, Kaja ...
    Slovenščina 2.0, 09/2023, Volume: 11, Issue: 1
    Journal Article
    Peer reviewed
    Open access

    The paper presents the creation of the Trendi corpus, the first monitor corpus for Slovene. The current version, Trendi 2023-02, covers texts from January 2019 to the end of February 2023 and contains ...
Full text
Available to: UL
3.
  • Automatic genre identification: a survey
    Kuzman, Taja; Ljubešić, Nikola
    Language resources and evaluation, 11/2023
    Journal Article
    Peer reviewed
    Open access

    Automatic genre identification (AGI) is a text classification task focused on genres, i.e., text categories defined by the author’s purpose, common function of the text, and the text’s ...
Full text
Available to: UL
4.
  • CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation
    Ljubešić, Nikola; Kuzman, Taja
    arXiv.org, 03/2024
    Paper, Journal Article
    Open access

    This paper presents a collection of highly comparable web corpora of Slovenian, Croatian, Bosnian, Montenegrin, Serbian, Macedonian, and Bulgarian, covering thereby the whole spectrum of official ...
Full text
Available to: UL
5.
  • Structural and Semantic Classification of Verbal Multi-Word Expressions in Slovene
    Gantar, Polona; Arhar Holdt, Špela; Čibej, Jaka ...
    Prispevki za novejšo zgodovino, 2019, Volume: 59, Issue: 1
    Journal Article
    Open access

    This paper is an extended version of a conference paper presenting the categorization of verbal multi-word expressions (VMWEs) according to the PARSEME COST Action Shared Task 1.1 Guidelines. The ...
Full text
Available to: UL
6.
  • ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification
    Kuzman, Taja; Mozetič, Igor; Ljubešić, Nikola
    arXiv.org, 03/2023
    Paper, Journal Article
    Open access

    ChatGPT has shown strong capabilities in natural language generation tasks, which naturally leads researchers to explore where its abilities end. In this paper, we examine whether ChatGPT can be used ...
Full text
Available to: UL
7.
  • Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining
    Ljubešić, Nikola; Suchomel, Vít; Rupnik, Peter ...
    arXiv.org, 04/2024
    Paper, Journal Article
    Open access

    The world of language models is going through turbulent times, better and ever larger models are coming out at an unprecedented speed. However, we argue that, especially for the scientific community, ...
Full text
Available to: UL
8.
  • Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages
    van Noord, Rik; Kuzman, Taja; Rupnik, Peter ...
    arXiv.org, 03/2024
    Paper, Journal Article
    Open access

    Large, curated, web-crawled corpora play a vital role in training language models (LMs). They form the lion's share of the training data in virtually all recent LMs, such as the well-known GPT, LLaMA ...
Full text
Available to: UL