Search results for Kuzman, Taja (UL)

1.	Automatic Genre Identification for Robust Enrichment of Massive Text Collections: Investigation of Classification Methods in the Era of Large Language Models. Kuzman, Taja; Mozetič, Igor; Ljubešić, Nikola. Machine learning and knowledge extraction, 09/2023, Volume 5, Issue 3. Journal article, peer-reviewed, open access. Massive text collections are the backbone of large language models, the main ingredient of the current significant progress in artificial intelligence. However, as these collections are mostly ...	Full text available to: UL
2.	Spremljevalni korpus Trendi in avtomatska kategorizacija [The Trendi monitor corpus and automatic categorization]. Kosem, Iztok; Čibej, Jaka; Dobrovoljc, Kaja ... Slovenscina 2.0, 09/2023, Volume 11, Issue 1. Journal article, peer-reviewed, open access. The paper presents the creation of the Trendi corpus, the first monitor corpus for Slovene. The current version, Trendi 2023-02, covers texts from January 2019 to the end of February 2023 and contains ...	Full text available to: UL
3.	Automatic genre identification: a survey. Kuzman, Taja; Ljubešić, Nikola. Language resources and evaluation, 11/2023. Journal article, peer-reviewed, open access. Automatic genre identification (AGI) is a text classification task focused on genres, i.e., text categories defined by the author’s purpose, common function of the text, and the text’s ...	Full text available to: UL
4.	CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation. Ljubešić, Nikola; Kuzman, Taja. arXiv.org, 03/2024. Paper, journal article, open access. This paper presents a collection of highly comparable web corpora of Slovenian, Croatian, Bosnian, Montenegrin, Serbian, Macedonian, and Bulgarian, covering thereby the whole spectrum of official ...	Full text available to: UL
5.	Structural and Semantic Classification of Verbal Multi-Word Expressions in Slovene. Gantar, Polona; Arhar Holdt, Špela; Čibej, Jaka ... Prispevki za novejšo zgodovino, 2019, Volume 59, Issue 1. Journal article, open access. This paper is an extended version of a conference paper presenting the categorization of verbal multi-word expressions (VMWEs) according to the PARSEME COST Action Shared Task 1.1 Guidelines. The ...	Full text available to: UL
6.	ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification. Kuzman, Taja; Mozetič, Igor; Ljubešić, Nikola. arXiv.org, 03/2023. Paper, journal article, open access. ChatGPT has shown strong capabilities in natural language generation tasks, which naturally leads researchers to explore where its abilities end. In this paper, we examine whether ChatGPT can be used ...	Full text available to: UL
7.	Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining. Ljubešić, Nikola; Suchomel, Vít; Rupnik, Peter ... arXiv.org, 04/2024. Paper, journal article, open access. The world of language models is going through turbulent times, better and ever larger models are coming out at an unprecedented speed. However, we argue that, especially for the scientific community, ...	Full text available to: UL
8.	Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages. Rik van Noord; Kuzman, Taja; Rupnik, Peter ... arXiv.org, 03/2024. Paper, journal article, open access. Large, curated, web-crawled corpora play a vital role in training language models (LMs). They form the lion's share of the training data in virtually all recent LMs, such as the well-known GPT, LLaMA ...	Full text available to: UL
