VSE knjižnice (vzajemna bibliografsko-kataložna baza podatkov COBIB.SI)
-
LLM teacher-student framework for text classification with no manually annotated data [Elektronski vir] : a case study in IPTC news topic classificationKuzman Pungeršek, Taja ; Ljubešić, Nikola, 1979-With the ever-increasing number of news stories available online, classifying them by topic, regardless of the language they are written in, has become crucial for enhancing readers’ access to ... relevant content. To address this challenge, we propose a teacher-student framework based on large language models (LLMs) for developing multilingual news topic classification models of reasonable size with no need for manual data annotation. The framework employs a Generative Pretrained Transformer (GPT) model as the teacher model to develop a news topic training dataset through automatic annotation of 20,000 news articles in Slovenian, Croatian, Greek, and Catalan. Articles are classified into 17 main categories from the Media Topic schema, developed by the International Press Telecommunications Council (IPTC). The teacher model exhibits high zero-shot performance in all four languages. Its agreement with human annotators is comparable to that between the human annotators themselves. To mitigate the computational limitations associated with the requirement of processing millions of texts daily, smaller BERT-like student models are fine-tuned on the GPT-annotated dataset. These student models achieve high performance comparable to the teacher model. Furthermore, we explore the impact of the training data size on the performance of the student models and investigate their monolingual, multilingual, and zero-shot cross-lingual capabilities. The findings indicate that student models can achieve high performance with a relatively small number of training instances, and demonstrate strong zero-shot cross-lingual abilities. Finally, we publish the best-performing news topic classifier, enabling multilingual classification with the top-level categories of the IPTC Media Topic schema.Vir: IEEE access [Elektronski vir]. - ISSN 2169-3536 (Vol. 13, 24 Feb. 2025, str. 35621-35633)Vrsta gradiva - e-članek ; neleposlovje za odrasleLeto - 2025Jezik - angleškiCOBISS.SI-ID - 229074179
Avtor
Kuzman Pungeršek, Taja |
Ljubešić, Nikola, 1979-
Teme
Strojno učenje |
veliki jezikovni modeli |
anotacija podatkov |
avtomatsko razvrščanje |
tematika novic |
large language models |
multilingual text classification |
International Press Telecommunications Council |
news topic |
topic classification |
training data preparation |
data annotation

Vnos na polico
Trajna povezava
- URL:
Faktor vpliva
Dostop do baze podatkov JCR je dovoljen samo uporabnikom iz Slovenije. Vaš trenutni IP-naslov ni na seznamu dovoljenih za dostop, zato je potrebna avtentikacija z ustreznim računom AAI.
Leto | Faktor vpliva | Izdaja | Kategorija | Razvrstitev | ||||
---|---|---|---|---|---|---|---|---|
JCR | SNIP | JCR | SNIP | JCR | SNIP | JCR | SNIP |
Faktor vpliva
Baze podatkov, v katerih je revija indeksirana
Ime baze podatkov | Področje | Leto |
---|
Povezave do osebnih bibliografij avtorjev | Povezave do podatkov o raziskovalcih v sistemu SICRIS |
---|---|
Kuzman Pungeršek, Taja | 55962 |
Ljubešić, Nikola, 1979- | 36871 |
Vir: Osebne bibliografije
in: SICRIS
Izberite prevzemno mesto:
Prevzem gradiva po pošti
Naslov za dostavo:
Med podatki člana manjka naslov.
Storitev za pridobivanje naslova trenutno ni dostopna, prosimo, poskusite še enkrat.
S klikom na gumb "V redu" boste potrdili zgoraj izbrano prevzemno mesto in dokončali postopek rezervacije.
S klikom na gumb "V redu" boste potrdili zgoraj izbrano prevzemno mesto in naslov za dostavo ter dokončali postopek rezervacije.
S klikom na gumb "V redu" boste potrdili zgoraj izbrani naslov za dostavo in dokončali postopek rezervacije.
Obvestilo
Trenutno je storitev za avtomatsko prijavo in rezervacijo nedostopna. Gradivo lahko rezervirate sami na portalu Biblos ali ponovno poskusite tukaj kasneje.
Izbira mesta prevzema
Gradivo iz matične enote je brezplačno. Če je gradivo na mesto prevzema dostavljeno iz drugih enot, lahko knjižnica to storitev zaračuna.
Mesto prevzema | Status gradiva | Rezervacija |
---|
Rezervacija v teku
Prosimo, počakajte trenutek.
Rezervacija je uspela.
Rezervacija ni uspela.
Rezervacija...
Članska izkaznica:
Mesto prevzema: