Statistical machine translation of subtitles for highly inflected language pair

Sepesy Maučec, Mirjam ; Kačič, Zdravko ; Verdonik, Darinka

This paper addresses the problem of statistical machine translation between highly inflected languages. Even when dealing with closely-related language pairs, statistical machine translation ... encounters problems if the parallel corpus is not big enough. To reduce the problem of data sparsity, we use the approach called factored translation, which has proven successful when translating between English and a morphologically rich language. We show that it is even more useful when translating between two highly inflected languages. The main contribution of the paper involves two extensions of the factored translation approach. First, we propose a new, more general asynchronous framework for training translation components, where lemmas in the lemma component and MSD tags in the MSD component are aligned independently of alignment done for surface word forms. The second contribution of the paper is a new technique for efficient use of a bilingual dictionary in the translation process. A dictionary is introduced into the lemma component to improve lexical translation. Dictionary use is based on entropy. We tested our enhanced translation approach on the Slovenian-Serbian language pair. The system was trained on a freely available OpenSubtitle corpus. The results show improvements in automatic scores (BLEU and TER). The approach could be used for other language pairs, especially if one or both are highly inflected.
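The abstract states only that dictionary use "is based on entropy" and gives no further detail. One possible reading, sketched below, is to consult the bilingual dictionary when the corpus-estimated lexical translation distribution of a lemma is too uncertain. This is an illustration under stated assumptions, not the authors' method: the function names, the `max_entropy` threshold, and the toy probabilities are all hypothetical.

```python
import math


def translation_entropy(probs):
    """Shannon entropy (in bits) of a distribution p(target lemma | source lemma)."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0.0)


def choose_lemma_translations(source_lemma, lex_table, dictionary, max_entropy=1.0):
    """Return candidate target lemmas for one source lemma.

    Assumption (not from the paper): dictionary entries are preferred when the
    lemma is unseen in the corpus or when its translation distribution has
    entropy above `max_entropy`; otherwise corpus candidates are returned,
    most probable first.
    """
    dist = lex_table.get(source_lemma, {})
    entries = dictionary.get(source_lemma, [])
    if entries and (not dist or translation_entropy(dist) > max_entropy):
        return list(entries)
    return sorted(dist, key=dist.get, reverse=True)


# Toy data, invented purely for illustration.
lex_table = {
    "hiša": {"kuća": 0.9, "dom": 0.1},  # peaked distribution, low entropy
    "vlak": {"voz": 0.25, "vlak": 0.25, "tren": 0.25, "kola": 0.25},  # flat
}
dictionary = {"hiša": ["kuća"], "vlak": ["voz"]}

confident = choose_lemma_translations("hiša", lex_table, dictionary)
uncertain = choose_lemma_translations("vlak", lex_table, dictionary)
```

With these toy values the peaked distribution for the first lemma stays with the corpus estimate, while the flat distribution for the second falls back to the dictionary entry. A threshold of this kind would normally be tuned on held-out data.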

Source: Pattern recognition letters : an official publication of the International Association for Pattern Recognition. - ISSN 0167-8655 (Vol. 46, 1 Sep. 2014, pp. 96-103)

Material type - article, component part

Year - 2014

Language - English

COBISS.SI-ID - 17900054

Subjects
statistical machine translation | phrase-based translation | highly inflected languages | bilingual dictionary | entropy

Links to authors' personal bibliographies	Links to researcher data in SICRIS
Sepesy Maučec, Mirjam	18168
Kačič, Zdravko	06821
Verdonik, Darinka	23838

Source: Personal bibliographies and SICRIS
