This research translates text in the Nyo dialect of the Lampung language into Indonesian using two approaches: Direct Machine Translation (DMT) and Statistical Machine Translation (SMT). The experiment was conducted as a preliminary effort to help immigrant students in the province of Lampung translate the Nyo dialect of Lampung through the prototypes or models that were built. In the DMT approach, a dictionary is the primary tool. In SMT, by contrast, a parallel corpus of Lampung Nyo and Indonesian is used to build language models and translation models with the Moses decoder. Translation accuracy is 39.32% with the DMT approach and 59.85% with the SMT approach. Both approaches are evaluated with the Bilingual Evaluation Understudy (BLEU) metric.
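Since both systems above are scored with BLEU, a minimal sketch of the metric may help. It is an illustration only, not the evaluation script used in the study: it scores one tokenized sentence against a single reference, uses uniform n-gram weights, and applies no smoothing, whereas standard BLEU is computed over a whole corpus. The function names `ngram_counts` and `sentence_bleu` and the toy sentences are assumptions made for this example.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU: geometric mean of clipped n-gram precisions
    multiplied by the brevity penalty (single reference, no smoothing)."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precision_sum += math.log(clipped / total)
    # The brevity penalty punishes candidates shorter than the reference.
    if len(candidate) > len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - len(reference) / len(candidate))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    hypothesis = "a b c d".split()   # toy system output
    reference = "a b d c".split()    # toy reference translation
    print(sentence_bleu(hypothesis, reference, max_n=2))
```

A score reported as a percentage, as in the abstract, corresponds to this value multiplied by 100.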
The demand for translations is increasing at a rate far beyond the capacity of professional translators. It is too difficult, time-consuming and expensive to translate everything from scratch in each language. Machine translation offers a solution, as it provides translation automatically. Until recently, statistical machine translation has proved to be one of the most successful approaches. However, a new approach to machine translation based on neural networks has emerged with promising results. The present paper concerns phrase-based statistical machine translation, an area that has been extensively studied in the literature. The translation system consists of many components built on the premise of probabilities. Each component is described separately. Although high-quality translation systems have been developed for certain language pairs, there are still many languages that cause numerous translation errors. Languages with a rich morphology pose an especially difficult challenge for research. We address one group of morphologically rich languages: Slavic languages, which constitute a relatively homogeneous family of languages characterized by rich, inflectional morphology. The present paper offers a comprehensive survey of approaches to coping with Slavic languages in different aspects of statistical machine translation. We observe that the interest of the community in research on more difficult languages is increasing, and we believe that the translation quality for those languages will reach the level of practical use in the near future.
Machine translation (MT) from English to foreign languages is a fast-developing area of research, and various translation techniques are discussed in the literature. However, translation from English to Malayalam, a Dravidian language, is still at an early stage, and work in this field has not yet flourished to a great extent. The main reason for this shortcoming is the lack of linguistic resources and translation tools for the Malayalam language. A parallel corpus with alignment is one such resource that is essential for a machine translation system. This paper focuses on a technique that enables the automatic construction of a verb-aligned parallel corpus by exploring the internal structure of the English and Malayalam languages, which in turn facilitates the task of machine translation from English to Malayalam.
One of the important applications of natural language processing (NLP) is the machine translation (MT) system, which automatically converts one natural language to another. MT has witnessed various paradigm shifts since its inception. Statistical machine translation (SMT) has dominated MT research for decades. In the recent past, researchers have focused on developing MT systems based on artificial neural networks (ANN). In this paper, we first discuss some important deep learning models that are widely used in Neural Machine Translation (NMT) design. We then systematically compare the performance of SMT and NMT on the English-to-Bangla and English-to-Hindi translation tasks. Most Indian languages are morphologically rich, and sufficiently large corpora are rarely available. We present and analyze our work, survey work on other low-resource languages, and finally draw some useful conclusions.
Term translation is of great importance for machine translation. In this article, we investigate three issues of term translation in the context of statistical machine translation and propose three corresponding models: (a) a term translation disambiguation model which selects desirable translations for terms in the source language with domain information, (b) a term translation consistency model that encourages consistent translations for terms with a high strength of translation consistency throughout a document, and (c) a term unithood model that rewards translation hypotheses where source terms are translated into target strings as a whole unit. We integrate the three models into hierarchical phrase-based SMT and evaluate their effectiveness on NIST Chinese–English translation with large-scale training data. Experimental results show that all three models can achieve substantial improvements over the baseline. Our analyses also suggest that the proposed models are capable of improving term translation.
Neural machine translation (NMT) has gained more and more attention in recent years, mainly due to its simplicity and state-of-the-art performance. However, previous research has shown that NMT suffers from several limitations: source coverage guidance, translation of rare words, and the limited vocabulary, while statistical machine translation (SMT) has complementary properties that correspond well to these limitations. It is straightforward to improve translation performance by combining the advantages of the two kinds of models. This paper proposes a general framework for incorporating SMT word knowledge into NMT to alleviate the above word-level limitations. In our framework, the NMT decoder makes more accurate word predictions by referring to the SMT word recommendations in both the training and testing phases. Specifically, the SMT model offers informative word recommendations based on the NMT decoding information. Then, we use the SMT word predictions as prior knowledge to adjust the NMT word generation probability, which utilizes a neural-network-based classifier to digest the discrete word knowledge. In this paper, we use two model variants to implement the framework, one with a gating mechanism and the other with a direct competition mechanism. Experimental results on Chinese-to-English and English-to-German translation tasks show that the proposed framework can take advantage of the SMT word knowledge and consistently achieve significant improvements over NMT and SMT baseline systems.
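The idea of adjusting the NMT word probability with SMT recommendations through a gate can be sketched as a simple interpolation of two word distributions. This is a deliberately reduced illustration and should not be read as the formulation used in the paper: there the gate is produced by a neural network from the decoding state, whereas here it is a single scalar passed in by hand. The function name `gated_mix`, the parameter `gate_logit`, and the toy distributions are all assumptions made for this example.

```python
import math


def sigmoid(x):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_mix(p_nmt, p_smt, gate_logit):
    """Interpolate two word distributions with a scalar gate.

    p_nmt, p_smt: dicts mapping target words to probabilities.
    gate_logit:   hypothetical stand-in for the gate network's output;
                  larger values put more trust in the SMT recommendations.
    """
    gate = sigmoid(gate_logit)
    vocabulary = set(p_nmt) | set(p_smt)
    return {
        word: (1.0 - gate) * p_nmt.get(word, 0.0) + gate * p_smt.get(word, 0.0)
        for word in vocabulary
    }


if __name__ == "__main__":
    # Toy distributions for one decoding step (invented values).
    nmt_probs = {"house": 0.6, "home": 0.4}
    smt_probs = {"home": 0.5, "building": 0.5}
    mixed = gated_mix(nmt_probs, smt_probs, gate_logit=0.0)
    for word in sorted(mixed, key=mixed.get, reverse=True):
        print(word, round(mixed[word], 3))
```

Because both inputs are probability distributions and the gate lies between 0 and 1, the mixture is again a distribution, and a word recommended only by SMT (here "building") can receive probability mass that the NMT model alone would not assign.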
• Study of comparable corpora (CC) for extracting parallel information.
• Generative model for extracting parallel fragments of CC without need of an initial seed.
• Enhancing an SMT system using extracted parallel fragments from CC.
Although parallel corpora are essential language resources for many natural language processing tasks, they are rare or even unavailable for many language pairs. Comparable corpora, by contrast, are widely available and contain parallel fragments of information that can be used in applications such as statistical machine translation systems. In this research, we propose a generative model based on latent Dirichlet allocation for extracting parallel fragments from comparable documents without using any initial parallel data or bilingual lexicon. The experimental results show significant improvement when the fragments extracted by the proposed method are used to augment an existing parallel corpus in a statistical machine translation system. According to human judgment, the accuracy of the proposed method for an English-Persian task is about 59.7%. In addition, the out-of-vocabulary error rate for the same task is reduced by 28%.
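To make the out-of-vocabulary figure above concrete, the following sketch shows one common way to measure an OOV rate: the share of test tokens that never occur on the source side of the training corpus, before and after extra fragments are added. The abstract does not state exactly how its rate was computed or whether the 28% reduction is relative or absolute, so this is an assumption; the function name `oov_rate`, whitespace tokenization, and the toy sentences are likewise invented for illustration.

```python
def oov_rate(train_sentences, test_sentences):
    """Fraction of test tokens absent from the training vocabulary.

    Sentences are plain strings split on whitespace (an assumption;
    a real system would use its own tokenizer).
    """
    vocabulary = {token for sentence in train_sentences for token in sentence.split()}
    test_tokens = [token for sentence in test_sentences for token in sentence.split()]
    if not test_tokens:
        return 0.0
    unseen = sum(1 for token in test_tokens if token not in vocabulary)
    return unseen / len(test_tokens)


if __name__ == "__main__":
    base_corpus = ["a b c"]          # toy stand-in for the existing parallel corpus
    extracted_fragments = ["d e"]    # toy stand-in for fragments mined from comparable corpora
    test_set = ["a d f g"]

    before = oov_rate(base_corpus, test_set)
    after = oov_rate(base_corpus + extracted_fragments, test_set)
    print(before, after)                 # 0.75 0.5
    print((before - after) / before)     # relative reduction in OOV rate
```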
Transforming text from one language to another using computer systems, automatically or with little human intervention, is known as a Machine Translation System (MTS). Divergence among natural languages in a multilingual environment makes Machine Translation (MT) a difficult and challenging task. The purpose of this paper is to present a comprehensive survey of MTS in general and for the English, Hindi and Sanskrit languages in particular. The state-of-the-art MT approach is Neural Machine Translation (NMT), which has been used by Google, Amazon, Facebook and Microsoft, but it requires a large corpus as well as high-performance computing systems. The availability of MT language modeling tools, parsers, data repositories and evaluation metrics is tabulated in this article. MTS, evaluation methods and platforms are classified based on a well-defined set of criteria. This survey also explores new research avenues that will help in developing good-quality MTS. Although several surveys of MTS exist, none of them has followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) approach, including tools and evaluation methods, as done in this survey specifically for the English, Hindi and Sanskrit languages.