Decoding is an important part of machine translation systems, and the most popular inference algorithm used here is beam search. The beam search algorithm improves translation by allowing a larger search space to be traversed than greedy search. However, in neural machine translation (NMT), translation performance declines once the beam width increases beyond a certain point. This problem is usually not observed in statistical machine translation (SMT) due to the decoding method. This paper proposes a hybrid system-based method that uses SMT predictions to prevent quality deterioration in the beam search algorithm used in NMT decoding. Our approach reranks the NMT n-best list according to the sentence translated by the SMT system. We propose two different algorithms for reranking NMT n-best lists. The first algorithm uses the length information of the SMT outputs, whereas the second uses a word-based similarity approach with the Jaccard index, Dice’s coefficient, and the overlap coefficient. Experiments on three different language pairs show that the proposed method prevents the decrease in translation quality and produces a gain of 1.3 BLEU and 1.6 METEOR for different beam sizes and 1.8 BLEU and 2.1 METEOR average scores compared to the baseline results.
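The two reranking ideas named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: whitespace tokenization, the function names, and the tie-breaking rule (ties keep the original NMT order) are all assumptions made for illustration. Only the three set-similarity formulas are standard definitions.

```python
def jaccard(a, b):
    """Jaccard index: |A ∩ B| / |A ∪ B| over word sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Dice's coefficient: 2 |A ∩ B| / (|A| + |B|) over word sets."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def overlap(a, b):
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|) over word sets."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def rerank_by_similarity(nbest, smt_output, measure=jaccard):
    """Order NMT hypotheses by word-set similarity to the SMT output.

    Hypothetical helper: assumes whitespace tokenization. sorted() is
    stable, so hypotheses with equal scores keep their NMT order.
    """
    smt_tokens = smt_output.split()
    return sorted(nbest, key=lambda hyp: measure(hyp.split(), smt_tokens),
                  reverse=True)


def rerank_by_length(nbest, smt_output):
    """Order NMT hypotheses by closeness in length to the SMT output.

    Hypothetical helper: length is counted in whitespace tokens.
    """
    target = len(smt_output.split())
    return sorted(nbest, key=lambda hyp: abs(len(hyp.split()) - target))
```

A call such as `rerank_by_similarity(nbest, smt_output, measure=dice)` swaps the similarity measure without changing the rest of the pipeline. The overlap coefficient reaches 1.0 whenever one word set is contained in the other, so it tends to produce more ties than the other two measures.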
This letter presents a compendium of eight unsupervised Machine Translation (MT) systems built from monolingual corpora of five Indian languages from the Indo-Aryan and Dravidian language families. Recent research has demonstrated outstanding results in completely unsupervised training of Phrase-based Statistical MT (PBSMT) systems using innovative designs that rely solely on monolingual datasets. However, prior research has shown that Unsupervised Statistical MT (USMT) outperforms Unsupervised Neural MT (UNMT), particularly for language pairs that are not closely related. The purpose of this work is to investigate the architecture of the USMT system utilizing only monolingual datasets for four morphologically rich Indian languages and one low-resource endangered language, Kangri. The experimental results are evaluated using different natural language toolkit tokenizers and analyzed for different language pairs using various fully automatic MT evaluation metrics over different iterations.
In the domain of machine translation (MT) processing, end-to-end neural machine translation (NMT) has emerged as a remarkable breakthrough, surpassing the conventional statistical MT approaches. Inspired by Internet of Things (IoT) technology, some researchers are exploring how to integrate device-to-device communication patterns into NMT to enhance translation efficiency. However, the current state-of-the-art NMT models predominantly adopt sequence-based representations for both source-language and target-language sentences. The lack of natural language sentence structure attributes leads to problems such as unfaithful translation in NMT. To enhance lexical alignment in NMT, the paper proposes a new transformer MT model that incorporates vocabulary alignment structure. The decoder is designed to receive external lexical alignment information at each decoding step, which alleviates the problem of missing lexical alignment structures. During the decoding phase of the model, the statistical MT system plays a crucial role by supplying relevant lexical alignment information derived from the decoding information obtained from the NMT. Additionally, the model suggests vocabulary recommendations based on this lexical alignment information. The experimental results provide evidence that this approach successfully integrates the vocabulary knowledge derived from statistical MT, leading to improved translation performance.
Machine translation and its evaluation: a study
Mondal, Subrota Kumar; Zhang, Haoxi; Kabir, H. M. Dipu ...
The Artificial Intelligence Review, 09/2023, Volume 56, Issue 9
Journal Article, Peer reviewed
Machine translation (MT) has been one of the most popular fields in computational linguistics and Artificial Intelligence (AI). As one of the most promising approaches, MT can potentially break the language barrier between people from all over the world. Despite a number of studies in MT, few studies summarize and compare MT methods. To this end, in this paper, we principally focus on presenting the two mainstream MT schemes: statistical machine translation (SMT) and neural machine translation (NMT), including their basic rationales and developments. The detailed translation models are also presented, such as the word-based, syntax-based, and phrase-based models in statistical machine translation. Similarly, approaches in NMT, such as the recurrent neural network-based, attention mechanism-based, and transformer-based models, are presented. Last but not least, evaluation approaches play an important role in helping developers improve their MT methods. The prevailing machine translation evaluation methodologies are also presented in this article.
This study focuses on applying intelligent translation systems to assist high-quality English writing in the Internet era. The study analyzes statistical machine translation techniques, especially N-gram language modeling and word alignment techniques, and their crucial role in improving translation quality. In English writing, the intelligent translation system significantly improves the quality of students’ writing through word block translation. In an analysis of 320 students, the mean self-efficacy in writing skills of students in the high-quality writing group was 3.78, significantly higher than the 2.82 of the low-quality group. After the experiment, a proportion of 0.57 of the students indicated that they would improve their English writing vocabulary with the aid of the intelligent translation system, which shows the system’s potential to enhance students’ interest and autonomy in writing. Average distribution analysis shows that word block usage positively correlates with writing performance, with an R-squared value of 0.6726. The intelligent translation system improves students’ English writing and enhances their self-efficacy, which is of great significance to English teaching.
Retelling extraction is an important branch of Natural Language Processing (NLP), and high-quality retelling resources are very helpful for improving the performance of machine translation. However, traditional methods based on a bilingual parallel corpus often ignore the document background in the process of retelling acquisition and application. To solve this problem, we introduce topic model information into the translation model and propose a topic-based statistical machine translation method to improve translation performance. In this method, Probabilistic Latent Semantic Analysis (PLSA) is used to obtain the co-occurrence relationship between words and documents through hybrid matrix decomposition. Then we design a decoder to simplify the decoding process. Experiments show that the proposed method can effectively improve the accuracy of translation.
The language barrier is one of the practical challenges human beings face during communication. To overcome this, researchers are focusing on using machines to translate a source language to a target language using the textual representations of the languages. As a result, machine translation (MT) can achieve near human-level translation quality for several resource-rich languages. However, machine translation performance is still far from production-level quality for low-resource languages. This work reports a semi-supervised neural machine translation system to boost the translation quality for an extremely resource-constrained language pair, English–Manipuri. Our proposed approach combines self-training and back-translation. The quantitative evaluation shows that the system performance improves by +0.9 BLEU after introducing external noise to the input data. Additionally, a multi-reference test dataset developed in-house is used to evaluate the linguistic diversity of the highly agglutinative and morphologically rich Manipuri language. Experimental results attest that the proposed semi-supervised system outperforms the supervised, the pretrained mBART, and existing semi-supervised baselines in terms of automatic scores and subjective evaluation parameters by a significant margin, with improvements of up to +4.5 and +1.2 BLEU against the supervised and mBART baselines respectively.
• Backtranslation and forward-translation improve low-resource machine translation.
• External perturbations to the noisy synthetic data help in converging the model.
• Linguistic variations are tackled via the inclusion of multiple test references.
• The proposed method is competitive with pre-trained models.
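The abstract above does not say which kind of external noise is applied to the synthetic data. As a hedged illustration only, the sketch below uses two perturbations that are common in back-translation pipelines, random word dropout and bounded local reordering. The function name `perturb` and the parameters `drop_prob` and `max_shift` are hypothetical and are not taken from the paper.

```python
import random


def perturb(tokens, drop_prob=0.1, max_shift=3, rng=None):
    """Return a noised copy of a token list (illustrative assumption only).

    1. Word dropout: each token is removed with probability drop_prob.
    2. Local reordering: each surviving token gets the sort key
       index + uniform(0, max_shift), so tokens move only a few positions.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed keeps the sketch reproducible

    kept = [tok for tok in tokens if rng.random() >= drop_prob]
    if not kept and tokens:
        kept = [tokens[0]]  # never emit an empty sentence

    keys = [i + rng.uniform(0, max_shift) for i in range(len(kept))]
    ordered = sorted(zip(keys, kept), key=lambda pair: pair[0])
    return [tok for _, tok in ordered]
```

In a self-training or back-translation setup, a function like this would be applied to the synthetic source side before the sentence pair is added to the training data, while the target side is left unchanged.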
Lampung Province is located on the island of Sumatera. Immigrants in Lampung have difficulty communicating with the indigenous people of Lampung. As an alternative, both immigrants and the indigenous people of Lampung speak Indonesian. This research aims to build a language model from the Indonesian language and a translation model from the nyo dialect of the Lampung language; both models are combined in a Moses decoder. The research focuses on observing the effect of adding a mono corpus to experimental statistical machine translation between Indonesian and the nyo dialect of Lampung. It uses a parallel corpus of 3000 sentence pairs in Indonesian and the nyo dialect of Lampung as the source language and 3000 mono corpus sentences in the nyo dialect of Lampung as the target language. The results show that accuracy, measured as the bilingual evaluation understudy (BLEU) score, is 40.97%, 41.80%, and 45.26% when using mono corpora of 1000, 2000, and 3000 sentences, respectively.