As artificial intelligence (AI) translation technology advances, big data, cloud computing, and other emerging technologies have driven the progress of the data industry over the past several decades. Human-machine translation has become a new mode of interaction between humans and machines and plays an essential role in transmitting information. Nevertheless, existing translation models have drawbacks and limitations, such as high error rates and inaccuracy, and they cannot adapt to the varied demands of different user groups. Taking the AI-based translation model as the research object, this study analyzed attention mechanisms and related technical means, examined the shortcomings of conventional translation models, and proposed an AI-based translation model that produces clear, high-quality translations and offers a reference for further improving AI-based translation models. Manual and automated evaluation scores demonstrated that the human-machine translation model reduced mismatches between texts and contexts and improved the accuracy and efficiency of intelligent recognition and expression. Evaluation used a 1-10 scale with 30 language users as participants; a score of 6 points or above was considered effective. The results showed that the language-fluency score rose from 4.9667 for conventional Statistical Machine Translation to 6.6333 for the AI-based translation model. The human-machine translation model thus improved the efficiency, speed, precision, and accuracy of language input to a certain degree, strengthened the correlation between semantic characteristics and intelligent recognition, and advanced intelligent recognition. It can provide accurate, high-quality translation for language users and supports understanding and automatic processing of natural-language input and output.
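The evaluation protocol above (a 1-10 scale, a panel of judges, and a mean of 6 or above counting as effective) can be sketched as follows. This is a minimal illustration of the scoring criterion only; the ratings shown are hypothetical and are not the study's data.

```python
def mean_score(ratings):
    """Average a list of 1-10 fluency ratings from human judges."""
    return sum(ratings) / len(ratings)

def is_effective(ratings, threshold=6.0):
    """Apply the study's criterion: a mean of 6 points or above is effective."""
    return mean_score(ratings) >= threshold

# Hypothetical ratings from a small panel (illustrative only):
smt_ratings = [5, 4, 6, 5, 5]   # conventional SMT, mean 5.0 -> not effective
ai_ratings = [7, 6, 7, 6, 7]    # AI-based model, mean 6.6 -> effective
```

With 30 participants the same computation applies; only the list length changes.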
This paper presents a quantitative fine-grained manual evaluation approach to comparing the performance of different machine translation (MT) systems. We build upon the well-established multidimensional quality metrics (MQM) error taxonomy and implement a novel method that assesses whether the differences in performance for MQM error types between different MT systems are statistically significant. We conduct a case study for English-to-Croatian, a language direction that involves translating into a morphologically rich language, for which we compare three MT systems belonging to different paradigms: pure phrase-based, factored phrase-based and neural. First, we design an MQM-compliant error taxonomy tailored to the relevant linguistic phenomena of Slavic languages, which made the annotation process feasible and accurate. Errors in MT outputs were then annotated by two annotators following this taxonomy. Subsequently, we carried out a statistical analysis which showed that the best-performing system (neural) reduces the errors produced by the worst system (pure phrase-based) by more than half (54%). Moreover, we conducted an additional analysis of agreement errors in which we distinguished between short (phrase-level) and long distance (sentence-level) errors. We discovered that phrase-based MT approaches are of limited use for long distance agreement phenomena, for which neural MT was found to be especially effective.
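Testing whether two systems' counts for a given MQM error type differ significantly can be sketched with a standard two-proportion z-test. The abstract does not name the exact test used, so this is a generic stand-in: `k` is the number of annotated units affected by the error type, `n` the total units annotated for that system.

```python
import math

def two_proportion_z(k1, n1, k2, n2):
    """Two-sided z-test for a difference in error rates between two MT systems.
    k: units flagged with a given MQM error type; n: units annotated."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF (via erf)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value
```

For example, 100 agreement errors in 1000 units versus 50 in 1000 gives a highly significant difference, whereas identical counts give z = 0.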
This paper investigates the pivot (bridge) language technique, in which a pivot language is used to improve Statistical Machine Translation (SMT) quality. Here, Indonesian is used as the pivot language, so that each available corpus can support the Madurese-Sundanese language pair. Experiments were carried out using Indonesian-Madurese and Indonesian-Sundanese parallel corpora of 5K and 6K sentences respectively, while the monolingual corpora comprised Malay, Sundanese, and Indonesian at 10K, 10K, and 100K sentences respectively. This study compares the results of applying the Triangulation and Transfer methods with Indonesian as the pivot language. The results show that the Triangulation method yields a greater improvement than the Transfer method: in the experiments conducted, the Triangulation method increased the average Indonesian-pivot-based SMT score by 6.18% for Madura-Sundanese SMT and 7.27% for Madurese-Sundanese SMT.
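The Triangulation method referred to above is the standard phrase-table triangulation: the source-to-target translation probability is obtained by marginalizing over pivot phrases, P(s|t) = Σ_p P(s|p)·P(p|t). A minimal sketch, with toy phrase tables standing in for the Madurese-Indonesian and Indonesian-Sundanese tables:

```python
from collections import defaultdict

def triangulate(src_pvt, pvt_tgt):
    """Triangulate two phrase tables through a pivot language.
    src_pvt[(s, p)] = P(s | p); pvt_tgt[(p, t)] = P(p | t).
    Marginalizing over shared pivot phrases p yields P(s | t)."""
    src_tgt = defaultdict(float)
    for (s, p1), prob1 in src_pvt.items():
        for (p2, t), prob2 in pvt_tgt.items():
            if p1 == p2:  # same pivot phrase links the two entries
                src_tgt[(s, t)] += prob1 * prob2
    return dict(src_tgt)
```

The Transfer method, by contrast, translates source-to-pivot and then pivot-to-target in two sequential decoding passes rather than building a direct phrase table.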
Despite the acknowledged importance of translation technology in translation studies programmes and the current ascendancy of Statistical Machine Translation (SMT), there has been little reflection to date on how SMT can or should be integrated into the translation studies curriculum. In a companion paper we set out a rationale for including a holistic SMT syllabus in the translation curriculum. In this paper, we show how the priorities and aspirations articulated in that source can be operationalised in the translation technology classroom and lab. We draw on our experience of designing and evaluating an SMT syllabus for a cohort of postgraduate student translators at Dublin City University in 2012. In particular, we report on data derived from a mixed-methods approach that aims to capture the students' view of the syllabus and their self-assessment of their own learning. Using the construct of self-efficacy, we show significant increases in students' knowledge of and confidence in using machine translation in general and SMT in particular, after completion of teaching units in SMT. We report on additional insights gleaned from student assignments, and conclude with ideas for future refinements of the syllabus.
Popular translators such as Google, Bing, etc., perform quite well when translating among high-resource languages such as English and French; however, they make elementary mistakes when translating low-resource languages such as Bengali and Arabic. Google uses the Neural Machine Translation (NMT) approach to build its multilingual translation system; prior to NMT, Google used the Statistical Machine Translation (SMT) approach. However, these approaches depend entirely on the availability of a large parallel corpus for the translating language pair. As a result, a good number of widely spoken languages, such as Bengali, remain little explored in artificial intelligence research. Hence, the goal of this study is to explore improved translation from Bengali to English. To do so, we study both a rule-based translator and corpus-based machine translators (NMT and SMT) in isolation, and in combination through different approaches to blending them. More specifically, we first adopt popular corpus-based machine translators (NMT and SMT) and a rule-based machine translator for Bengali-to-English translation. Next, we integrate the rule-based translator with each of the corpus-based machine translators separately using different approaches. In addition, we perform rigorous experimentation over different datasets and report the best performance score for Bengali-to-English translation to date, presenting a comparison among the different approaches in terms of translation performance. Finally, we discuss how our blending approaches can be reused for other low-resource languages.
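One simple way such blending can work is a backoff scheme: translate with the corpus-based system, then patch any tokens it copied through untranslated (out of its vocabulary) with the rule-based system's dictionary lookup. This is an illustration only, not the paper's exact integration scheme; `corpus_mt`, `rule_mt`, and the toy Bengali example below are hypothetical.

```python
def blend(sentence, corpus_mt, rule_mt, target_vocab):
    """Naive rule-based/corpus-based blending (backoff variant):
    keep the corpus-based output, but replace tokens outside the
    target vocabulary with the rule-based translator's lookup."""
    patched = []
    for token in corpus_mt(sentence).split():
        patched.append(token if token in target_vocab else rule_mt(token))
    return " ".join(patched)

# Toy stand-ins: the corpus-based system copies the unknown word through.
corpus_mt = lambda s: "I eat bhat"      # "bhat" left untranslated
rule_mt = {"bhat": "rice"}.get          # rule-based dictionary lookup
target_vocab = {"I", "eat"}
```

Other blending directions are equally possible, e.g. using the rule-based output as a fallback for whole sentences the corpus-based system scores poorly.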
Larger n-gram language models (LMs) perform better in statistical machine translation (SMT). However, existing approaches to constructing larger LMs have two main drawbacks: 1) it is not easy to obtain larger corpora in the same domain as the bilingual parallel corpora used in SMT; 2) most previous studies focus on monolingual information from the target corpora only, and redundant n-grams have not been fully utilized in SMT. Continuous-space language models (CSLMs), especially neural network language models (NNLMs), have shown great improvements in the accuracy of estimating probabilities for predicting target words. However, most CSLM and NNLM approaches still consider monolingual information only or require an additional corpus. In this paper, we propose a novel neural-network-based bilingual LM growing method. Compared to existing approaches, the proposed method enables the use of the bilingual parallel corpus for LM growing in SMT. The results show that our method significantly outperforms existing approaches in both SMT performance and computational efficiency.
When translating into morphologically rich languages, statistical MT approaches face the problem of data sparsity. The sparsity problem is especially severe when the corpus of the morphologically richer language is small. Although factored models can correctly generate morphological forms of words, data sparseness still limits their performance. In this paper, we describe a simple and effective solution based on enriching the input corpora with various morphological forms of words. We apply this method in phrase-based and factor-based experiments on two morphologically rich languages, Hindi and Marathi, when translating from English. We evaluate our experiments both with automatic evaluation and with subjective evaluation of adequacy and fluency. We observe that the morphology injection method improves translation quality, and further analysis shows that it alleviates the data sparseness problem to a great extent.
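The enrichment idea can be sketched as follows: for each training pair, add extra pairs in which one target-side word at a time is replaced by a morphological variant, so rare inflected forms appear in the training data. This is a simplified sketch of the general idea, not the paper's exact procedure; `inflect` stands in for a morphological generator.

```python
def inject_morphology(parallel_corpus, inflect):
    """Morphology injection, sketched: enrich a parallel corpus with extra
    sentence pairs where one target-side word is swapped for one of its
    morphological variants. `inflect(word)` is assumed to return
    alternative surface forms of `word`."""
    enriched = list(parallel_corpus)
    for src, tgt in parallel_corpus:
        tokens = tgt.split()
        for i, word in enumerate(tokens):
            for form in inflect(word):
                if form != word:
                    variant = tokens[:i] + [form] + tokens[i + 1:]
                    enriched.append((src, " ".join(variant)))
    return enriched

# Toy morphological generator: gives the Hindi locative-style variant of "ghar".
inflect = lambda w: ["ghare"] if w == "ghar" else []
```

Starting from a single pair such as `("house", "ghar")`, the enriched corpus also contains `("house", "ghare")`, exposing the inflected form to the phrase extractor.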
Machine translation helps resolve language incomprehensibility issues and eases interaction among people from varying linguistic backgrounds. Although corpus-based approaches (statistical and neural) offer reasonable translation accuracy given a large corpus, the robustness of such approaches lies in their ability to adapt to low-resource languages, for which large corpora are unavailable. In this paper, the predictive aptness of the two approaches is meticulously explored in the context of Mizo, a low-resource Indian language. Translations predicted by the two approaches are comparatively analyzed on a number of grounds to infer their strengths and weaknesses, particularly in low-resource scenarios.