Artificial intelligence is redefining machine translation and natural language processing. It is therefore urgent to better grasp its social, economic and ethical stakes. This collective volume explores the possibility and the contours of a new consensus between human uses of languages and the contribution of machines.
The language barrier is one of the practical challenges human beings face during communication. To overcome it, researchers are focusing on using machines to translate a source language into a target language through textual representations of the languages. Machine translation (MT) has thus achieved near human-level translation quality for several resource-rich languages. However, MT performance is still far from production quality for low-resource languages. This work reports a semi-supervised neural machine translation system that boosts translation quality for an extremely resource-constrained language pair, English–Manipuri. Our proposed approach combines self-training and back-translation into a single technique. The quantitative evaluation shows that system performance improves by +0.9 BLEU after introducing external noise to the input data. Additionally, a multi-reference test dataset developed in-house is used to evaluate the linguistic diversity of the highly agglutinative and morphologically rich Manipuri language. Experimental results attest that the proposed semi-supervised system outperforms the supervised system, the pretrained mBART model and existing semi-supervised baselines in terms of automatic scores and subjective evaluation parameters, by significant margins of up to +4.5 and +1.2 BLEU against the supervised and mBART baselines, respectively.
• Back-translation and forward translation improve low-resource machine translation.
• External perturbations to the noisy synthetic data help the model converge.
• Linguistic variations are tackled via the inclusion of multiple test references.
• The proposed method is competitive with pre-trained models.
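The combined self-training and back-translation scheme described above can be sketched as follows. This is a minimal illustration only: the translate_fwd/translate_bwd functions are placeholder stand-ins for the forward and backward NMT models, and add_noise is one hypothetical form of the "external noise" perturbation (random token dropping), not the authors' exact implementation.

```python
import random

def translate_fwd(src_sentence):
    # stand-in forward model (English -> Manipuri)
    return "tgt::" + src_sentence

def translate_bwd(tgt_sentence):
    # stand-in backward model (Manipuri -> English)
    return tgt_sentence.replace("tgt::", "")

def add_noise(tokens, drop_prob=0.1, rng=None):
    # external perturbation: randomly drop tokens from the synthetic source
    rng = rng or random.Random(0)
    kept = [t for t in tokens if rng.random() > drop_prob]
    return kept or tokens  # never emit an empty sentence

def augment(parallel, mono_src, mono_tgt):
    """Extend a small parallel corpus with synthetic pairs."""
    synthetic = []
    # self-training: label monolingual source text with the forward model
    for s in mono_src:
        synthetic.append((s, translate_fwd(s)))
    # back-translation: label monolingual target text with the backward
    # model, then perturb the synthetic source side
    for t in mono_tgt:
        noisy_src = " ".join(add_noise(translate_bwd(t).split()))
        synthetic.append((noisy_src, t))
    return parallel + synthetic
```

The augmented corpus is then used to retrain the forward model; in practice the two directions are alternated over several rounds.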
Context-aware machine translation approaches improve translation quality by incorporating the context of surrounding phrases into the translation of a phrase. So far, context-aware approaches have not been investigated in depth for the low-resource language pair English–Amharic. Moreover, current machine translation approaches for this pair usually require a large parallel corpus to achieve fluency. This research investigates a new approach that translates English text to Amharic text using a combination of context-based machine translation (CBMT) and recurrent neural network machine translation (RNNMT). We built a bilingual dictionary for the CBMT to use along with a target corpus. The RNNMT model is then provided with the output of the CBMT and a parallel corpus for training. The approach is evaluated using the New Testament Bible as a corpus. Our combinational approach on the English–Amharic language pair yields a performance improvement over simple neural machine translation (NMT), while no improvement is seen over CBMT alone for a small dataset. We have also assessed the impact of the dictionary used by the CBMT on the overall performance of the approach. The results show that dictionary accuracy, and hence the quality of the CBMT output, affects the performance of the combinational approach.
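The combinational pipeline can be sketched in miniature: a dictionary-based CBMT pass produces a draft, and the NMT model receives the source together with that draft. The dictionary entry, the separator token and both function names below are illustrative assumptions, not the paper's actual interface.

```python
def cbmt_draft(source_tokens, dictionary):
    # dictionary-based draft translation: substitute word by word,
    # keeping out-of-dictionary tokens unchanged
    return [dictionary.get(tok, tok) for tok in source_tokens]

def combine_input(source_tokens, draft_tokens, sep="<draft>"):
    # the NMT encoder sees the English source followed by the CBMT draft,
    # joined by a separator token
    return source_tokens + [sep] + draft_tokens
```

In this framing, an inaccurate dictionary degrades the draft and therefore everything the NMT model conditions on, which matches the reported sensitivity to dictionary accuracy.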
Language learning and translation have always been complementary pillars of multilingualism in the European Union. Both have been affected by the increasing availability of machine translation (MT): language learners now make use of free online MT to help them both understand and produce texts in a second language, but there are fears that uninformed use of the technology could undermine effective language learning. At the same time, MT is promoted as a technology that will change the face of professional translation, but the technical opacity of contemporary approaches, and the legal and ethical issues they raise, can make the participation of human translators in contemporary MT workflows particularly complicated. Against this background, this book attempts to promote teaching and learning about MT among a broad range of readers, including language learners, language teachers, trainee translators, translation teachers, and professional translators. It presents a rationale for learning about MT, and provides both a basic introduction to contemporary machine-learning-based MT and a more advanced discussion of neural MT. It explores the ethical issues that increased use of MT raises, and provides advice on its application in language learning. It also shows how users can make the most of MT through pre-editing, post-editing and customization of the technology.
The main challenge in unsupervised machine translation (UMT) is to associate source and target sentences in the latent space. Because people who speak different languages share biologically similar visual systems, various unsupervised multi-modal machine translation (UMMT) models have been proposed to improve UMT performance by employing the visual content of natural images to facilitate alignment. Relation information is commonly an important part of sentence semantics. Compared with images, videos can better present the interactions between objects and the ways in which an object changes over time. However, current state-of-the-art methods only explore scene-level or object-level information from images without explicitly modeling object relations; they are therefore sensitive to spurious correlations, which poses a new challenge for UMMT models. In this paper, we employ a spatial-temporal graph obtained from videos to exploit object interactions in space and time for disambiguation and to promote latent space alignment in UMMT. Our model employs multi-modal back-translation and features pseudo-visual pivoting, in which we learn a shared multilingual visual-semantic embedding space and incorporate visually pivoted captioning as additional weak supervision. Experimental results on the VATEX Translation 2020 and HowToWorld datasets validate the translation capabilities of our model at both the sentence level and the word level, and show that it generalizes well when videos are not available during the testing phase.
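One way to picture the spatial-temporal graph is as follows: nodes are per-frame object detections, spatial edges connect objects that co-occur in a frame, and temporal edges connect the same object across consecutive frames. This toy construction is a reader's sketch of the data structure, not the paper's model, and it links objects by label only rather than by learned features.

```python
def build_st_graph(frames):
    # frames: list of per-frame object label lists from a video
    nodes = [(t, obj) for t, objs in enumerate(frames) for obj in objs]
    # spatial edges: pairs of distinct objects within the same frame
    spatial = [((t, a), (t, b)) for t, objs in enumerate(frames)
               for i, a in enumerate(objs) for b in objs[i + 1:]]
    # temporal edges: the same object seen in consecutive frames
    temporal = [((t, obj), (t + 1, obj)) for t in range(len(frames) - 1)
                for obj in frames[t] if obj in frames[t + 1]]
    return nodes, spatial, temporal
```

The spatial edges capture object interactions within a scene, while the temporal edges capture how an object persists or transforms over time; both kinds of relation are what image-only methods cannot express.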
The problems in machine translation are related to the characteristics of language families, especially syntactic divergences between languages. In a translation task, having both source and target languages in the same language family is a luxury that cannot be relied upon. The trained models must overcome such differences either through manual augmentation or through automatically inferred capacity built into the model design. In this work, we investigated the impact of several methods for handling differing word orders during translation, and further experimented with assimilating the source language's syntax to the target word order using pre-ordering. We focused on extremely low-resource scenarios. We also conducted experiments on practical data augmentation techniques that support the reordering capacity of the models by varying the training objectives, adding the secondary goal of removing noise or reordering broken input sequences. In particular, we propose methods to improve translation quality with a denoising autoencoder in neural machine translation (NMT) and a pre-ordering method in phrase-based statistical machine translation (PBSMT). Experiments on a number of English–Vietnamese pairs show improvements in BLEU scores compared to both the NMT and SMT baselines.
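A common corruption used in such denoising objectives is local token shuffling: each token may move at most k positions, and the model is trained to reconstruct the original order from the broken input. The sketch below shows that corruption step only; the window size k and the function name are illustrative, not taken from the paper.

```python
import random

def local_shuffle(tokens, k=3, rng=None):
    # denoising-autoencoder-style corruption: each token receives a sort
    # key of its index plus uniform noise in [0, k), so it can drift at
    # most k positions from its original slot
    rng = rng or random.Random(0)
    keys = [i + rng.uniform(0, k) for i in range(len(tokens))]
    return [tok for _, tok in sorted(zip(keys, tokens))]
```

Training the model to map the shuffled sequence back to the original gives it explicit practice at reordering, which is the capacity the word-order experiments above target.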
Non-autoregressive (NAR) generation, first proposed in neural machine translation (NMT) to speed up inference, has attracted much attention in both the machine learning and natural language processing communities. While NAR generation can significantly accelerate inference for machine translation, the speedup comes at the cost of reduced translation accuracy compared to its counterpart, autoregressive (AR) generation. In recent years, many new models and algorithms have been designed to bridge the accuracy gap between NAR and AR generation. In this paper, we conduct a systematic survey with comparisons and discussions of various non-autoregressive translation (NAT) models from different aspects. Specifically, we categorize NAT efforts into several groups, including data manipulation, modeling methods, training criteria, decoding algorithms, and the benefit from pre-trained models. Furthermore, we briefly review other applications of NAR models beyond machine translation, such as grammatical error correction, text summarization, text style transfer, dialogue, semantic parsing, and automatic speech recognition. In addition, we discuss potential directions for future exploration, including relaxing the dependency on knowledge distillation (KD), reasonable training objectives, pre-training for NAR, and wider applications. We hope this survey can help researchers capture the latest progress in NAR generation, inspire the design of advanced NAR models and algorithms, and enable industry practitioners to choose appropriate solutions for their applications.
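The AR/NAR contrast the survey builds on can be reduced to a counting argument: AR decoding spends one sequential step per output token, while NAR decoding emits every position in a single parallel step. The toy sketch below makes only that point; the predictor callables are placeholders for real models.

```python
def ar_decode(predict_next, length):
    # autoregressive: one sequential step per output token
    out, steps = [], 0
    for _ in range(length):
        out.append(predict_next(out))  # conditions on the generated prefix
        steps += 1
    return out, steps

def nar_decode(predict_all, length):
    # non-autoregressive: all positions emitted in one parallel step,
    # with no conditioning between output tokens
    return predict_all(length), 1
```

The missing inter-token conditioning in the NAR case is exactly where the accuracy gap comes from, and the surveyed data, modeling, training and decoding techniques are different ways of compensating for it.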