Abstract
This introductory text presents the context of research on the phraseology of oral interactions. It also summarizes the ten articles that make up the volume, which combine methodological reflections with case studies.
This article provides an overview of methodological and technical issues that arise in the collection, indexing and use of spoken learner corpora, i.e., corpora containing spoken utterances of learners of a target language. After an introductory discussion of the most important special features of this type of corpus that distinguish it from written learner corpora and spoken corpora with L1 speakers, we go into more detail on questions of corpus design. The main part of the paper is an overview of the methodological and technical procedures for the individual steps of collecting, indexing, providing and using spoken learner corpora. The main aim of this overview is to highlight practices that can be considered best practices according to the current state of research. Finally, we outline the challenges that still exist for this type of corpus.
• Problem: end-to-end speech translation requires large corpora to train neural models.
• Contribution: MuST-C is a large multilingual corpus built from English TED Talks.
• Corpus content: English speech, aligned transcriptions/translations in 14 languages.
• Other key features: high topic and speaker variety, large size, free distribution.
• Discussion: empirical/manual quality evaluation, baseline results on all languages.
End-to-end spoken language translation (SLT) has recently gained popularity thanks to the advancement of sequence-to-sequence learning in its two parent tasks: automatic speech recognition (ASR) and machine translation (MT). However, research in the field must contend with the scarcity of publicly available corpora to train data-hungry neural networks. Indeed, while traditional cascade solutions can build on sizable ASR and MT training data for a variety of languages, the available SLT corpora suitable for end-to-end training are few, typically small, and of limited language coverage. We contribute to filling this gap by presenting MuST-C, a large and freely available Multilingual Speech Translation Corpus built from English TED Talks. Its unique features include: i) language coverage and diversity (from English into 14 languages from different families), ii) size (at least 237 hours of transcribed recordings per language, 430 on average), iii) variety of topics and speakers, and iv) data quality. Besides describing the corpus creation methodology and discussing the outcomes of empirical and manual quality evaluations, we present baseline results computed with strong systems on each language direction covered by MuST-C.
Semantic slot filling is one of the most challenging problems in spoken language understanding (SLU). In this paper, we propose to use recurrent neural networks (RNNs) for this task, and present several novel architectures designed to efficiently model past and future temporal dependencies. Specifically, we implemented and compared several important RNN architectures, including Elman, Jordan, and hybrid variants. To facilitate reproducibility, we implemented these networks with the publicly available Theano neural network toolkit and completed experiments on the well-known airline travel information system (ATIS) benchmark. In addition, we compared the approaches on two custom SLU data sets from the entertainment and movies domains. Our results show that the RNN-based models outperform the conditional random field (CRF) baseline by 2% in absolute error reduction on the ATIS benchmark. We improve the state of the art by 0.5% in the entertainment domain and by 6.7% in the movies domain.
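To make the Elman architecture mentioned in the abstract concrete, the following is a minimal sketch (not the paper's Theano implementation) of Elman-style sequence tagging for slot filling: a hidden state is carried across tokens via a recurrent connection, and each token receives a label from a softmax over the hidden state. All weight names and dimensions here are illustrative assumptions.

```python
import numpy as np

def elman_rnn_tag(x_seq, Wx, Wh, Wy, bh, by):
    """Tag each token of a sequence with an Elman RNN (illustrative sketch).

    Elman recurrence: h_t = tanh(Wx x_t + Wh h_{t-1} + bh)
    Per-token output:  y_t = softmax(Wy h_t + by)
    Returns the argmax label index for each token.
    """
    h = np.zeros(Wh.shape[0])  # initial hidden state h_0
    tags = []
    for x in x_seq:
        h = np.tanh(Wx @ x + Wh @ h + bh)       # recurrent update
        logits = Wy @ h + by
        probs = np.exp(logits - logits.max())   # numerically stable softmax
        probs /= probs.sum()
        tags.append(int(np.argmax(probs)))      # predicted slot label
    return tags

# Usage with random (untrained) weights, purely to show the shapes:
rng = np.random.default_rng(0)
emb_dim, hid_dim, n_labels = 4, 5, 3
Wx = rng.normal(size=(hid_dim, emb_dim))
Wh = rng.normal(size=(hid_dim, hid_dim))
Wy = rng.normal(size=(n_labels, hid_dim))
bh, by = np.zeros(hid_dim), np.zeros(n_labels)
sentence = [rng.normal(size=emb_dim) for _ in range(6)]  # 6 token embeddings
print(elman_rnn_tag(sentence, Wx, Wh, Wy, bh, by))       # one label per token
```

A Jordan variant would instead feed the previous output distribution, rather than the hidden state, back into the recurrence; the hybrid variants in the paper combine both sources of context.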