• Problem: end-to-end speech translation requires large corpora to train neural models.
• Contribution: MuST-C is a large multilingual corpus built from English TED Talks.
• Corpus content: English speech, aligned transcription/translations in 14 languages.
• Other key features: high topic and speaker variety, large size, free distribution.
• Discussion: empirical/manual quality evaluation, baseline results on all languages.
End-to-end spoken language translation (SLT) has recently gained popularity thanks to advances in sequence-to-sequence learning in its two parent tasks: automatic speech recognition (ASR) and machine translation (MT). However, research in the field must contend with the scarcity of publicly available corpora for training data-hungry neural networks. Indeed, while traditional cascade solutions can build on sizable ASR and MT training data for a variety of languages, the available SLT corpora suitable for end-to-end training are few, typically small, and of limited language coverage. We help fill this gap by presenting MuST-C, a large and freely available Multilingual Speech Translation Corpus built from English TED Talks. Its distinguishing features include: i) language coverage and diversity (from English into 14 languages from different families), ii) size (at least 237 hours of transcribed recordings per language, 430 on average), iii) variety of topics and speakers, and iv) data quality. Besides describing the corpus creation methodology and discussing the outcomes of empirical and manual quality evaluations, we present baseline results computed with strong systems on each language direction covered by MuST-C.
Semantic slot filling is one of the most challenging problems in spoken language understanding (SLU). In this paper, we propose to use recurrent neural networks (RNNs) for this task, and present several novel architectures designed to efficiently model past and future temporal dependencies. Specifically, we implemented and compared several important RNN architectures, including Elman, Jordan, and hybrid variants. To facilitate reproducibility, we implemented these networks with the publicly available Theano neural network toolkit and ran experiments on the well-known airline travel information system (ATIS) benchmark. In addition, we compared the approaches on two custom SLU data sets from the entertainment and movies domains. Our results show that the RNN-based models outperform the conditional random field (CRF) baseline by 2% in absolute error reduction on the ATIS benchmark. We improve on the state of the art by 0.5% in the entertainment domain and 6.7% in the movies domain.
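The difference between the Elman and Jordan architectures named in this abstract can be sketched in a few lines. This is a minimal illustration in plain Python, not the paper's Theano implementation: the function `run_rnn`, the weight names `W`, `U`, `V`, and the choice of `tanh` and softmax are all assumptions made for the example. An Elman network feeds the previous hidden state back into the hidden layer, whereas a Jordan network feeds back the previous output.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(v: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(v)
    exps = [math.exp(x - peak) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def run_rnn(inputs: List[Vector], W: Matrix, U: Matrix, V: Matrix,
            kind: str) -> List[Vector]:
    """Run a simple RNN tagger over a sequence (hypothetical helper).

    kind="elman":  hidden_t = tanh(W x_t + U hidden_{t-1})
    kind="jordan": hidden_t = tanh(W x_t + U output_{t-1})
    In both cases output_t = softmax(V hidden_t).
    """
    if kind not in ("elman", "jordan"):
        raise ValueError("kind must be 'elman' or 'jordan'")
    n_hidden, n_out = len(W), len(V)
    # The recurrent signal starts at zero; its size depends on what is fed back.
    feedback = [0.0] * (n_hidden if kind == "elman" else n_out)
    outputs = []
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(W, x), matvec(U, feedback))]
        hidden = [math.tanh(p) for p in pre]
        y = softmax(matvec(V, hidden))
        outputs.append(y)
        feedback = hidden if kind == "elman" else y
    return outputs


# Toy weights (illustrative values only): 2 inputs, 2 hidden units, 2 labels.
W = [[0.5, -0.2], [0.1, 0.3]]
U = [[0.4, 0.0], [0.0, 0.4]]
V = [[1.0, -1.0], [-1.0, 1.0]]
sequence = [[1.0, 0.0], [0.0, 1.0]]

elman_out = run_rnn(sequence, W, U, V, "elman")
jordan_out = run_rnn(sequence, W, U, V, "jordan")
```

With a zero initial recurrent signal, the two variants agree on the first word and diverge from the second word onward, because only then does the fed-back quantity (hidden state versus output distribution) differ.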
Brain oscillations are prevalent in all species and are involved in numerous perceptual operations. α oscillations are thought to facilitate processing through the inhibition of task-irrelevant networks, while β oscillations are linked to the putative reactivation of content representations. Can the proposed functional role of α and β oscillations be generalized from low-level operations to higher-level cognitive processes? Here we address this question focusing on naturalistic spoken language comprehension. Twenty-two (18 female) Dutch native speakers listened to stories in Dutch and French while MEG was recorded. We used dependency parsing to identify three dependency states at each word: the number of (1) newly opened dependencies, (2) dependencies that remained open, and (3) resolved dependencies. We then constructed forward models to predict α and β power from the dependency features. Results showed that dependency features predict α and β power in language-related regions beyond low-level linguistic features. Left temporal, fundamental language regions are involved in language comprehension in α, while frontal and parietal, higher-order language regions, and motor regions are involved in β. Critically, α- and β-band dynamics seem to subserve language comprehension tapping into syntactic structure building and semantic composition by providing low-level mechanistic operations for inhibition and reactivation processes. Because of the temporal similarity of the α-β responses, their potential functional dissociation remains to be elucidated. Overall, this study sheds light on the role of α and β oscillations during naturalistic spoken language comprehension, providing evidence for the generalizability of these dynamics from perceptual to complex linguistic processes.
It remains unclear whether the proposed functional role of α and β oscillations in perceptual and motor function is generalizable to higher-level cognitive processes, such as spoken language comprehension. We found that syntactic features predict α and β power in language-related regions beyond low-level linguistic features when listening to naturalistic speech in a known language. We offer experimental findings that integrate a neuroscientific framework on the role of brain oscillations as "building blocks" with spoken language comprehension. This supports the view of a domain-general role of oscillations across the hierarchy of cognitive functions, from low-level sensory operations to abstract linguistic processes.
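The three per-word dependency states described in the abstract above (newly opened, still open, and resolved dependencies) can be illustrated with a small sketch. The operationalization here is an assumption for illustration and may differ from the study's: each dependency is treated as an arc between a word and its head, counted as opened at its leftmost word, resolved at its rightmost word, and open at every word strictly in between. The function name `dependency_states` and the head-index input format are hypothetical.

```python
from typing import List, Tuple


def dependency_states(heads: List[int]) -> List[Tuple[int, int, int]]:
    """Count (opened, still_open, resolved) dependencies at each word.

    heads[i] is the index of word i's head, or -1 for the root.
    Assumed definition: an arc opens at its leftmost word, resolves at its
    rightmost word, and remains open at the words strictly between them.
    """
    # Represent each dependency as a (left, right) span, ignoring direction.
    arcs = [(min(i, h), max(i, h)) for i, h in enumerate(heads) if h >= 0]
    states = []
    for t in range(len(heads)):
        opened = sum(1 for left, _ in arcs if left == t)
        still_open = sum(1 for left, right in arcs if left < t < right)
        resolved = sum(1 for _, right in arcs if right == t)
        states.append((opened, still_open, resolved))
    return states


# "the big dog": both "the" and "big" attach to "dog", which is the root.
noun_phrase = dependency_states([2, 2, -1])

# "the dog barked": "the" -> "dog", "dog" -> "barked", "barked" is the root.
short_clause = dependency_states([1, 2, -1])
```

In the first example the arc opened at "the" is still open at "big", and both arcs resolve at "dog". Per-word counts of this kind are the sort of features a forward model could use as predictors of band-limited power.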
Editorial – Spoken English (original title: Éditorial – L’anglais oral). Terrier, Linda
Recherche et pratiques pédagogiques en langues de spécialité, 2021, Volume 40, Issue 1
Journal Article
Open access
It is with great honor and pleasure that Recherche et pratiques pédagogiques en langues de spécialité - Les cahiers de l’Apliut welcomes to its pages, for the first time, an issue coordinated by Aloes, the association of English-studies scholars for the study of spoken language in higher, secondary, and primary education (association des anglicistes pour les études de langue orale dans l’enseignement supérieur, secondaire et primaire). The idea for this collaboration emerged in 2017, during the Aloes study days organized by Susan Moore Mauroux at the University of Limoges on the theme...