Very deep transformers outperform conventional bi-directional long short-term memory networks for automatic speech recognition (ASR) by a significant margin. However, being autoregressive models, their computational complexity is still a prohibitive factor for deployment in production systems. To address this problem, we study two different non-autoregressive transformer structures for ASR: the Audio-Conditional Masked Language Model (A-CMLM) and the Audio-Factorized Masked Language Model (A-FMLM). When training these frameworks, decoder input tokens are randomly replaced by special mask tokens, and the network is optimized to predict the masked tokens from both the unmasked context tokens and the input speech. During inference, we start from an all-masked sequence, and the network iteratively predicts the missing tokens from partial results. A new decoding strategy is proposed as an example, which proceeds from the most confident predictions to the rest. Experiments on Mandarin (AISHELL), Japanese (CSJ), and English (LibriSpeech) benchmarks show that such a non-autoregressive network can be trained effectively for ASR. On AISHELL in particular, the proposed method outperformed the Kaldi ASR system and matched the performance of the state-of-the-art autoregressive transformer with a 7× speedup.
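The confidence-driven iterative unmasking described above can be sketched as a small decoding loop. The `predict_fn` interface, the linearly shrinking re-masking schedule, and all names below are illustrative assumptions, not the paper's implementation:

```python
MASK = "<mask>"

def mask_predict_decode(predict_fn, length, n_iterations=3):
    """Toy sketch of mask-predict inference: start from an all-masked
    sequence, predict every position, then re-mask the least confident
    predictions and predict again.

    predict_fn(tokens) -> list of (token, confidence) pairs, one per
    position; a stand-in for the audio-conditioned network.
    """
    tokens = [MASK] * length
    for it in range(n_iterations, 0, -1):
        preds = predict_fn(tokens)
        tokens = [tok for tok, _ in preds]
        # Re-mask the k least confident positions for the next pass
        # (k shrinks linearly, a common mask-predict schedule).
        n_mask = int(length * (it - 1) / n_iterations)
        if n_mask == 0:
            break
        worst = sorted(range(length), key=lambda i: preds[i][1])[:n_mask]
        for i in worst:
            tokens[i] = MASK
    return tokens
```

Because every iteration predicts all positions in parallel, the number of network calls is the fixed iteration count rather than the sequence length, which is where the speedup over autoregressive decoding comes from.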
Successful applications of deep learning technologies in the natural language processing domain have improved text-based intent classification. However, in practical spoken dialogue applications, users' articulation styles and background noise cause automatic speech recognition (ASR) errors, which may lead language models to misclassify users' intents. To overcome the limited performance of intent classification in spoken dialogue systems, we propose a novel approach that jointly uses both the recognized text produced by the ASR model and the given labeled text. In the evaluation phase, only the fine-tuned recognized language model (RLM) is used. The experimental results show that the proposed scheme is effective at classifying intents in a spoken dialogue system containing ASR errors.
On models and modelling
Hundt, Marianne
World Englishes,
September 2021, Volume: 40, Issue: 3
Journal Article
Peer-reviewed
World Englishes (WE) research has been invested in getting to grips with the diversity of different Englishes and in making sense of their structural properties. The first research strand led to a proliferation of theoretical models, the second to comparative research relying increasingly on sophisticated statistical modelling. The connection between these research strands is not always as clear as we might wish, and is occasionally even rather tenuous. This paper revisits existing models of WEs with a view to their predictive power and reviews recent corpus-based studies with respect to the ways in which these have tried to operationalise the predictions of theoretical models. It uses a mental model to understand exactly why it is apparently so difficult to bring intricate, quantitative modelling of usage data to bear on theoretical modelling.
Research on Automated Essay Scoring has become increasingly important because it serves as a method for evaluating students' written responses at scale. Scalable scoring methods are needed as students migrate to online learning environments, creating a need to evaluate large numbers of written-response assessments. The purpose of this study is to describe and evaluate three active learning methods that can minimize the number of essays that must be scored by human raters while still providing the data needed to train a modern Automated Essay Scoring system: an uncertainty-based method, a topological-based method, and a hybrid method. These three methods were used to select essays from the Automated Student Assessment Prize competition, which were then classified using a scoring model trained with bidirectional encoder representations from a transformer language model. All three active learning methods produced strong results, with the topological-based method yielding the most efficient classification. Growth-rate accuracy was also evaluated. The methods produced different levels of efficiency under different sample-size allocations, but overall all three were highly efficient and produced classifications similar to one another.
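Of the three methods, the uncertainty-based criterion is the simplest to sketch from the abstract alone. A common variant selects the essays whose predicted score distribution has the highest entropy; the function name and the entropy criterion below are generic assumptions, not the study's exact procedure:

```python
import math

def select_most_uncertain(score_probs, k):
    """Uncertainty-based active learning (generic sketch): return the
    indices of the k essays whose predicted score distribution has the
    highest entropy, i.e. where the scoring model is least sure.
    These essays are routed to human raters next."""
    def entropy(p):
        return -sum(x * math.log(x) for x in p if x > 0)
    ranked = sorted(range(len(score_probs)),
                    key=lambda i: entropy(score_probs[i]),
                    reverse=True)
    return ranked[:k]
```

For example, given per-essay probabilities over two score classes, `[[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]]`, the second essay (a coin flip for the model) is selected first.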
Deep Neural Networks (DNNs) are neural networks with many hidden layers. DNNs are becoming popular in automatic speech recognition tasks, which combine a good acoustic model with a language model. Standard feedforward neural networks cannot handle speech data well, since they have no way to feed information from a later layer back to an earlier layer. Recurrent Neural Networks (RNNs) were therefore introduced to take temporal dependencies into account. However, RNNs cannot handle long-term dependencies, due to the vanishing/exploding gradient problem. Long Short-Term Memory (LSTM) networks, a special case of RNNs, were therefore introduced to capture long-term as well as short-term dependencies in speech. Similarly, Gated Recurrent Unit (GRU) networks are a simpler variant of LSTM networks that also take long-term dependencies into consideration. In this paper, we evaluate RNN, LSTM, and GRU models and compare their performance on a reduced TED-LIUM speech data set. The results show that LSTM achieves the best word error rates, while GRU optimization is faster and achieves word error rates close to LSTM.
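The speed difference reported above follows from the gate counts of the three cells: a vanilla RNN has one transform, a GRU three gates, an LSTM four. A rough per-layer parameter count (assuming one bias vector per gate, a simplification of most framework implementations) illustrates why GRU training is cheaper:

```python
def rnn_param_count(cell, input_size, hidden_size):
    """Approximate trainable-parameter count of one recurrent layer.
    Each gate has an input weight matrix, a recurrent weight matrix,
    and a bias vector; RNN, GRU, and LSTM differ only in gate count."""
    gates = {"rnn": 1, "gru": 3, "lstm": 4}[cell]
    per_gate = hidden_size * input_size + hidden_size * hidden_size + hidden_size
    return gates * per_gate
```

With 10 inputs and 20 hidden units, this gives 620 parameters for the RNN, 1860 for the GRU, and 2480 for the LSTM, so each GRU gradient step touches about 25% fewer weights than the LSTM's.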
Speech processing for under-resourced languages is an active field of research, which has experienced significant progress during the past decade. In this paper we propose a survey that focuses on automatic speech recognition (ASR) for these languages. We first define under-resourced languages and the challenges associated with them. The main part of the paper is a literature review of recent (last eight years) contributions to ASR for under-resourced languages. Examples of past projects and future trends in dealing with under-resourced languages are also presented. We believe this paper will be a good starting point for anyone interested in initiating research in (or operational development of) ASR for one or several under-resourced languages. It should be clear, however, that many of the issues and approaches presented here apply to speech technology in general (text-to-speech synthesis, for instance).
•We explain in detail the different steps in computing a language model based on a recurrent neural network.
•We survey the applications and findings based on the current literature.
•We survey the methods for reducing computational complexity.
In this paper, we present a survey on the application of recurrent neural networks to the task of statistical language modeling. Although it has been shown that these models obtain good performance on this task, often superior to other state-of-the-art techniques, they suffer from some important drawbacks, including a very long training time and limitations on the number of context words that can be taken into account in practice. Recent extensions to recurrent neural network models have been developed in an attempt to address these drawbacks. This paper gives an overview of the most important extensions. Each technique is described and its performance on statistical language modeling, as described in the existing literature, is discussed. Our structured overview makes it possible to detect the most promising techniques in the field of recurrent neural networks, applied to language modeling, but it also highlights the techniques for which further research is required.
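As a reference point for the extensions surveyed, one step of the baseline Elman-style recurrent language model can be written out directly. The pure-Python matrix helpers below are purely illustrative; real systems use optimized tensor libraries:

```python
import math

def rnn_lm_step(x_onehot, h, Wxh, Whh, Why):
    """One step of a vanilla (Elman) RNN language model, the baseline
    the surveyed extensions build on. The hidden state h is the only
    carrier of context, which is why the number of context words it
    can exploit in practice is limited.

    Returns the new hidden state and a probability distribution over
    the next token.
    """
    def matvec(W, v):
        return [sum(w * x for w, x in zip(row, v)) for row in W]

    # h' = tanh(Wxh @ x + Whh @ h)
    h_new = [math.tanh(a + b)
             for a, b in zip(matvec(Wxh, x_onehot), matvec(Whh, h))]
    # p(next token) = softmax(Why @ h')
    logits = matvec(Why, h_new)
    z = max(logits)
    exps = [math.exp(l - z) for l in logits]
    s = sum(exps)
    return h_new, [e / s for e in exps]
```

Training unrolls this step over every position in the corpus, which is the source of the long training times the survey discusses.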
Creativity is highly valued in both education and the workforce, but assessing and developing creativity can be difficult without psychometrically robust and affordable tools. The open-ended nature of creativity assessments has made them difficult to score, expensive, often imprecise, and therefore impractical for school- or district-wide use. To address this challenge, we developed and validated the Measure of Original Thinking for Elementary School (MOTES) in five phases, including the development of the item pool and test instructions, expert validation, cognitive pilots, and validation of the automated scoring and latent test structure. MOTES consists of three game-like computerized activities (uses, examples, and sentences subscales), with eight items in each, for a total of 24 items. Using large language modeling techniques, MOTES is scored for originality by our open-access artificial intelligence platform, with a high level of agreement with independent subjective human ratings across all three subscales at the response level (rs = .79, .91, and .85 for uses, examples, and sentences, respectively). Confirmatory factor analyses showed a good fit with three factors corresponding to each game, subsumed under a higher-order originality factor. Internal consistency reliability was strong for both the subscales (H = 0.82, 0.85, and 0.88 for uses, examples, and sentences, respectively) and the higher-order originality factor (H = 0.89). MOTES scores showed moderate positive correlations with external creative performance indicators as well as academic achievement. The implications of these findings are discussed in relation to the challenges of assessing creativity in schools and research.
Automatic speech recognition: a survey
Malik, Mishaim; Malik, Muhammad Kamran; Mehmood, Khawar ...
Multimedia tools and applications,
March 2021, Volume: 80, Issue: 6
Journal Article
Peer-reviewed
Recently, great strides have been made in the field of automatic speech recognition (ASR) using various deep learning techniques. In this study, we present a thorough comparison of the cutting-edge techniques currently being used in this area, with a special focus on deep learning methods. The study explores different feature extraction methods and state-of-the-art classification models, and examines their impact on ASR performance. As deep learning techniques are very data-dependent, the different speech datasets available online are also discussed in detail. Finally, the various online toolkits, resources, and language models that can help in building an ASR system are presented. We cover every aspect that can affect the performance of an ASR system, and we therefore expect this work to be a good starting point for academics interested in ASR research.
Minds
Backus, Ad; Cohen, Michael; Cohn, Neil ...
Linguistics in the Netherlands,
November 2023, Volume: 40, Issue: 1
Journal Article
Peer-reviewed
The advent of large language models (LLMs) like GPT-4 has raised fundamental questions about language and its nature, such as whether artificial systems are able to "use" language in a way similar to humans. The role of linguistics in the development of these technologies has been surprisingly limited, but linguists could take on a much larger role in these discussions by clarifying how LLMs might be adapted to become more similar to the language we use as humans. This paper contends that linguistic models and representations should centralize MINDS: Multimodality, Interoperability, Nonopacity, Diversity, and Sociality. The authors argue that these aspects of human language constitute the main challenges for linguistics as a social science, and that elucidating them will require a concerted effort from the field itself as well as from affiliated domains such as philosophy, anthropology, sociology, and psychology.