Communication has been an important aspect of human life, civilization, and globalization for thousands of years. Biometric analysis, education, security, healthcare, and smart cities are only a few examples of speech recognition applications. Most studies have concentrated on English, Spanish, Japanese, or Chinese and have disregarded low-resource languages such as Uzbek, leaving their analysis open. In this paper, we propose an End-To-End Deep Neural Network-Hidden Markov Model speech recognition model and a hybrid Connectionist Temporal Classification (CTC)-attention network for the Uzbek language and its dialects. The proposed approach reduces training time and improves speech recognition accuracy by using the CTC objective function in attention model training. We evaluated the linguistic and lay-native speaker performances on the Uzbek language dataset, which was collected as part of this study. Experimental results show that the proposed model achieved a word error rate of 14.3% using 207 h of recordings as the Uzbek language training dataset.
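For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function names and the sample Uzbek sentences are illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


# Hypothetical example: one of four reference words is dropped.
print(word_error_rate("men bugun maktabga bordim", "men bugun bordim"))
```

Applying the same function to character lists instead of word lists gives a character error rate.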
Automatic speech recognition systems with a large vocabulary and other natural language processing applications cannot operate without a language model. Most studies on pre-trained language models have focused on more popular languages such as English, Chinese, and various European languages, but there is no publicly available Uzbek speech dataset. Therefore, language models of low-resource languages need to be studied and created. The objective of this study is to address this limitation by developing a low-resource language model for the Uzbek language and understanding linguistic occurrences. We propose an Uzbek language model named UzLM by examining the performance of statistical and neural-network-based language models that account for the unique features of the Uzbek language. Our Uzbek-specific linguistic representation allows us to construct a more robust UzLM, utilizing 80 million words from various sources while using the same number of training words as previous studies, or fewer. Roughly 68,000 distinct words and 15 million sentences were collected for the creation of this corpus. The experimental results of our tests on the continuous recognition of Uzbek speech show that, compared with manual encoding, the use of neural-network-based language models reduced the character error rate to 5.26%.
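As an illustration of what a statistical language model does, the sketch below trains a bigram model with add-one smoothing and scores sentences by perplexity. It is a toy example under stated assumptions: the abstract does not give the n-gram order, smoothing method, or tokenization used for UzLM, and the tiny corpus and function names here are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts over whitespace-tokenized text."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams, len(vocab)


def perplexity(sentence, contexts, bigrams, vocab_size):
    """Per-token perplexity under add-one (Laplace) smoothing."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = 0.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))


# Hypothetical two-sentence corpus; a real model would use millions of words.
corpus = ["men kitob o'qidim", "men maktabga bordim"]
model = train_bigram(corpus)
print(perplexity("men kitob o'qidim", *model))   # word order seen in training
print(perplexity("o'qidim kitob men", *model))   # scrambled order scores worse
```

In a recognizer, a lower perplexity for plausible word sequences is what lets the language model prefer them over acoustically similar alternatives.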
The demand for customer support call centers has surged across various sectors due to the pandemic. Yet the constraints of round-the-clock human services and fluctuating wait times make it difficult to fully meet customer needs. In response, there is a growing need for automated customer service systems that can provide responses tailored to specific domains and in the native languages of customers, particularly in developing nations like Uzbekistan, where call center usage is on the rise. Our system, “UzAssistant,” is designed to recognize user voices and accurately present customer issues in standardized Uzbek, as well as vocalize the responses to voice queries. It employs feature extraction and recurrent neural network (RNN)-based models for automatic speech recognition, achieving 96.4% accuracy in real-time tests with 56 participants. Additionally, the system incorporates a sentence similarity assessment method and a text-to-speech (TTS) synthesis feature specifically for the Uzbek language. The TTS component uses the WaveNet architecture to convert text into speech in Uzbek.
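The abstract does not say which sentence similarity method UzAssistant uses. As one common baseline, and purely as an assumption for illustration, the sketch below compares a recognized query against stored questions using cosine similarity over bag-of-words counts; the function names and sample phrases are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, candidates):
    """Return the stored sentence most similar to the query."""
    return max(candidates, key=lambda c: cosine_similarity(query, c))


# Hypothetical stored questions for a call-center domain.
known_questions = ["internet ishlamayapti", "hisobni qanday to'ldiraman"]
print(best_match("menda internet ishlamayapti", known_questions))
```

Word-count vectors ignore Uzbek's rich suffix morphology, so a production system would more plausibly compare stems or learned sentence embeddings.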