This book is a theoretically oriented, comparative study of noun phrases and their semantic and morpho-syntactic properties. It is the first study to provide a comprehensive analysis of nominal structure in Uzbek and to compare it with corresponding structures in other languages, both with and without articles. Uzbek nominals offer fertile ground for testing the universality of the DP hypothesis and for contributing to the ongoing debate about the functional architecture of the nominal domain in article and articleless languages. The study shows that the ordering of nominal suffixes in Uzbek reflects a rich functional structure involving not only DP but also KP. The book also discusses determiners, demonstratives, quantifiers and adjectives, and the positions these elements occupy within the nominal domain. It will be especially useful to researchers in theoretical linguistics, comparative syntax and typology.
In the Uzbek evidential system, speakers have several grammatical options for marking the source of information, such as -ibdi, ekan and emish, although marking this category is not obligatory. Besides these established markers, newer forms have developed into evidentials that encode specific subcategories of evidentiality. After a brief overview of the grammatical markers of evidentiality in Uzbek, this study examines the marker chog'i from syntactic and semantic perspectives, using a corpus of selected texts. Its development into an inferential marker is evaluated with special attention to the sources of evidentials.
This article deals with intelligent text-analysis technologies for classifying Uzbek-language texts, considering the Bernoulli and multinomial models. The documents used in this research come from authentic sources of the State National Information Agency of Uzbekistan. To compare these probabilistic classification methods, 600 documents in 6 categories, totalling 169,205 words, were used.
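The two models differ in how a document is represented: the multinomial model counts every occurrence of a word, while the Bernoulli model records only whether each vocabulary word is present or absent, so absent words also affect the score. The following is a minimal pure-Python sketch of both, assuming add-one (Laplace) smoothing and naive whitespace tokenization. It is not the article's implementation, and the four training sentences and the category names "sport" and "iqtisod" ("economy") are hypothetical toy data.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split. Real Uzbek text would also
    # need punctuation removal and suffix handling.
    return text.lower().split()


class NaiveBayes:
    """Multinomial or Bernoulli naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, model="multinomial"):
        if model not in ("multinomial", "bernoulli"):
            raise ValueError("model must be 'multinomial' or 'bernoulli'")
        self.model = model

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in tokenize(d)}
        self.n_docs = Counter(labels)                       # documents per class
        self.tok = {c: Counter() for c in self.classes}     # token counts per class
        self.df = {c: Counter() for c in self.classes}      # document frequencies per class
        for doc, c in zip(docs, labels):
            tokens = tokenize(doc)
            self.tok[c].update(tokens)
            self.df[c].update(set(tokens))
        self.log_prior = {c: math.log(self.n_docs[c] / len(docs)) for c in self.classes}
        return self

    def log_score(self, doc, c):
        tokens = tokenize(doc)
        score = self.log_prior[c]
        if self.model == "multinomial":
            # Every occurrence of a known word counts:
            # P(w | c) = (count(w, c) + 1) / (tokens in c + |V|)
            denom = sum(self.tok[c].values()) + len(self.vocab)
            for w in tokens:
                if w in self.vocab:
                    score += math.log((self.tok[c][w] + 1) / denom)
        else:
            # Only presence/absence matters, and absent vocabulary words count too:
            # P(w present | c) = (docs in c containing w + 1) / (docs in c + 2)
            present = set(tokens)
            for w in self.vocab:
                p = (self.df[c][w] + 1) / (self.n_docs[c] + 2)
                score += math.log(p if w in present else 1 - p)
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_score(doc, c))


# Hypothetical toy corpus (not the article's data).
train = [
    ("futbol jamoasi g'alaba qozondi", "sport"),
    ("terma jamoa o'yinda yutdi", "sport"),
    ("bank foiz stavkasini oshirdi", "iqtisod"),
    ("soliq va bank islohoti", "iqtisod"),
]
docs = [d for d, _ in train]
labels = [c for _, c in train]

for kind in ("multinomial", "bernoulli"):
    clf = NaiveBayes(kind).fit(docs, labels)
    print(kind, clf.predict("jamoa g'alaba qozondi"))
```

On this toy input both models choose "sport"; the same pipeline would apply unchanged to a larger labelled collection.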
Communication has been an important aspect of human life, civilization, and globalization for thousands of years. Biometric analysis, education, security, healthcare, and smart cities are only a few of the applications of speech recognition. Most studies have concentrated on English, Spanish, Japanese, or Chinese, leaving low-resource languages such as Uzbek largely unstudied. In this paper, we propose an end-to-end deep neural network-hidden Markov model (DNN-HMM) speech recognition model and a hybrid connectionist temporal classification (CTC)-attention network for the Uzbek language and its dialects. The proposed approach reduces training time and improves recognition accuracy by using the CTC objective function during attention-model training. We evaluated performance for both linguist and lay native speakers on an Uzbek dataset collected as part of this study. Experimental results show that the proposed model achieved a word error rate of 14.3% when trained on 207 hours of Uzbek recordings.
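A word error rate such as the 14.3% reported here is conventionally computed as the minimum number of word substitutions (S), deletions (D) and insertions (I) needed to turn the recognizer's output into the reference transcript, divided by the number of reference words N. The following is a minimal sketch of that standard metric, not the authors' evaluation code; the function names and the example sentences are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists; substitutions,
    deletions and insertions each cost 1."""
    prev = list(range(len(hyp) + 1))          # distances for an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                  # deletion of a reference word
                curr[j - 1] + 1,              # insertion of a hypothesis word
                prev[j - 1] + (r != h),       # substitution (cost 0 if words match)
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N, where N is the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substituted word out of three ("maktabga" -> "maktabda").
print(word_error_rate("men maktabga bordim", "men maktabda bordim"))
```

Calling `edit_distance` on lists of characters instead of words gives the character error rate in the same way.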
This paper presents a dataset and approaches for named entity recognition (NER) in Uzbek, a resource-constrained language. Despite the growth of NLP applications, Uzbek remains underrepresented, which underscores the importance of this work. The dataset comprises 1,160 sentences with nearly 19,000 word forms annotated for parts of speech and named entities, making it a valuable resource for linguistic research and machine learning applications in Uzbek. For practical application and experiments, the authors also developed two algorithms that use this annotated dataset as a dictionary to identify named entities in Uzbek texts. The paper describes the methodology for creating the dataset, the design of the algorithms, and their application to Uzbek. The study not only provides an important dataset for future NER tasks in Uzbek but also offers a methodological basis for vocabulary-based or machine-learning NER in other low-resource languages (e.g. Karakalpak). The dataset and algorithms can be used to build applications such as improved chatbots, text-mining tools and other analytical tools for Uzbek, contributing to the development of these technologies in the region.
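The abstract does not detail the two algorithms, so the following is only a generic sketch of vocabulary-based NER, assuming a greedy longest-match lookup against a gazetteer. The `GAZETTEER` entries and the function name `dictionary_ner` are hypothetical. Because Uzbek is agglutinative, an exact lookup misses inflected forms such as "Toshkentda" ("in Tashkent"), so a practical system would first strip case and possessive suffixes.

```python
import re

# Hypothetical gazetteer: entity phrase as a tuple of lowercase tokens -> tag.
# A real system would load entries from an annotated resource instead.
GAZETTEER = {
    ("o'zbekiston",): "LOC",
    ("toshkent",): "LOC",
    ("qoraqalpog'iston", "respublikasi"): "LOC",
    ("alisher", "navoiy"): "PER",
}

# Letters, digits and the apostrophe-like characters used in Latin-script
# Uzbek (o', g'): ASCII ', U+02BB and U+2019.
TOKEN_RE = re.compile(r"[\w'\u02bb\u2019]+")


def dictionary_ner(sentence, gazetteer):
    """Greedy longest-match lookup of gazetteer entries in a sentence."""
    tokens = TOKEN_RE.findall(sentence)
    lowered = [t.lower() for t in tokens]
    max_len = max((len(k) for k in gazetteer), default=1)
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest span first so multi-word names such as
        # "Qoraqalpog'iston Respublikasi" win over shorter entries.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            tag = gazetteer.get(tuple(lowered[i:i + n]))
            if tag is not None:
                entities.append((" ".join(tokens[i:i + n]), tag))
                i += n
                break
        else:
            i += 1  # no entry starts here; move to the next token
    return entities


print(dictionary_ner("Alisher Navoiy muzeyi Toshkent shahrida joylashgan.", GAZETTEER))
```

Longest-match-first is a common design choice for dictionary lookup because it prevents a multi-word entity from being split into shorter, possibly differently tagged, entries.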
In this study we address ethnolinguistic identity using Bakhtin's (1981) notion of chronotope. Taking an ethnographic approach to linguistic data from Azerbaijani and Uzbek communities, we trace the impact of various chronotopes on our participants' acts of ethnolinguistic identification. Building on Blommaert & De Fina (2017), we illustrate how ethnolinguistic identification is an outcome of the interaction between multiple levels of large- and small-scale chronotopes. Furthermore, we argue that chronotopes differ in terms of their power, depending on the ideological force behind them. We demonstrate how power differentials between chronotopes can account for certain interactional and linguistic patterns in conversation. The power inherent in chronotopes that link nationhood with specific languages makes the notions of discrete languages and static identities 'real' for our participants. Therefore, discussions of language and identity as flexible and socially constructed, we argue, must not obscure the power of these notions in shaping the perceptions of sociolinguistic subjects. (Chronotope, ethnolinguistic identity, power, Uzbek, Azeri/Azerbaijani, nationalism, language mixing, language ideology)
The theoretical status of differential object marking (DOM) has given rise to numerous debates. In this paper we examine data from a set of languages with DOM (Uzbek, Hindi-Urdu, Estonian, Finnish), showing that previous theories of object licensing in DOM languages cannot account for the facts. The complex morpho-syntactic behavior of direct objects in these languages provides further support for an account in which DOM does not simply signal the difference between syntactically licensed objects, which are marked, and unlicensed ones, which are unmarked. Rather, DOM signals an additional licensing operation beyond structural licensing in terms of (uninterpretable) Case (following Irimia 2020, 2021, 2022).
Large-vocabulary automatic speech recognition systems and other natural language processing applications cannot operate without a language model. Most studies on pre-trained language models have focused on widely spoken languages such as English, Chinese and various European languages, and no Uzbek speech dataset is publicly available. Language models for low-resource languages therefore need to be studied and built. This study addresses this gap by developing a low-resource language model for Uzbek and by examining the linguistic patterns of the language. We propose an Uzbek language model, UzLM, by examining the performance of statistical and neural-network-based language models that account for the unique features of Uzbek. Our Uzbek-specific linguistic representation allows us to build a more robust UzLM from 80 million words drawn from various sources, while using the same number of training words as, or fewer than, previous studies. Roughly 68,000 distinct words and 15 million sentences were collected to build this corpus. Experimental results on continuous Uzbek speech recognition show that, compared with manual encoding, neural-network-based language models reduced the character error rate to 5.26%.
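UzLM is built by comparing statistical and neural-network-based language models. As a minimal sketch of the statistical side only, the following assumes a word bigram model with add-k smoothing, evaluated by perplexity. It is not UzLM: the class name `BigramLM` and the three toy sentences are hypothetical, and production n-gram models typically use stronger smoothing such as Kneser-Ney.

```python
import math
from collections import Counter

BOS, EOS, UNK = "<s>", "</s>", "<unk>"


class BigramLM:
    """Word bigram language model with add-k smoothing."""

    def __init__(self, sentences, k=1.0):
        self.k = k
        tokenized = [s.lower().split() for s in sentences]
        # Prediction vocabulary: training words, end-of-sentence and unknown.
        self.vocab = {w for toks in tokenized for w in toks} | {EOS, UNK}
        self.contexts = Counter()   # c(v): how often v precedes another token
        self.bigrams = Counter()    # c(v, w)
        for toks in tokenized:
            seq = [BOS] + toks + [EOS]
            self.contexts.update(seq[:-1])
            self.bigrams.update(zip(seq[:-1], seq[1:]))

    def prob(self, prev, word):
        # P(w | v) = (c(v, w) + k) / (c(v) + k * |V|)
        return (self.bigrams[(prev, word)] + self.k) / (
            self.contexts[prev] + self.k * len(self.vocab)
        )

    def perplexity(self, sentences):
        log_prob, n = 0.0, 0
        for s in sentences:
            # Map out-of-vocabulary words to the unknown token.
            toks = [w if w in self.vocab else UNK for w in s.lower().split()]
            seq = [BOS] + toks + [EOS]
            for prev, word in zip(seq[:-1], seq[1:]):
                log_prob += math.log(self.prob(prev, word))
                n += 1
        return math.exp(-log_prob / n)


# Hypothetical toy corpus (not the UzLM data).
lm = BigramLM(["men maktabga bordim", "sen maktabga bording", "men uyga bordim"])
print(lm.perplexity(["men maktabga bordim"]))  # seen word order: lower perplexity
print(lm.perplexity(["bordim maktabga men"]))  # scrambled order: higher perplexity
```

Lower perplexity means the model finds a word sequence more predictable; in a recognizer, such scores help rank competing transcription hypotheses.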
This article explores the functional-semantic and associative-grammatical expression of whole-part and group-part meanings in Uzbek, as shaped by the concept of "person", through the category of agreement.