•Cross-linguistic difference in the gender-cue strength impacts predictive processing.
•Fine-grained gender distinctions are processed differentially in related languages.
•Economy, Transparency and Interdependence can account for differences in processing.
We tested predictive gender agreement processing in adjective–noun phrases by 45 4- to 6-year-old Russian- and Bulgarian-speaking children using the visual world eye-tracking paradigm. Russian and Bulgarian are closely related languages that have three genders but differ in the nature and number of gender cues on adjectives. Analysis of the proportion and time course of looks to the target noun showed that only Bulgarian children used gender cues to predict the upcoming noun. We argue that the cross-linguistic difference in the gender cue strength is revealed through the operation of economy, transparency, and interdependence in a gender complexity matrix. The documented advantage for Bulgarian children in gender agreement processing and acquisition underscores the need for a comparative language acquisition approach to typologically close languages.
A total absence of parallel data affects a large number of language pairs and can severely harm the quality of machine translation. We describe a language-independent method to enable machine translation between a low-resource language (LRL) and a third language, e.g. English. We deal with cases of LRLs for which there is no readily available parallel data between the low-resource language and any other language, but there is ample training data between a closely related high-resource language (HRL) and the third language. We take advantage of the similarities between the HRL and the LRL to transform the HRL data into data similar to the LRL using transliteration. The transliteration models are trained on transliteration pairs extracted from Wikipedia article titles. Then, we automatically back-translate monolingual LRL data with the models trained on the transliterated HRL data and use the resulting parallel corpus to train our final models. Our method achieves significant improvements in translation quality, close to the results that can be achieved by a general-purpose neural machine translation system trained on a significant amount of parallel data. Moreover, the method does not rely on the existence of any parallel data for training, but attempts to bootstrap already existing resources in a related language.
Lingua Receptiva: An Overview of Communication Strategies
Majdańska-Wachowicz, Urszula; Steciąg, Magdalena; Zábranský, Lukáš
Forum lingwistyczne: studia, archiwalia, polemiki, varia, 06/2021, Volume 8, Issue 8
Journal Article, peer reviewed, open access
The aim of the study is to examine communication strategies employed by Polish and Czech speakers when communicating with each other in their native languages. In particular, the analysis refers to receptive intercultural communication. The material under investigation covers audio and visual recordings of semi-spontaneous dialogues. The pragmalinguistic research investigates the strategies which help achieve mutual intelligibility when using lingua receptiva. The findings show how significant pragmatic aspects are for successful receptive intercultural communication.
Abstract
We measured mutual intelligibility of 16 closely related spoken languages in Europe. Intelligibility was determined for all 70 language combinations using the same uniform methodology (a cloze test). We analysed the results of 1833 listeners representing the mutual intelligibility between young, educated Europeans from the same 16 countries. Lexical, phonological, orthographic, morphological and syntactic distances were computed as linguistic variables. We also quantified non-linguistic variables (e.g. exposure, attitudes towards the test languages). Using stepwise regression analysis the importance of linguistic and non-linguistic predictors for the mutual intelligibility in the 70 language pairs was assessed. Exposure to the test language was the most important variable, overriding all other variables. Then, limiting the analysis to the prediction of inherent intelligibility, we analysed the results for a subset of listeners with no or little previous exposure to the test language. Linguistic distances, especially lexical distance, now explain a substantial part of the variance.
Abstract
This study investigates the differential effects of Textual Enhancement (TE) on the learning and unlearning of two syntactic properties of Spanish – the absence of the Pre-possessive Determiner Article (PPDA) and the presence of the Prepositional Accusative (PA) – which each pose specific acquisitional difficulties for Italian-speaking learners of Spanish (ISS) due to their asymmetrical relationships with corresponding L1 structures. 77 ISS were divided into two experimental groups: group A read 5 texts with TE on PA – the feature to be learned – and group B read the same 5 texts with TE on PPDA – the feature to be unlearned. The participants took a timed grammatical judgment task three times (before, five days after, and two months after the instructional treatment). The results are compared with those of Della Putta (2016), a study symmetrical to this one, in which the same teaching intervention and experimental conditions were adopted with Spanish-speaking learners of Italian, whose task was to unlearn PA and to learn PPDA. The bidirectional comparison shows a similar, weak effect of TE, although in the present study, unlike in Della Putta (2016), unlearning did not seem to be more difficult than learning. These similarities and differences are discussed and theoretically motivated.
Creating a bilingual dictionary is the first crucial step in enriching low-resource languages. Especially for closely related languages, it has been shown that the constraint-based approach is useful for inducing bilingual lexicons from two bilingual dictionaries via a pivot language. However, if no machine-readable dictionaries are available as input, we need to consider manual creation by bilingual native speakers. When the goal is to comprehensively create multiple bilingual dictionaries, even if several machine-readable bilingual dictionaries already exist, it is still difficult to determine an execution order for the constraint-based approach that reduces the total cost. Plan optimization is crucial in composing the order of bilingual dictionary creation with consideration of the methods and their costs. We formalize the plan optimization for creating bilingual dictionaries as a Markov Decision Process (MDP), with the goal of obtaining a more accurate estimate of the most feasible optimal plan with the least total cost before fully implementing the constraint-based bilingual lexicon induction. We model a prior beta distribution of bilingual lexicon induction precision with language similarity and polysemy of the topology as the α and β parameters. It is further used to model the cost function and the state transition probability. We estimated the cost of all investment plans as a baseline for evaluating the proposed MDP-based approach, with total cost as the evaluation metric. After utilizing the posterior beta distribution from the first batch of experiments to construct the prior beta distribution in the second batch of experiments, the result shows a 61.5% cost reduction compared to the estimated all-investment plans and a 39.4% cost reduction compared to the estimated MDP optimal plan. The MDP-based proposal outperformed the baseline on total cost.
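The step in which a posterior beta distribution from one batch becomes the prior for the next can be illustrated with a minimal sketch. It assumes the standard conjugate update, where counts of correct and incorrect induced translation pairs are added to the two shape parameters; the class name `Beta`, the starting parameters, and all counts are hypothetical and are not taken from the study, whose actual parameterization (via language similarity and polysemy) is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta distribution over induction precision (hypothetical helper)."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Expected precision under the current belief.
        return self.alpha / (self.alpha + self.beta)

    def update(self, correct: int, incorrect: int) -> "Beta":
        # Conjugate update: observed successes and failures are added
        # to the shape parameters, giving the posterior distribution.
        return Beta(self.alpha + correct, self.beta + incorrect)


# Assumed prior for illustration only.
prior = Beta(2.0, 2.0)

# First batch: hypothetical evaluation of induced pairs.
posterior_batch1 = prior.update(correct=30, incorrect=10)

# Second batch: the first posterior serves as the new prior.
posterior_batch2 = posterior_batch1.update(correct=45, incorrect=5)

print(prior.mean, posterior_batch1.mean, posterior_batch2.mean)
```

Chaining the updates this way means each batch sharpens the precision estimate that a cost function or transition probability could then draw on.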
Indonesia has a diverse ethnic and cultural background. However, this diversity sometimes creates social problems, such as intertribal conflict. Because of the large differences among tribal languages, it is often difficult for conflicting parties to engage in dialogue for conflict resolution. To address this problem, we aim to find intermediary closely related languages from a language similarity knowledge graph using the best-performing pathfinding algorithms. In this research, we analyze the performance of two pathfinding algorithms, namely Dijkstra and Yen’s K, by comparing their execution time and the total lexical distances of the intermediary languages (called “the cost”). Our research findings show that even though the Dijkstra and Yen’s K algorithms have equal total cost for all the cases, Yen’s K outperformed Dijkstra at searching for intermediary languages that are closely related, with an average of 160% higher performance on execution time. The selection of native speakers of the obtained intermediary languages as mediators is formalized as an optimization problem with four criteria: language similarity, geographical distance, background, and expected salary. We present a case study where the intermediary closely related languages can be used as a guideline to find mediators who can help resolve intertribal conflicts among Indonesian tribes. To calculate the first criterion, we implemented the Yen’s K algorithm to calculate the shortest path between target languages and return the path via the intermediary languages. This implementation shows the potential use of the mediator selection model defined in this paper in various other roles such as trader or salesman, politician’s spokesman, reporter or journalist, etc.
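As an illustration of finding intermediary languages by shortest path, the following is a minimal sketch of Dijkstra's algorithm over a small weighted graph. The language labels and distances are hypothetical placeholders (integers standing in for lexical distances), not data from the study, and the function name `shortest_path` is an assumption; Yen's K algorithm, which returns several ranked paths, is not shown.

```python
import heapq


def shortest_path(graph, source, target):
    """Return (total distance, path) between two languages using Dijkstra."""
    # Each frontier entry holds (cost so far, language, path taken).
    frontier = [(0, source, [source])]
    settled = set()
    while frontier:
        cost, lang, path = heapq.heappop(frontier)
        if lang == target:
            return cost, path
        if lang in settled:
            continue  # A cheaper route to this language was already expanded.
        settled.add(lang)
        for neighbour, distance in graph.get(lang, {}).items():
            if neighbour not in settled:
                heapq.heappush(
                    frontier, (cost + distance, neighbour, path + [neighbour])
                )
    return float("inf"), []  # No route between the two languages.


# Hypothetical similarity graph: edge weights are made-up lexical distances.
graph = {
    "lang_a": {"lang_b": 20, "lang_c": 70},
    "lang_b": {"lang_a": 20, "lang_c": 30, "lang_d": 90},
    "lang_c": {"lang_a": 70, "lang_b": 30, "lang_d": 40},
    "lang_d": {"lang_b": 90, "lang_c": 40},
}

cost, path = shortest_path(graph, "lang_a", "lang_d")
intermediaries = path[1:-1]  # Languages strictly between source and target.
print(cost, path, intermediaries)
```

In this toy graph the cheapest route passes through two intermediary languages, which is the kind of output a mediator-selection step could consume.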
The lack or absence of parallel and comparable corpora makes bilingual lexicon extraction a difficult task for low-resource languages. The pivot language and cognate recognition approaches have been proven useful for inducing bilingual lexicons for such languages. We propose constraint-based bilingual lexicon induction for closely related languages by extending constraints from the recent pivot-based induction technique and further enabling multiple symmetry assumption cycles to reach many more cognates in the transgraph. We further identify cognate synonyms to obtain many-to-many translation pairs. This article utilizes four datasets: one Austronesian low-resource language and three Indo-European high-resource languages. As baselines, we use three constraint-based methods from our previous work, the Inverse Consultation method, and translation pairs generated from the Cartesian product of input dictionaries. We evaluate our results using the metrics of precision, recall, and F-score. Our customizable approach allows the user to conduct cross-validation to predict the optimal hyperparameters (cognate threshold and cognate synonym threshold) with various combinations of heuristics and numbers of symmetry assumption cycles to gain the highest F-score. Our proposed methods show statistically significant improvements in precision and F-score compared to our previous constraint-based methods. The results show that our method has the potential to complement other bilingual dictionary creation methods, such as word alignment models using parallel corpora for high-resource languages, while also handling low-resource languages well.
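The evaluation metrics named above can be sketched for sets of translation pairs as follows. This is a minimal illustration assuming the balanced F-score (F1) and exact matching of pairs against a gold standard; the function name and the word pairs are hypothetical, and the study's actual evaluation procedure may differ.

```python
def precision_recall_f1(induced, gold):
    """Compare induced translation pairs with a gold-standard set of pairs."""
    true_positives = len(induced & gold)
    precision = true_positives / len(induced) if induced else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (source word, target word) pairs for illustration only.
gold_pairs = {("src1", "tgt1"), ("src2", "tgt2"), ("src3", "tgt3"), ("src4", "tgt4")}
induced_pairs = {("src1", "tgt1"), ("src2", "tgt2"), ("src3", "wrong")}

precision, recall, f1 = precision_recall_f1(induced_pairs, gold_pairs)
print(precision, recall, f1)
```

A threshold search such as the cross-validation mentioned above would call a function like this once per hyperparameter setting and keep the setting with the highest F-score.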
In this article, the authors tackle the problem of discriminating Twitter users by the language they tweet in, taking into account very similar South-Slavic languages -- Bosnian, Croatian, Montenegrin and Serbian. They apply the supervised machine learning approach by annotating a subset of 500 users from an existing Twitter collection by the language the users primarily tweet in. They show that by using a simple bag-of-words model, univariate feature selection, the 320 strongest features and a standard classifier, they reach a user classification accuracy of ∼98%. By annotating the whole Twitter collection of 63,160 users with the best-performing classifier and visualizing it on a map via tweet geo-information, they produce a Twitter language map which clearly depicts the robustness of the classifier.