This study forms part of the VITALEX project and presents the agricultural-lexicon data for survey point Gr601, which corresponds to the village of Trevélez. We study the lexical vitality of this municipality using a contrastive methodology that compares the VITALEX data with those in volume I of the ALEA. We also observe an interesting relationship between the economic and social transformation of the La Alpujarra region and the percentages of vitality and loss in the agricultural lexicon. This relationship emerges from contrasting the results obtained for the three generations that make up VITALEX with the ALEA data.
In this piece, we honor the work of Albert Costa. His work focused on how bilinguals manage two languages, the brain mechanisms involved, and the ways in which language and emotion are related. We end by discussing ways in which his work will frame research in the field going forward.
Even though considerable attention has been given to the polarity of words (positive and negative) and the creation of large polarity lexicons, research in emotion analysis has had to rely on limited and small emotion lexicons. In this paper, we show how the combined strength and wisdom of the crowds can be used to generate a large, high‐quality, word–emotion and word–polarity association lexicon quickly and inexpensively. We enumerate the challenges in emotion annotation in a crowdsourcing scenario and propose solutions to address them. Most notably, in addition to questions about emotions associated with terms, we show how the inclusion of a word choice question can discourage malicious data entry, help to identify instances where the annotator may not be familiar with the target term (allowing us to reject such annotations), and help to obtain annotations at sense level (rather than at word level). We conducted experiments on how to formulate the emotion‐annotation questions, and show that asking if a term is associated with an emotion leads to markedly higher interannotator agreement than that obtained by asking if a term evokes an emotion.
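The word choice check described above can be illustrated with a minimal sketch. It assumes each crowdsourced response records the annotator's answer to the word choice question and a yes/no judgment for one emotion; the field names, the function name `aggregate`, and the toy data are hypothetical and do not come from the paper. Responses that fail the word choice question are rejected, and the remaining votes are summarized by majority label and raw agreement.

```python
from collections import Counter

def aggregate(responses, gold_choice):
    """Drop responses that fail the word choice check, then take a majority vote.

    responses   -- list of dicts with keys 'term', 'word_choice', 'associated'
                   (hypothetical field names, for illustration only)
    gold_choice -- dict mapping each term to the correct word choice answer
    Returns {term: (majority_label, share_of_kept_votes_for_that_label)}.
    """
    kept = [r for r in responses if r["word_choice"] == gold_choice[r["term"]]]
    votes_by_term = {}
    for r in kept:
        votes_by_term.setdefault(r["term"], []).append(r["associated"])
    result = {}
    for term, votes in votes_by_term.items():
        label, count = Counter(votes).most_common(1)[0]
        result[term] = (label, count / len(votes))
    return result

# Toy data: is "startle" associated with surprise?
responses = [
    {"term": "startle", "word_choice": "shock", "associated": True},
    {"term": "startle", "word_choice": "shock", "associated": True},
    {"term": "startle", "word_choice": "shock", "associated": False},
    {"term": "startle", "word_choice": "begin", "associated": False},  # fails the check
]
summary = aggregate(responses, {"startle": "shock"})
```

Raw agreement is used here only for brevity; a chance-corrected statistic would be a natural replacement.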
DispoCen is a system for analyzing lexical availability and lexical centrality. Although dedicated programs for computing these indices exist, they tend to restrict the analysis and exploitation of the data too much, either because the tools are obsolete or because their code is excessively closed and inaccessible. DispoCen is built on a library of R tools that allows those who study the lexicon to develop many original applications and models. In this work we include the code needed to run the analyses, which strengthens the replicability that supports autonomous work by the research community. To make the system easier to access, we also present a simple graphical utility that gives access to the most common analyses. To illustrate what DispoCen can do, we include a dedicated section with proposed analyses carried out with sociological filters.
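To make the idea of a lexical availability index concrete, here is a minimal sketch in Python. The abstract does not say which formula DispoCen implements, and DispoCen itself is written in R, so this is not its code. The sketch assumes the widely used López Chávez and Strassburger index, in which a word produced at position i contributes exp(-2.3 (i - 1) / (n - 1)), with n the deepest position reached by any informant, and the contributions are divided by the number of informants. The function name `availability_index` and the toy word lists are hypothetical.

```python
import math
from collections import defaultdict

def availability_index(lists, decay=2.3):
    """Availability score per word from the informants' ordered word lists.

    lists -- one list of words per informant, in production order
             (assumed non-empty overall, with no repeated word inside a list)
    """
    n = max(len(lst) for lst in lists)   # deepest position reached
    informants = len(lists)
    scores = defaultdict(float)
    for lst in lists:
        for position, word in enumerate(lst, start=1):
            # Earlier positions weigh more; guard the n == 1 case.
            weight = 1.0 if n == 1 else math.exp(-decay * (position - 1) / (n - 1))
            scores[word] += weight / informants
    return dict(scores)

# Toy example with two informants.
scores = availability_index([["wheat", "plough"], ["wheat"]])
```

A word named first by every informant scores 1, and words named later or by fewer informants score progressively lower.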
Vital to the task of Sentiment Analysis (SA), or automatically mining sentiment expression from text, is a sentiment lexicon. This fundamental lexical resource comprises the smallest sentiment-carrying units of text, words, annotated for their sentiment properties, and aids in SA tasks on larger pieces of text. Unfortunately, digital dictionaries do not readily include information on the sentiment properties of their entries, and manually compiling sentiment lexicons is tedious in terms of annotator time and effort. This has led to a large body of research on automated sentiment lexicon generation. The dictionary-based approach leverages digital dictionaries, while the corpus-based approach exploits co-occurrence statistics embedded in text corpora. Although the former approach has been investigated extensively, the majority of works focus on terms. The few state-of-the-art models that work at the finer-grained term sense level still exhibit several prominent limitations. For example, the semantic relations algorithm they propose retrieves only senses that lie close to the seed senses in the semantic network, and so cannot retrieve remote sentiment-carrying senses beyond the ‘radius’ defined by the number of iterations of semantic relations expansion. The proposed model aims to overcome the issues inherent in dictionary-based sense-level sentiment lexicon generation models using: (1) null seed sets, and a morphological approach inspired by the Marking Theory in Linguistics to populate them automatically; (2) a dual-step context-aware gloss expansion algorithm that ‘mines’ human-defined gloss information from a digital dictionary, ensuring that senses overlooked by the semantic relations expansion algorithm are identified; and (3) a fully unsupervised sentiment categorization algorithm based on Network Theory.
The results demonstrate that context-aware in-gloss matching successfully retrieves senses beyond the reach of the semantic relations expansion algorithm used by prominent, well-known models. Evaluation of the proposed model's ability to assign polarity to senses accurately demonstrates that it is on par with state-of-the-art models against the same gold standard benchmarks. The model has theoretical implications for future work on effectively exploiting the readily available human-defined gloss information in a digital dictionary when assigning polarity to term senses. Extrinsic evaluation in a real-world sentiment classification task on multiple publicly available datasets from varying domains demonstrates its practical implications and applications in sentiment analysis, as well as in related fields such as information science, opinion retrieval and computational linguistics.
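The 'radius' limitation of semantic relations expansion mentioned above can be illustrated with a minimal sketch. It assumes the expansion behaves like a breadth-first traversal of relation edges that stops after a fixed number of iterations; this is an illustration of the limitation, not the algorithm of any particular model. The function name `expand_seeds`, the sense identifiers and the toy graph are all made up.

```python
from collections import deque

def expand_seeds(graph, seeds, radius):
    """Breadth-first expansion over relation edges, limited to `radius` hops.

    graph  -- dict mapping a sense to the senses it is related to
    seeds  -- iterable of seed senses
    Returns {sense: number_of_hops_from_nearest_seed} for every reached sense.
    """
    reached = {s: 0 for s in seeds}
    queue = deque(reached)
    while queue:
        sense = queue.popleft()
        depth = reached[sense]
        if depth == radius:
            continue  # do not expand beyond the radius
        for neighbour in graph.get(sense, ()):
            if neighbour not in reached:
                reached[neighbour] = depth + 1
                queue.append(neighbour)
    return reached

# Toy chain of related senses (hypothetical identifiers).
toy_graph = {
    "good.a.01": ["fine.a.01"],
    "fine.a.01": ["decent.a.01"],
    "decent.a.01": ["respectable.a.01"],
}
reached = expand_seeds(toy_graph, ["good.a.01"], radius=2)
```

With a radius of 2, the sense three hops from the seed is never reached. A gloss-based step of the kind the abstract describes is meant to recover such senses.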
From a very young age, monolingual children assume their language has no synonyms, or use the principle of mutual exclusivity (only one label per object). In contrast, bilingual children often accept more novel synonyms than monolinguals. One possible explanation for this difference is the lexicon structure hypothesis: having synonyms (across languages) in the lexicon reduces adherence to mutual exclusivity. The purpose of this study is to test the lexicon structure hypothesis by comparing three- to five-year-old children who speak either Canadian French or English. Canadian French allows more synonyms than English. French-speaking children should therefore accept more novel synonyms than English-speaking children. The children did a disambiguation task, choosing whether a familiar or an unfamiliar object was the referent of a novel word (e.g., moli). Surprisingly, the French-speaking children accepted significantly fewer novel synonyms than the English-speaking children. However, they accepted the most synonyms for objects that had synonyms in French but for which they did not know both synonyms. These results support a modified version of the lexicon structure hypothesis, one that accounts for children’s weak access to synonyms.
Previous studies report that exposure to the Māori language on a regular basis allows New Zealand adults who cannot speak Māori to build a proto‐lexicon of Māori: an implicit memory of word forms without detailed knowledge of meaning. How might this knowledge feed into explicit language learning? Is it possible to “awaken” the proto‐lexicon in the context of overt language learning? We investigate whether implicit linguistic knowledge represented in a proto‐lexicon gives any advantages for intentional language learning in a tertiary educational environment. We conducted a three‐task experiment which: (a) assessed participants’ Māori proto‐lexicon, (b) assessed their phonotactic knowledge, and (c) tested them at two time points on Māori vocabulary that they had been exposed to during the course. The results show that students with larger Māori proto‐lexicons learn more words in a classroom setting. This study shows that a proto‐lexicon acquired from ambient exposure can lead to significant benefits in language learning.
A one‐page Accessible Summary of this article in nontechnical language is freely available in the Supporting Information online and at https://oasis-database.org
Analyzing emotion in the comments that students write in course evaluations is an important task. Questionnaire comments are generally left unprocessed, even though they contain information that reveals students' emotions during the learning process. Detecting and classifying emotion in student opinions can therefore improve questionnaire results. This study applies an emotion classification method to student comments, based on the NRC Emolex emotion lexicon. Eight emotions are detected: anger, anticipation, disgust, fear, joy, sadness, surprise and trust. The data consist of 4,000 manually labeled comments and suggestions taken from student questionnaires at Universitas AKPRIND Indonesia between 2020 and 2022. A further aim of the study is to assess how effective the Emolex emotion lexicon is for classifying emotion in academic questionnaire text. The results show an average accuracy of 56.2%. Among comments with a known emotion label, the three most frequent labels are Sadness (19.2%), Joy (16.7%) and Fear (13.5%), with accuracies of 72%, 68% and 68% respectively. The study finds that the performance of Emolex for emotion classification is still unsatisfactory and that the lexicon needs further development.
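A lexicon-based emotion classifier of the general kind described above can be sketched in a few lines. This is a minimal illustration, not the study's pipeline: the abstract does not describe its preprocessing, translation or tie-breaking steps, so those choices are assumptions. The toy lexicon below is invented for the example and does not reproduce actual NRC Emolex entries; the function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy lexicon (word -> set of emotions); NOT real NRC Emolex data.
TOY_LEXICON = {
    "late": {"anger", "sadness"},
    "helpful": {"joy", "trust"},
    "exam": {"fear", "anticipation"},
    "boring": {"sadness", "disgust"},
}

def classify_emotion(text, lexicon=TOY_LEXICON):
    """Return the emotion with the most lexicon hits, or None if nothing matches."""
    counts = Counter()
    for token in text.lower().split():
        word = token.strip(".,;:!?()\"'")
        for emotion in lexicon.get(word, ()):
            counts[emotion] += 1
    if not counts:
        return None
    best = max(counts.values())
    # Assumed tie-break: alphabetical order, so the result is deterministic.
    return min(e for e, c in counts.items() if c == best)

def accuracy(predictions, gold):
    """Share of predictions that equal the manually assigned label."""
    pairs = list(zip(predictions, gold))
    return sum(p == g for p, g in pairs) / len(pairs)

comments = ["The class was boring and the lecturer was late.", "Nothing to report"]
predicted = [classify_emotion(c) for c in comments]
```

Comments with no lexicon match receive no label here, which is one way limited lexicon coverage can lower accuracy.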
Abstract: This contribution arises from research conducted for the project «Itinerari testuali nel Medioevo mediterraneo. La tradizione delle scritture di viaggio e dell’immaginario figurativo nell’orizzonte occitano-catalano», directed by Anna Radaelli and financed by “Sapienza”, University of Rome. Specifically, it presents the first results of the linguistic surveys carried out on the Catalan Atlas (ms. Paris, BnF, Esp. 30), particularly on its lexicon. The plurality of sources on which this work draws is reflected in the lexical mixture that characterises it. From a purely textual point of view, the Atlas is written in a middle style that is never careless, with a difference between the prose of the cosmographic-astronomical compilation (transmitted in panels 1-2) and that of the legends (copied in panels 3-6). The first section uses a lexicon of learned derivation, with hapaxes and calques from Latin as well as technical and specialised terms; in the legends and the toponyms, by contrast, the lexicon is also influenced by the languages spoken at the time in the Mediterranean basin, including Arabic. Keywords: Catalan Atlas, text, lexicon