This work forms part of the VITALEX project. In it we present the agricultural lexicon data for survey point Gr601, which corresponds to the village of Trevélez. We carry out a study of lexical vitality in this municipality using a contrastive methodology that compares the VITALEX data with those of volume I of the ALEA. In addition, we observe an interesting relationship between the economic and social transformation of the La Alpujarra district and the percentages of vitality and mortality of the agricultural lexicon. We do so by contrasting the results obtained for the three generations that make up VITALEX with the ALEA data.
In this piece, we honor the work of Albert Costa. His work focused on how bilinguals manage two languages, the brain mechanisms involved, and the ways in which language and emotion are related. We end by discussing ways in which his work will frame research in the field going forward.
Even though considerable attention has been given to the polarity of words (positive and negative) and the creation of large polarity lexicons, research in emotion analysis has had to rely on limited and small emotion lexicons. In this paper, we show how the combined strength and wisdom of the crowds can be used to generate a large, high‐quality, word–emotion and word–polarity association lexicon quickly and inexpensively. We enumerate the challenges in emotion annotation in a crowdsourcing scenario and propose solutions to address them. Most notably, in addition to questions about emotions associated with terms, we show how the inclusion of a word choice question can discourage malicious data entry, help to identify instances where the annotator may not be familiar with the target term (allowing us to reject such annotations), and help to obtain annotations at sense level (rather than at word level). We conducted experiments on how to formulate the emotion‐annotation questions, and show that asking if a term is associated with an emotion leads to markedly higher interannotator agreement than that obtained by asking if a term evokes an emotion.
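The role of the word choice question can be illustrated with a minimal sketch. It assumes a toy record layout (annotator, term, word-choice answer, emotion answer) and a hypothetical gold answer per term; none of these names or data come from the paper. The share of annotators giving the majority label is used only as a simple stand-in for an interannotator agreement measure.

```python
from collections import Counter

# Hypothetical annotation records: (annotator, term, word-choice answer, emotion answer).
ANNOTATIONS = [
    ("a1", "startle", "shock", "yes"),
    ("a2", "startle", "shock", "yes"),
    ("a3", "startle", "banana", "no"),  # fails the word-choice check
    ("a4", "startle", "shock", "no"),
]

# Hypothetical gold answers to the word-choice question, one per target term.
GOLD_CHOICE = {"startle": "shock"}


def filter_by_word_choice(annotations, gold):
    """Keep only annotations whose word-choice answer matches the gold answer."""
    return [a for a in annotations if gold.get(a[1]) == a[2]]


def majority_agreement(annotations):
    """Per term: (majority emotion label, share of kept annotators giving it)."""
    by_term = {}
    for _, term, _, label in annotations:
        by_term.setdefault(term, []).append(label)
    result = {}
    for term, labels in by_term.items():
        label, count = Counter(labels).most_common(1)[0]
        result[term] = (label, count / len(labels))
    return result


kept = filter_by_word_choice(ANNOTATIONS, GOLD_CHOICE)
summary = majority_agreement(kept)
```

In this toy data the annotation from `a3` is rejected before aggregation, so it cannot influence the label assigned to the term.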
DispoCen is a system for analyzing lexical availability and lexical centrality. Although dedicated programs exist for computing these indices, they tend to restrict the possibilities for analyzing and exploiting the data too severely, either because the tools are obsolete or because their code is overly closed and inaccessible. DispoCen is built on a library of R tools that lets those who study the lexicon develop a wide range of original applications and models. In this work we include the code needed to run the analyses, thereby strengthening the replicability that supports autonomous work by the research community. To ease access to the system, we also present a simple graphical utility that gives access to the most common analyses. As a sample of what DispoCen can do, we include a dedicated section with proposed analyses carried out using sociological filters.
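To make the notion of a lexical availability index concrete, here is a minimal sketch. It is written in Python rather than R and is not DispoCen's code or API. It assumes the widely used exponential position weighting associated with López Chávez and Strassburger, with decay constant 2.3; the abstract does not say which formula DispoCen implements. The function name and the toy word lists are hypothetical, and each informant's list is assumed to contain no repeated words.

```python
import math
from collections import defaultdict


def availability_index(lists, decay=2.3):
    """Availability per word from one ordered word list per informant.

    Assumed formula: sum over positions i (0-based) of
    exp(-decay * i / (n - 1)) * f_i / N, where n is the longest list length,
    f_i the number of informants giving the word at position i, and N the
    number of informants.
    """
    n_informants = len(lists)
    max_pos = max(len(words) for words in lists)
    # freq[word][i] = number of informants who produced `word` at position i
    freq = defaultdict(lambda: defaultdict(int))
    for words in lists:
        for i, word in enumerate(words):
            freq[word][i] += 1
    index = {}
    for word, positions in freq.items():
        total = 0.0
        for i, f in positions.items():
            weight = math.exp(-decay * i / (max_pos - 1)) if max_pos > 1 else 1.0
            total += weight * f / n_informants
        index[word] = total
    return index


# Hypothetical responses from three informants for one center of interest.
responses = [
    ["perro", "gato", "vaca"],
    ["perro", "vaca"],
    ["gato", "perro"],
]
scores = availability_index(responses)
```

Under this weighting, a word that every informant mentions first scores 1.0, and the score falls as a word is mentioned later or by fewer informants.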
From a very young age, monolingual children assume their language has no synonyms, or use the principle of mutual exclusivity (only one label per object). In contrast, bilingual children often accept more novel synonyms than monolinguals. One possible explanation for this difference is the lexicon structure hypothesis: having synonyms (across languages) in the lexicon reduces adherence to mutual exclusivity. The purpose of this study is to test the lexicon structure hypothesis by comparing three- to five-year-old children who speak either Canadian French or English. Canadian French allows more synonyms than English. French-speaking children should therefore accept more novel synonyms than English-speaking children. The children did a disambiguation task, choosing whether a familiar or an unfamiliar object was the referent of a novel word (e.g., moli). Surprisingly, the French-speaking children accepted significantly fewer novel synonyms than English-speaking children. However, they accepted the most synonyms for objects that had synonyms in French but for which they did not know both synonyms. These results support a modified version of the lexicon structure hypothesis, one that accounts for children’s weak access to synonyms.
Vital to the task of Sentiment Analysis (SA), or automatically mining sentiment expression from text, is a sentiment lexicon. This fundamental lexical resource comprises the smallest sentiment-carrying units of text, words, annotated for their sentiment properties, and aids in SA tasks on larger pieces of text. Unfortunately, digital dictionaries do not readily include information on the sentiment properties of their entries, and manually compiling sentiment lexicons is tedious in terms of annotator time and effort. This has resulted in the emergence of a large number of research works concentrated on automated sentiment lexicon generation. The dictionary-based approach involves leveraging digital dictionaries, while the corpus-based approach involves exploiting co-occurrence statistics embedded in text corpora. Although the former approach has been exhaustively investigated, the majority of works focus on terms. The few state-of-the-art models concentrated on the finer-grained term sense level still exhibit several prominent limitations; e.g., their semantic relations algorithm retrieves only senses that are in close proximity to the seed senses in the semantic network, thus preventing the retrieval of remote sentiment-carrying senses beyond the reach of the ‘radius’ defined by the number of iterations of semantic relations expansion. The proposed model aims to overcome the issues inherent in dictionary-based sense-level sentiment lexicon generation models using: (1) null seed sets, and a morphological approach inspired by the Marking Theory in Linguistics to populate them automatically; (2) a dual-step context-aware gloss expansion algorithm that ‘mines’ human-defined gloss information from a digital dictionary, ensuring senses overlooked by the semantic relations expansion algorithm are identified; and (3) a fully-unsupervised sentiment categorization algorithm on the basis of the Network Theory.
The results demonstrate that context-aware in-gloss matching successfully retrieves senses beyond the reach of the semantic relations expansion algorithm used by prominent, well-known models. Evaluation of the proposed model's ability to accurately assign polarity to senses demonstrates that it is on par with state-of-the-art models against the same gold standard benchmarks. The model has theoretical implications for future work on effectively exploiting the readily available human-defined gloss information in a digital dictionary for the task of assigning polarity to term senses. Extrinsic evaluation in a real-world sentiment classification task on multiple publicly available datasets from varying domains demonstrates its practical implication and application in sentiment analysis, as well as in other related fields such as information science, opinion retrieval and computational linguistics.
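The ‘radius’ limitation of semantic relations expansion described in this abstract can be illustrated with a small sketch. It assumes a toy semantic network stored as an adjacency dictionary, with made-up sense identifiers and a hypothetical function name. It shows only the generic iteration-bounded expansion from seed senses, not the proposed model or any specific published algorithm.

```python
def expand_seeds(graph, seeds, iterations):
    """Return the senses reachable from the seeds within `iterations` relation hops."""
    reached = set(seeds)
    frontier = set(seeds)
    for _ in range(iterations):
        # Follow one more layer of semantic relations, skipping senses already seen.
        frontier = {nb for s in frontier for nb in graph.get(s, ())} - reached
        if not frontier:
            break
        reached |= frontier
    return reached


# Hypothetical semantic network: sense -> related senses.
GRAPH = {
    "good#1": ["fine#1"],
    "fine#1": ["good#1", "decent#1"],
    "decent#1": ["fine#1", "respectable#1"],
    "respectable#1": ["decent#1"],
    "splendid#1": [],  # sentiment-carrying in this toy example, but unconnected to the seed
}

near = expand_seeds(GRAPH, {"good#1"}, iterations=2)
wider = expand_seeds(GRAPH, {"good#1"}, iterations=3)
```

With two iterations the expansion stops at `decent#1`; `respectable#1` needs a third hop, and `splendid#1` is never reached at any radius, which is the kind of gap a gloss-based step is meant to close.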
Sentiment analysis is held to be one of the most dynamic recent research fields in Natural Language Processing, facilitated by the quickly growing volume of Web opinion data. Most of the approaches in this field are focused on English due to the lack of sentiment resources in other languages such as the Arabic language and its large variety of dialects. In most sentiment analysis applications, good sentiment resources play a critical role. Based on that, in this article, several publicly available sentiment analysis resources for Arabic are introduced. This article introduces the Arabic senti-lexicon, a list of 3880 positive and negative synsets annotated with their part of speech, polarity scores, dialect synsets and inflected forms. This article also presents a Multi-domain Arabic Sentiment Corpus (MASC) with a size of 8860 positive and negative reviews from different domains. In this article, an in-depth study has been conducted on five types of feature sets for exploiting effective features and investigating their effect on the performance of Arabic sentiment analysis. The aim is to assess the quality of the developed language resources and to integrate different feature sets and classification algorithms to synthesise a more accurate sentiment analysis method. The Arabic senti-lexicon is used for generating feature vectors. Five well-known machine learning algorithms (naïve Bayes, k-nearest neighbours, support vector machines (SVMs), logistic linear regression and neural network) are employed as base-classifiers for each of the feature sets. A wide range of comparative experiments on standard Arabic data sets were conducted; discussion is presented and conclusions are drawn. The experimental results show that the Arabic senti-lexicon is a very useful resource for Arabic sentiment analysis.
Moreover, results show that classifiers which are trained on feature vectors derived from the corpus using the Arabic sentiment lexicon are more accurate than classifiers trained using the raw corpus.
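As a rough illustration of how a sentiment lexicon can be turned into feature vectors for a classifier, consider the sketch below. It assumes a toy lexicon with English placeholder entries and a three-number feature layout (positive count, negative count, summed polarity). Neither is taken from the Arabic senti-lexicon or from the five feature sets studied in the article.

```python
# Hypothetical polarity lexicon: token -> polarity score.
LEXICON = {"good": 1.0, "excellent": 1.0, "bad": -1.0, "poor": -1.0}


def lexicon_features(tokens, lexicon):
    """Return [positive count, negative count, summed polarity] for a token list."""
    pos = sum(1 for t in tokens if lexicon.get(t, 0.0) > 0)
    neg = sum(1 for t in tokens if lexicon.get(t, 0.0) < 0)
    score = sum(lexicon.get(t, 0.0) for t in tokens)
    return [pos, neg, score]


review = "the food was excellent but service was poor".split()
features = lexicon_features(review, LEXICON)
```

Vectors of this kind could then be passed to any of the base classifiers named in the abstract in place of the raw text.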
This work addresses the treatment of lexical variation in the teaching of Spanish as a second language. It reviews diatopic oppositions in the Hispanic lexicon and tries out a new methodology for approaching the phenomenon of representativeness. The ultimate aim of the analysis is to propose a renewal and update of the Nociones específicas of the Plan Curricular del Instituto Cervantes (PCIC), including: (a) pan-Hispanic terms, (b) Americanisms and (c) Spain-specific terms (españolismos). The work proposes a new index of lexical representativeness and analyzes the results of the calculation, comparing them with those of earlier models. To present the results, a free-access tool with interactive maps of geosynonyms is included. The study brings a plural, polycentric and ecolinguistic perspective to the lexical inventory of the PCIC and offsets the restriction entailed by describing vocabulary solely on the basis of the central-northern Peninsular Spanish norm.