This study falls within the framework of the VITALEX project and presents the agricultural lexical data for survey point Gr601, which corresponds to the village of Trevélez. We carry out a study of the lexical vitality of this municipality using a contrastive methodology, comparing the VITALEX data with those of volume I of the ALEA. In addition, we observe an interesting relationship between the economic and social transformation of the La Alpujarra region and the vitality and mortality rates of the agricultural lexicon. This is done by contrasting the results obtained from the three generations that make up VITALEX against the ALEA data.
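As a rough illustration of the kind of contrast involved (the term lists and generation labels below are invented, not the VITALEX or ALEA responses), the vitality and mortality percentages can be thought of as the share of ALEA-attested agricultural terms that each generation still recognises:

```python
# Illustrative sketch only: invented data, not the VITALEX/ALEA survey results.
# Vitality = share of baseline (ALEA-attested) terms still known by a generation;
# mortality = the complementary share that has been lost.

alea_terms = {"besana", "aladro", "ubio", "barcina", "hocino"}  # invented baseline items

vitalex_responses = {  # terms still recognised per generation (invented)
    "G1 (older)":   {"besana", "aladro", "ubio", "barcina"},
    "G2 (middle)":  {"besana", "aladro", "hocino"},
    "G3 (younger)": {"besana"},
}

for generation, known in vitalex_responses.items():
    retained = alea_terms & known
    vitality = 100 * len(retained) / len(alea_terms)
    print(f"{generation}: vitality {vitality:.1f}%, mortality {100 - vitality:.1f}%")
```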
In this piece, we honor the work of Albert Costa. His work focused on how bilinguals manage two languages, the brain mechanisms involved, and the ways in which language and emotion are related. We end by discussing ways in which his work will frame research in the field going forward.
From a very young age, monolingual children assume that their language has no synonyms; that is, they follow the principle of mutual exclusivity (only one label per object). In contrast, bilingual children often accept more novel synonyms than monolinguals. One possible explanation for this difference is the lexicon structure hypothesis: having synonyms (across languages) in the lexicon reduces adherence to mutual exclusivity. The purpose of this study is to test the lexicon structure hypothesis by comparing three- to five-year-old children who speak either Canadian French or English. Since Canadian French allows more synonyms than English, French-speaking children should accept more novel synonyms than English-speaking children. The children completed a disambiguation task, choosing whether a familiar or an unfamiliar object was the referent of a novel word (e.g., moli). Surprisingly, the French-speaking children accepted significantly fewer novel synonyms than the English-speaking children. However, they accepted novel synonyms most readily for objects that have synonyms in French but for which they did not know both labels. These results support a modified version of the lexicon structure hypothesis, one that accounts for children’s weak access to synonyms.
Vital to the task of Sentiment Analysis (SA), or automatically mining sentiment expression from text, is a sentiment lexicon. This fundamental lexical resource comprises the smallest sentiment-carrying units of text, words, annotated for their sentiment properties, and aids SA tasks on larger pieces of text. Unfortunately, digital dictionaries do not readily include information on the sentiment properties of their entries, and manually compiling sentiment lexicons is tedious in terms of annotator time and effort. This has resulted in a large body of research concentrated on automated sentiment lexicon generation. The dictionary-based approach leverages digital dictionaries, while the corpus-based approach exploits co-occurrence statistics embedded in text corpora. Although the former approach has been investigated exhaustively, the majority of works focus on terms. The few state-of-the-art models that operate at the finer-grained term-sense level still exhibit several prominent limitations; for example, their semantic relations expansion algorithms retrieve only senses in close proximity to the seed senses in the semantic network, preventing the retrieval of remote sentiment-carrying senses beyond the ‘radius’ defined by the number of expansion iterations. The proposed model aims to overcome the issues inherent in dictionary-based sense-level sentiment lexicon generation using: (1) null seed sets populated automatically by a morphological approach inspired by Marking Theory in linguistics; (2) a dual-step context-aware gloss expansion algorithm that mines human-defined gloss information from a digital dictionary, ensuring that senses overlooked by the semantic relations expansion algorithm are identified; and (3) a fully unsupervised sentiment categorization algorithm based on Network Theory. The results demonstrate that context-aware in-gloss matching successfully retrieves senses beyond the reach of the semantic relations expansion algorithm used by prominent, well-known models. Evaluation of the model’s ability to assign polarity to senses accurately shows that it is on par with state-of-the-art models against the same gold-standard benchmarks. The model has theoretical implications for future work that seeks to exploit the readily available human-defined gloss information in a digital dictionary when assigning polarity to term senses. Extrinsic evaluation on a real-world sentiment classification task over multiple publicly available, varying-domain datasets demonstrates its practical applicability in sentiment analysis, as well as in related fields such as information science, opinion retrieval and computational linguistics.
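To make the gloss-expansion idea concrete, here is a minimal sketch over WordNet (our own illustration, not the model described above): senses whose glosses mention a seed lemma are retrieved even when no short chain of semantic relations connects them to the seeds. The seed lemmas are assumed for the example; running it requires NLTK and its WordNet data.

```python
# Hedged sketch of gloss-based sense retrieval (not the paper's algorithm).
# Requires: pip install nltk; then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

seed_pos = {"good", "excellent"}          # hypothetical positive seed lemmas
positive_senses = set()

for synset in wn.all_synsets(pos=wn.ADJ):                  # scan adjective senses
    gloss_words = set(synset.definition().lower().split())  # naive whitespace tokenization
    if gloss_words & seed_pos:                               # gloss mentions a seed lemma
        positive_senses.add(synset.name())

print(len(positive_senses), "candidate positive senses, e.g.:",
      sorted(positive_senses)[:5])
```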
Even though considerable attention has been given to the polarity of words (positive and negative) and the creation of large polarity lexicons, research in emotion analysis has had to rely on limited and small emotion lexicons. In this paper, we show how the combined strength and wisdom of the crowds can be used to generate a large, high‐quality, word–emotion and word–polarity association lexicon quickly and inexpensively. We enumerate the challenges in emotion annotation in a crowdsourcing scenario and propose solutions to address them. Most notably, in addition to questions about emotions associated with terms, we show how the inclusion of a word choice question can discourage malicious data entry, help to identify instances where the annotator may not be familiar with the target term (allowing us to reject such annotations), and help to obtain annotations at sense level (rather than at word level). We conducted experiments on how to formulate the emotion‐annotation questions, and show that asking if a term is associated with an emotion leads to markedly higher interannotator agreement than that obtained by asking if a term evokes an emotion.
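A hedged sketch of the word-choice quality check described above (the annotation records are invented; this is not the authors' aggregation code): responses that fail the word-choice question are discarded before emotion associations are decided by majority vote.

```python
# Illustrative filtering + aggregation sketch with invented annotation records.
# word_choice_ok marks whether the annotator answered the word-choice question
# correctly; failures suggest unfamiliarity with the target term and are dropped.

annotations = [
    {"term": "shout", "word_choice_ok": True,  "anger": 1},
    {"term": "shout", "word_choice_ok": True,  "anger": 1},
    {"term": "shout", "word_choice_ok": False, "anger": 0},  # rejected annotation
]

kept = [a for a in annotations if a["word_choice_ok"]]
anger_votes = sum(a["anger"] for a in kept)
is_anger_associated = anger_votes > len(kept) / 2   # simple majority vote
print(f"'shout' associated with anger: {is_anger_associated} ({anger_votes}/{len(kept)} votes)")
```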
Sentiment analysis is one of the most dynamic recent research fields in Natural Language Processing, driven by the rapidly growing volume of opinionated Web data. Most approaches in this field focus on English because of the lack of sentiment resources in other languages, such as Arabic and its large variety of dialects. In most sentiment analysis applications, good sentiment resources play a critical role. Accordingly, this article introduces several publicly available sentiment analysis resources for Arabic. It presents the Arabic senti-lexicon, a list of 3880 positive and negative synsets annotated with their part of speech, polarity scores, dialect synsets and inflected forms, as well as a Multi-domain Arabic Sentiment Corpus (MASC) containing 8860 positive and negative reviews from different domains. An in-depth study of five types of feature sets is conducted to identify effective features and to investigate their effect on the performance of Arabic sentiment analysis. The aim is to assess the quality of the developed language resources and to combine different feature sets and classification algorithms into a more accurate sentiment analysis method. The Arabic senti-lexicon is used to generate feature vectors. Five well-known machine learning algorithms, naïve Bayes, k-nearest neighbours, support vector machines (SVMs), logistic regression and neural networks, are employed as base classifiers for each of the feature sets. A wide range of comparative experiments was conducted on standard Arabic datasets; the results are discussed and conclusions drawn. The experimental results show that the Arabic senti-lexicon is a very useful resource for Arabic sentiment analysis. Moreover, classifiers trained on feature vectors derived from the corpus using the Arabic sentiment lexicon are more accurate than classifiers trained on the raw corpus.
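As a small illustration of the lexicon-to-feature-vector step (not the MASC pipeline itself; the lexicon entries, reviews and labels are invented), one of the base classifiers can be trained on counts of positive and negative lexicon hits per review:

```python
# Hedged sketch: lexicon-based feature vectors + one base classifier.
# Requires: pip install scikit-learn. All data below are invented examples.
from sklearn.linear_model import LogisticRegression

pos_lex = {"رائع", "ممتاز"}      # example positive lexicon entries
neg_lex = {"سيء", "ممل"}         # example negative lexicon entries

reviews = ["فيلم رائع ممتاز", "فيلم سيء ممل", "كتاب رائع", "خدمة ممل"]
labels  = [1, 0, 1, 0]           # 1 = positive review, 0 = negative review

def lexicon_features(text):
    """Count positive and negative lexicon hits in a whitespace-tokenized text."""
    tokens = text.split()
    return [sum(t in pos_lex for t in tokens), sum(t in neg_lex for t in tokens)]

X = [lexicon_features(r) for r in reviews]
clf = LogisticRegression().fit(X, labels)
print(clf.predict([lexicon_features("مسلسل ممتاز")]))   # expected: [1]
```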
This study addresses the treatment of lexical variation in the teaching of Spanish as a second language. It reviews diatopic oppositions in the Hispanic lexicon and tests a new methodological approach to the phenomenon of representativeness. The ultimate aim of the analysis is to propose a renewal and update of the specific notions (Nociones específicas) of the Plan Curricular del Instituto Cervantes (PCIC), including: (a) pan-Hispanic terms, (b) Americanisms, and (c) terms restricted to Peninsular Spain. The study proposes a new index of lexical representativeness and analyses the results of the calculation, comparing them with those of earlier models. To present the results, a freely accessible tool with interactive maps of geosynonyms is provided. The study brings a plural, polycentric and ecolinguistic perspective to the lexical inventory of the PCIC and offsets the restriction involved in describing vocabulary solely on the basis of the central-northern Peninsular Spanish norm.
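Purely as an illustration of what a lexical representativeness index could look like (the paper's actual index is not reproduced here, and the attestation data are invented), one simple formulation is the share of surveyed national varieties in which a geosynonym is attested:

```python
# Hypothetical sketch only: a toy representativeness index, not the PCIC proposal.
# An index near 1.0 would suggest a pan-Hispanic term; a low index, a regional one.

varieties = {"España", "México", "Argentina", "Colombia", "Chile"}

attestations = {            # varieties in which each geosynonym is attested (invented)
    "ordenador":   {"España"},
    "computadora": {"México", "Argentina", "Colombia", "Chile"},
    "coche":       {"España", "México"},
    "carro":       {"México", "Colombia", "Chile"},
}

for term, where in attestations.items():
    index = len(where) / len(varieties)
    print(f"{term}: representativeness {index:.2f}")
```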
This paper argues that there are lexical items that conventionally express the idea of dividing one quantity by another, and per is one of them. In particular, the proposal is that there are three ratio-related senses of per: (i) a quotient function; (ii) a quotient operator; and (iii) a quotient of measure functions. The ratio-based approach, built up here to handle a wider range of data than previous ratio-based approaches could, is contrasted with an opposing view on which per is a distributivity marker like each. Four types of evidence are used: (i) cases involving measurement of an object or an event whose measure is smaller than the unit given by per’s complement; (ii) uses in the differential argument of a comparative; (iii) uses modifying a measure function noun; and (iv) uses modifying a gradable predicate. All of these are problematic for a distributivity-marker analysis and support the idea that per expresses the concept of ratio. Along the way, we gain diagnostics for whether a given item conventionally expresses the concept of a ratio in a given language.
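Schematically (our own rendering, not the paper's exact denotations), the quotient-function idea can be written as a function taking two measure functions and returning their ratio over an event or object:

```latex
% Schematic ratio semantics for "per" (illustrative only):
% "travel 60 miles per hour" says that the ratio of the miles-measure
% to the hours-measure of the event e equals 60.
\[
  \mathrm{per}(\mu_2)(\mu_1)(e) \;=\; \frac{\mu_1(e)}{\mu_2(e)},
  \qquad\text{e.g.}\quad
  \frac{\mathrm{miles}(e)}{\mathrm{hours}(e)} = 60 .
\]
```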
Social media has become the largest data source of public opinion. The application of sentiment analysis to social media texts has great potential, but faces great challenges because of domain heterogeneity. Sentiment orientation of words varies by content domain, but learning context-specific sentiment in social media domains continues to be a major challenge. The language domain poses another challenge since the language used in social media today differs significantly from that used in traditional media. To address these challenges, we propose a method to adapt existing sentiment lexicons for domain-specific sentiment classification using an unannotated corpus and a dictionary. We evaluate our method using two large developing corpora, containing 743,069 tweets related to the stock market and one million tweets related to political topics, respectively, and five existing sentiment lexicons as seeds and baselines. The results demonstrate the usefulness of our method, showing significant improvement in sentiment classification performance.
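The general idea of re-scoring a lexicon for a target domain can be sketched as follows (our own simplified illustration with an invented micro-corpus and seed words, not the authors' algorithm): a word's domain polarity is estimated from how often it co-occurs with positive versus negative seed words in the unannotated corpus.

```python
# Hedged sketch of domain adaptation via seed co-occurrence (illustrative only).
import math

corpus = [                         # invented, stock-market-flavoured micro-corpus
    "stock rally strong gains bullish",
    "stock crash heavy losses bearish",
    "bullish rally lifts market",
    "bearish crash hits market",
]
pos_seeds, neg_seeds = {"gains", "rally"}, {"losses", "crash"}

def cooccurrence(word, seeds):
    """Number of documents in which `word` appears together with any seed."""
    return sum(1 for doc in corpus if word in doc.split() and seeds & set(doc.split()))

def domain_score(word):
    # Log-ratio of co-occurrence with positive vs. negative seeds (add-one smoothed);
    # positive scores suggest domain-positive orientation, negative the reverse.
    return math.log((cooccurrence(word, pos_seeds) + 1) / (cooccurrence(word, neg_seeds) + 1))

for w in ["bullish", "bearish"]:
    print(w, round(domain_score(w), 2))   # bullish > 0, bearish < 0
```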
• We propose a method to adapt existing sentiment lexicons for domain-specific sentiment classification.
• The proposed method addresses challenges from both the content domain and the language domain.
• We evaluate our method using two large developing corpora and five existing sentiment lexicons as seeds and baselines.
• The evaluation results demonstrate the usefulness of our method.