The study of affective language has seen numerous developments in Natural Language Processing in recent years, but the focus has been predominantly on Sentiment Analysis, a term usually used to refer to the classification of texts according to their polarity or valence (positive vs. negative). The study of emotions, such as joy, sadness, anger, and surprise, among others, has been much less developed and has fewer resources, both for English and for other languages, such as Spanish. In this paper, we present the most relevant existing resources for the study of emotions, mainly for Spanish; we describe some heuristics for merging two existing corpora of Spanish tweets; and, based on some experiments on the classification of tweets into seven categories (anger, disgust, fear, joy, sadness, surprise, and others), we analyze the most problematic classes.
We present the results of HAHA at IberLEF 2021: Humor Analysis based on Human Annotation. This year's edition of the competition includes the two classic tasks of humor detection and rating, plus two novel tasks of humor logic mechanism and target classification. We describe the corpus created for the challenge, the competition phases, the submitted systems and the main results obtained.
We present different methods for sentiment analysis in Spanish tweets: an SVM based on the centroid of the tweet's word embeddings, a CNN and an LSTM. We analyze the results obtained using the corpora from the TASS sentiment analysis challenge, obtaining state-of-the-art results in the performance of the classifiers. As the neutral category is the hardest one to classify, we focus on understanding the problems in classifying neutral tweets, and we further analyze the composition of this class in order to extract insights on how to improve the classifiers.
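The centroid representation mentioned above can be illustrated with a minimal sketch. The function name `tweet_centroid`, the toy two-dimensional embeddings and the choice to return a zero vector for tweets without known words are all assumptions made for illustration; the abstract does not describe the embeddings or the SVM configuration actually used.

```python
from typing import Dict, List, Sequence


def tweet_centroid(
    tokens: Sequence[str],
    embeddings: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Average the embedding vectors of the tokens found in the vocabulary."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        # Assumption: a tweet with no known words maps to the zero vector.
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy 2-dimensional embeddings (hypothetical values, for illustration only).
toy_embeddings = {
    "buen": [1.0, 0.0],
    "dia": [0.0, 1.0],
}

# "hoy" has no embedding here, so it is skipped when averaging.
centroid = tweet_centroid(["buen", "dia", "hoy"], toy_embeddings, dim=2)
print(centroid)  # [0.5, 0.5]
```

The resulting fixed-length vector is what a classifier such as an SVM would receive as the features of the tweet.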
Nowadays, many approaches for Sentiment Analysis (SA) rely on affective lexicons to identify emotions transmitted in opinions. However, most of these lexicons do not consider that a word can express different sentiments in different predication domains, introducing errors in the sentiment inference. To address this problem, we present a model based on a context graph which can be used for building domain-specific sentiment lexicons (DL: Dynamic Lexicons) by propagating the valence of a few seed words. For different corpora, we compare the results of a simple rule-based sentiment classifier using the corresponding DL with the results obtained using a general affective lexicon. For most corpora containing domain-specific opinions, the DL achieves better results than the general lexicon.
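As a rough illustration of propagating valence from seed words over a graph, the sketch below uses a simple iterative weighted average of neighbour valences. This update rule, the function name `propagate_valence` and the toy graph are assumptions; the abstract does not specify how the context graph is built or which propagation rule the model uses.

```python
from typing import Dict

# word -> {neighbour: edge weight}; a hypothetical stand-in for a context graph.
Graph = Dict[str, Dict[str, float]]


def propagate_valence(
    graph: Graph,
    seeds: Dict[str, float],
    iterations: int = 20,
) -> Dict[str, float]:
    """Spread seed valences to the other words by weighted neighbour averaging."""
    valence = {word: seeds.get(word, 0.0) for word in graph}
    for _ in range(iterations):
        updated = {}
        for word, neighbours in graph.items():
            total = sum(neighbours.values())
            if word in seeds:
                # Seed words keep their given valence.
                updated[word] = seeds[word]
            elif total == 0:
                # Isolated words keep their current value.
                updated[word] = valence[word]
            else:
                updated[word] = sum(
                    weight * valence.get(n, 0.0)
                    for n, weight in neighbours.items()
                ) / total
        valence = updated
    return valence


# Toy graph: "barato" co-occurs with "bueno", "lento" with "malo".
toy_graph = {
    "bueno": {"barato": 1.0},
    "malo": {"lento": 1.0},
    "barato": {"bueno": 1.0},
    "lento": {"malo": 1.0},
}
lexicon = propagate_valence(toy_graph, {"bueno": 1.0, "malo": -1.0})
print(lexicon["barato"], lexicon["lento"])  # 1.0 -1.0
```

Building a separate graph for each domain corpus would give each word a domain-dependent valence, which is the intuition behind a dynamic lexicon.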
We present the results of the QuALES task, which addresses the problem of Extractive Question Answering from texts. For both training and evaluation we use the QuALES corpus, a corpus of Uruguayan media news about the Covid-19 pandemic and related topics. We describe the systems developed by seven participants, all of them based on different BERT-like language models. The best results were obtained using the multilingual RoBERTa model pre-trained with SQUAD-Es-V2 and fine-tuned on the QuALES corpus.
In this paper, we introduce a framework for processing genetics and genomics literature, based on ontologies and lexical resources from the biomedical domain. The main objective is to support the diagnosis process that is done by medical geneticists who extract knowledge from published works. We constructed a pipeline that gathers several genetics- and genomics-related resources and applies natural language processing techniques, which include named entity recognition and relation extraction. Working on a corpus created from PubMed abstracts, we built a knowledge database that can be used for processing medical records written in Spanish. Given a medical record from Uruguayan healthcare patients, we show how we can map it to the database and perform graph queries for relevant knowledge paths. The framework is not an end user application, but an extensible processing structure to be leveraged by external applications, enabling software developers to streamline incorporation of the extracted knowledge.
• Novel relations between terms belonging to the Genetics domain can be inferred from research literature.
• Extracted knowledge is presented to the user, usually an expert in Medical Genetics, for consideration.
• Natural language medical records can be automatically linked to inferred relations.
• Analysis and mapping of medical records can be performed in different languages.
This paper describes the design and implementation of a system that takes Spanish texts and generates crosswords (board and definitions) in a fully automatic way using definitions extracted from those texts. Our solution divides the problem into two parts: a definition extraction module that applies pattern matching implemented in Python, and a crossword generation module that uses a greedy strategy implemented in Prolog. The system achieves 73% precision and builds crosswords similar to those built by humans.
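To give a flavour of definition extraction by pattern matching, here is a minimal sketch using a single regular expression for Spanish sentences of the form "El/La X es un/una Y." The pattern, the name `extract_definitions` and the sample sentence are hypothetical; the abstract does not list the patterns the system actually applies.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: determiner + term + "es" + a noun phrase up to the period.
DEFINITION_PATTERN = re.compile(
    r"\b(?:El|La|Un|Una)\s+(?P<term>\w+)\s+es\s+"
    r"(?P<definition>(?:un|una|el|la)\s+[^.]+)\."
)


def extract_definitions(text: str) -> List[Tuple[str, str]]:
    """Return (term, definition) pairs for every sentence matching the pattern."""
    return [
        (m.group("term").lower(), m.group("definition").strip())
        for m in DEFINITION_PATTERN.finditer(text)
    ]


sample = "El gato es un mamífero doméstico. Ayer llovió mucho."
pairs = extract_definitions(sample)
print(pairs)  # [('gato', 'un mamífero doméstico')]
```

Each extracted term would be a candidate crossword answer and its definition the corresponding clue; a real system would need many more patterns and some filtering of noisy matches.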
The gender gap between men's and women's participation in Science, Technology, Engineering and Mathematics (STEM) is regrettably universal, and generally unacceptably broad. In addition, this gap is particularly noticeable in the Electrical Engineering and Computer Engineering (EECS) careers. Several international organizations and universities in North America, Europe and Latin America have designed programs to address this important problem, showing varying degrees of success. In many of these programs the idea is to work with high school girls, seeking to bring them key knowledge of the STEM disciplines and encourage them to choose careers in the area. Among other activities, these programs offer presentations, talks, or short courses in a given period at the university itself, taught by women teachers in the area applying the role model approach. This article presents the experience of the Facultad de Ingeniería (School of Engineering) of the Universidad de la República, Uruguay, on the occasion of the Girls in Information and Communication Technologies (ICT) day. In particular, workshops on robotics, circuits and map making were held for high school girls as a way to promote ICT careers in Uruguay.
This work presents a study of linguistic expressions of opinion from different sources in Spanish texts. The work includes the definition of a model for opinion predicates and their arguments (source, topic and message), the creation of a lexicon of opinion predicates with associated information from the model, and the implementation of three systems. The first system, based on contextual rules, gets good results for the F-measure score (partial match): predicate, 92%; source, 81%; topic, 75%; message, 89%; full opinion, 85%. In addition, for source identification the F-measure for exact match is 79%. The second system, based on Conditional Random Fields (CRF), was developed only for the identification of sources, giving an F-measure of 76% (exact match). The third system, which combines the two techniques (rules and CRF), gives an F-measure of 83% (exact match), showing that the combination yields interesting results. As regards the identification of sources, our system gives very satisfactory results compared to other work developed for languages other than Spanish, whose scores fall between 63% and 89.5%. Moreover, in addition to the systems built for the identification of opinions, our work has led to the construction of several resources for Spanish: a lexicon of opinion predicates, a 13,000-word corpus with opinions annotated and a 40,000-word corpus with opinion predicates and sources annotated.
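The difference between the exact-match and partial-match F-measures reported above can be sketched as follows. Here a partial match is taken to mean any overlap between a predicted span and a gold span; this criterion, the span representation and the function names are assumptions, since the abstract does not define how partial matches are counted.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True when the two spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def f_measure(gold: List[Span], predicted: List[Span], exact: bool) -> float:
    """F-measure over spans, with exact or overlap-based (partial) matching."""
    match = (lambda g, p: g == p) if exact else overlaps
    matched_pred = sum(1 for p in predicted if any(match(g, p) for g in gold))
    matched_gold = sum(1 for g in gold if any(match(g, p) for p in predicted))
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the second predicted source span is shifted by one token.
gold_spans = [(0, 3), (10, 12)]
predicted_spans = [(0, 3), (11, 14)]

exact_f = f_measure(gold_spans, predicted_spans, exact=True)
partial_f = f_measure(gold_spans, predicted_spans, exact=False)
print(exact_f, partial_f)  # 0.5 1.0
```

Under this reading, a partial-match score is never lower than the exact-match score on the same output, which is consistent with the source figures above (81% partial, 79% exact for the rule-based system).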