Infodemiology is the process of mining unstructured and textual data so as to provide public health officials and policymakers with valuable information regarding public health. The appearance of this new data source, which was previously unimaginable, has opened up a new way to improve public health systems, resulting in better communication policies and better detection systems. However, the unstructured nature of the Internet, along with the complexity of the infectious disease domain, prevents the extracted information from being easily understood. Moreover, when dealing with languages other than English, for which some of the most common Natural Language Processing resources are not available, the correct exploitation of this data becomes even more difficult. We intend to fill these gaps by proposing an ontology-driven aspect-based sentiment analysis that measures the general public’s opinions on infectious diseases expressed in Spanish, using as a case study tweets concerning the Zika, Dengue and Chikungunya viruses in Latin America. Our proposal is based on two technologies. We first use ontologies to model the infectious disease domain with concepts such as risks, symptoms, transmission methods and drugs. We then measure the relationships between these concepts in order to determine the degree to which one concept influences others. This information is subsequently used to build an aspect-based sentiment analysis model based on statistical and linguistic features by means of deep-learning models. Our proposal is available on a web platform, where users can see the sentiment for each concept at a glance and analyse how each concept influences the sentiment of the others.
• Semantic relatedness expands the aspects for sentiment analysis.
• Release of a balanced gold-standard corpus on infectious diseases from Latin America.
• Linguistic features achieve higher accuracy than word embeddings.
• Cardinal numerals are a strong indicator of negative sentiment.
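The abstract does not specify how tweets are matched to ontology concepts or how inter-concept relatedness is measured. As a rough sketch only, assuming a hypothetical keyword-based mini-ontology (`ONTOLOGY`), per-concept averaging of already-labelled polarities, and Jaccard co-occurrence as a stand-in relatedness measure (not the authors' deep-learning model), the aspect-level aggregation shown on the web platform could look like this:

```python
from collections import defaultdict

# Hypothetical mini-ontology: concept -> Spanish surface terms that evoke it.
ONTOLOGY = {
    "symptom": {"fiebre", "dolor", "sarpullido"},
    "transmission": {"mosquito", "picadura"},
    "drug": {"paracetamol", "vacuna"},
}

def concepts_in(tweet):
    """Return the set of ontology concepts whose terms appear in the tweet."""
    tokens = set(tweet.lower().split())
    return {c for c, terms in ONTOLOGY.items() if tokens & terms}

def aspect_sentiment(labelled_tweets):
    """Average polarity (-1, 0 or +1 labels) per concept mentioned."""
    totals, counts = defaultdict(float), defaultdict(int)
    for text, polarity in labelled_tweets:
        for c in concepts_in(text):
            totals[c] += polarity
            counts[c] += 1
    return {c: totals[c] / counts[c] for c in counts}

def relatedness(labelled_tweets, a, b):
    """Jaccard overlap between the sets of tweets mentioning concepts a and b."""
    with_a = {i for i, (t, _) in enumerate(labelled_tweets) if a in concepts_in(t)}
    with_b = {i for i, (t, _) in enumerate(labelled_tweets) if b in concepts_in(t)}
    union = with_a | with_b
    return len(with_a & with_b) / len(union) if union else 0.0
```

A real system would replace the keyword lookup with proper concept annotation and the given labels with model predictions; the aggregation step stays the same.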
A social-semantic recommender system for advertisements
García-Sánchez, Francisco; Colomo-Palacios, Ricardo; Valencia-García, Rafael
Information processing & management,
March 2020, Volume 57, Issue 2
Journal Article
Peer-reviewed
• Social ad recommenders are challenged by sparsity, cold start and heterogeneity.
• Semantic Web technologies enable data integration and support recommendation.
• A shared ontology model aligns advertisements with users’ profiles.
• Textual contributions and network connections are leveraged to improve recommendation.
• Accuracy is boosted by adapting user profiles to changing needs.
Social applications foster the involvement of end users in Web content creation, and as a result a vast new source of data about users and their likes and dislikes has become available. Having access to users’ contributions to social sites and gaining insights into consumers’ needs is of the utmost importance for marketing decision making in general, and for advertisement recommendation in particular. By analyzing this information, advertisement recommendation systems can attain a better understanding of users’ interests and preferences, thus allowing them to provide more precise ad suggestions. However, in addition to the already complex challenges that hamper the performance of recommender systems (i.e., data sparsity, cold start, diversity, accuracy and scalability), new issues have emerged from the need to deal with heterogeneous data gathered from disparate sources. The technologies surrounding Linked Data and the Semantic Web have proved effective for knowledge management and data integration. In this work, an ontology-based advertisement recommendation system that leverages the data produced by users in social networking sites is proposed. The approach is underpinned by a shared ontology model that represents both users’ profiles and the content of advertisements. Both users and advertisements are represented by means of vectors generated using natural language processing techniques, which collect ontological entities from textual content. The ad recommender framework has been extensively validated in a simulated environment, obtaining an aggregated F-measure of 79.2% and a Mean Average Precision at 3 (MAP@3) of 85.6%.
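The matching and evaluation steps described above can be sketched as follows, assuming (hypothetically) that each user or ad vector is simply a bag of counts over ontology entity names, that ads are ranked by cosine similarity, and that AP@k is normalised by min(|relevant|, k), which is one common MAP@k variant. The entity names and helper functions are illustrative, not the authors' implementation.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def rank_ads(user, ads):
    """Ad ids sorted by decreasing similarity to the user profile vector."""
    return sorted(ads, key=lambda ad_id: cosine(user, ads[ad_id]), reverse=True)

def average_precision_at_k(ranked, relevant, k=3):
    """Precision accumulated at each relevant hit within the top k."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(rankings, relevants, k=3):
    """Mean of AP@k over all users."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in zip(rankings, relevants)) / len(rankings)
```

For example, a user vector `Counter({"Football": 2, "Travel": 1})` ranks an ad annotated with `Football` above one annotated with `Travel`, and an ad with no shared entities last.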
Online social networks allow powerless people to gain enormous control over particular people’s lives and to exploit the anonymity or social distance that the Internet provides in order to harass others. One of the most frequently targeted groups is women, as misogyny is, unfortunately, a reality in our society. However, although great efforts have recently been made to identify misogyny, it remains difficult to detect because it can be very subtle and deep, which means that statistical approaches alone are not sufficient. Moreover, as Spanish is spoken worldwide, contextual and cultural differences can complicate this identification. Our contribution to the detection of misogyny in Spanish is twofold. On the one hand, we apply Sentiment Analysis and Social Computing technologies to detect misogynous messages on Twitter. On the other, we have compiled the Spanish MisoCorpus-2020, a balanced corpus regarding misogyny in Spanish, and divided it into three subsets concerning (1) violence towards relevant women, (2) messages harassing women in Spanish from Spain and Spanish from Latin America, and (3) general traits related to misogyny. Our proposal combines average word embeddings and linguistic features in a classification that allows us to understand which linguistic phenomena contribute most to the identification of misogyny. We have evaluated our proposal with three machine-learning classifiers, achieving a best accuracy of 85.175%. Finally, the proposed approach is also validated with existing corpora for misogyny and aggressiveness detection, such as AMI and HatEval, obtaining good results.
• Social computing and sentiment analysis technologies can be applied to misogyny detection.
• Misogyny identification performance improves when linguistic features and word embeddings are combined.
• Release of a balanced corpus in Spanish regarding misogyny.
• Differences among dialects and cultural backgrounds in Spanish hinder misogyny identification.
• Offensive language, grammatical gender, and grammatical errors and misspellings are discriminating linguistic features.
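The combination of average word embeddings and linguistic features described above amounts to concatenating two vectors before classification. A minimal sketch, assuming a hypothetical two-dimensional toy embedding table (`EMBEDDINGS`) and two placeholder surface features (uppercase ratio and exclamation marks) in place of the authors' actual linguistic feature set:

```python
# Hypothetical toy embedding table (2 dimensions for readability);
# a real system would load pretrained Spanish word vectors instead.
EMBEDDINGS = {
    "hola": [1.0, 0.0],
    "mundo": [0.0, 1.0],
}
DIM = 2

def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    return [t.strip("¡!¿?.,;:") for t in text.lower().split()]

def average_embedding(tokens):
    """Mean of the known token vectors; zero vector if no token is known."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0] * DIM
    return [sum(col) / len(known) for col in zip(*known)]

def linguistic_features(text):
    """Placeholder features: ratio of uppercase letters and count of '!'."""
    letters = [c for c in text if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return [upper_ratio, float(text.count("!"))]

def feature_vector(text):
    """Concatenate averaged embeddings with linguistic features."""
    return average_embedding(tokenize(text)) + linguistic_features(text)
```

The resulting vector would then be fed to any of the machine-learning classifiers; keeping the linguistic part separate is what allows inspecting which phenomena carry weight.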
► With the advent of the Social Web, recommender systems are gaining momentum.
► Adding semantically empowered techniques to recommender systems can significantly improve the quality of recommendations.
► A hybrid recommender system based on knowledge and social networks is presented in this work.
► An evaluation in the cinematographic domain yields very promising results.
With the advent of the Social Web and the growing popularity of Web 2.0 applications, recommender systems are gaining momentum. The recommendations generated by these systems aim to provide end users with suggestions about information items, social elements, products or services that are likely to be of interest to them. Traditional syntax-based recommender systems suffer from a number of shortcomings that hamper their effectiveness. As semantic technologies mature, they provide a consistent and reliable basis for dealing with data at the knowledge level. Adding semantically empowered techniques to recommender systems can significantly improve the overall quality of recommendations. In this work, a hybrid recommender system based on knowledge and social networks is presented. Its evaluation in the cinematographic domain yields very promising results compared to state-of-the-art solutions.
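The abstract does not state how the knowledge-based and social components are combined. One common hybridisation is a weighted linear blend, sketched below under the assumptions that items are described by hypothetical sets of ontology features (Jaccard similarity stands in for a semantic similarity measure), friends' ratings are normalised to [0, 1], and `alpha` is an illustrative weight, not the authors' setting:

```python
def jaccard(a, b):
    """Overlap between two feature sets (0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def knowledge_score(user_liked, item, features):
    """Mean semantic similarity between the item and the items the user liked."""
    if not user_liked:
        return 0.0
    return sum(jaccard(features[item], features[liked]) for liked in user_liked) / len(user_liked)

def social_score(friends_ratings, item):
    """Mean normalised rating that the user's friends gave the item (0 if unrated)."""
    ratings = [r[item] for r in friends_ratings if item in r]
    return sum(ratings) / len(ratings) if ratings else 0.0

def hybrid_score(user_liked, friends_ratings, item, features, alpha=0.5):
    """Weighted blend of the knowledge-based and social scores."""
    return (alpha * knowledge_score(user_liked, item, features)
            + (1 - alpha) * social_score(friends_ratings, item))
```

Raising `alpha` favours items semantically close to the user's history; lowering it favours items popular among the user's social network.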
In general, people are more reluctant to follow advice and directions from politicians who do not share their ideology. In extreme cases, people can be heavily biased in favour of one political party while sharply disagreeing with others, which may lead to irrational decision making and can put people’s lives at risk when they ignore recommendations from the authorities. Therefore, considering political ideology as a psychographic trait can improve political micro-targeting by helping public authorities and local governments to adopt better communication policies during crises. In this work, we explore the reliability of determining psychographic traits concerning political ideology. Our contribution is twofold. On the one hand, we release the PoliCorpus-2020, a dataset composed of Spanish politicians’ tweets posted in 2020. On the other hand, we conduct two authorship analysis tasks with this dataset: an author profiling task to extract demographic and psychographic traits, and an authorship attribution task to determine the author of an anonymous text in the political domain. Both experiments are evaluated with several neural network architectures grounded on explainable linguistic features, statistical features, and state-of-the-art transformers. In addition, we test whether the neural network models can be transferred to detect the political ideology of citizens. Our results indicate that linguistic features are good indicators for identifying fine-grained political affiliation, that they boost the performance of neural network models when combined with embedding-based features, and that they preserve relevant information when the models are tested with ordinary citizens. We also found that lexical and morphosyntactic features are more effective for author profiling, whereas stylometric features are more effective for authorship attribution.
• The Spanish PoliCorpus-2020 for conducting authorship analysis is released.
• Linguistic features are effective for fine-grained political affiliation.
• Embeddings and linguistic features are complementary in authorship analysis.
• Lexical and morphosyntactic features are effective in author profiling.
• Stylometric features are effective in authorship attribution.
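To make the idea of stylometric authorship attribution concrete, the sketch below uses three hypothetical stylometric features (average word length, type-token ratio, punctuation rate) and a nearest-centroid classifier. This is a deliberately simpler, swapped-in technique for illustration; the work above uses neural network architectures and a richer feature set.

```python
import math
import string

def stylometric_features(text):
    """Average word length, type-token ratio and punctuation rate."""
    words = [w.strip(string.punctuation) for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return [0.0, 0.0, 0.0]
    avg_len = sum(len(w) for w in words) / len(words)
    ttr = len({w.lower() for w in words}) / len(words)
    punct_rate = sum(c in string.punctuation for c in text) / len(text)
    return [avg_len, ttr, punct_rate]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def attribute(text, samples_by_author):
    """Return the author whose feature centroid is closest (Euclidean)."""
    x = stylometric_features(text)
    centroids = {
        author: centroid([stylometric_features(t) for t in texts])
        for author, texts in samples_by_author.items()
    }
    return min(centroids, key=lambda author: math.dist(x, centroids[author]))
```

Features of this kind are cheap and explainable, which is why they pair well with embedding-based features in the experiments summarised above.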
• A novel semantic-based recommender system in the leisure domain is proposed.
• The context-aware approach is based on location, time and crowd information.
• Recommended items are viewed as composite items: movie theater + movie + showtime.
• Good performance results were obtained in real-world cold-start scenarios.
Recommender systems are used to provide filtered information from a large number of elements. They provide users with personalized recommendations on products or services that are intended to be of interest to them. Recommender systems can be developed using different techniques and algorithms, and the choice of technique depends on the area in which they will be applied. This paper proposes a recommender system in the leisure domain, specifically in the movie showtimes domain. The proposed system, called RecomMetz, is a context-aware mobile recommender system based on Semantic Web technologies. Specifically, this research developed a domain ontology that primarily serves a semantic similarity metric adapted to the concept of “packages of single items”. In addition, location, crowd and time were considered as three different kinds of contextual information in RecomMetz. In a nutshell, RecomMetz has the following unique features: (1) the items to be recommended have a composite structure (movie theater + movie + showtime), (2) the time and crowd factors are integrated into a context-aware model, (3) context modeling follows an ontology-based approach and (4) a multi-platform native mobile user interface leverages the hardware capabilities (sensors) of mobile devices. The evaluation results show the efficiency and effectiveness of the recommendation mechanism implemented by RecomMetz in both cold-start and non-cold-start scenarios.
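The abstract does not give RecomMetz's scoring function. As a hypothetical sketch of how a composite package (movie theater + movie + showtime) might be scored from content similarity plus the three contextual factors, assume the ontology-based movie similarity is precomputed in a dict, and that the weights `w`, the decay constants (5 km, 120 minutes) and the linear combination are illustrative choices only:

```python
import math
from dataclasses import dataclass

@dataclass
class Package:
    """A composite item: movie theater + movie + showtime."""
    theater: str
    movie: str
    showtime_min: int   # showtime in minutes after midnight
    distance_km: float  # distance from the user's current location
    crowd: float        # expected occupancy in [0, 1]

def package_score(pkg, movie_similarity, now_min, w=(0.5, 0.2, 0.2, 0.1)):
    """Weighted sum of content similarity and three context factors, each in [0, 1]."""
    wait = pkg.showtime_min - now_min
    if wait < 0:
        return 0.0                                  # showtime already passed
    location_fit = math.exp(-pkg.distance_km / 5.0)  # closer theaters score higher
    time_fit = math.exp(-wait / 120.0)               # sooner showtimes score higher
    crowd_fit = 1.0 - pkg.crowd                      # emptier screenings score higher
    return (w[0] * movie_similarity[pkg.movie] + w[1] * location_fit
            + w[2] * time_fit + w[3] * crowd_fit)
```

Because the weights sum to 1 and each factor lies in [0, 1], the score also stays in [0, 1], which keeps packages comparable across theaters.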
The rise of social networks has allowed misogynistic, xenophobic, and homophobic people to spread hate speech to intimidate individuals or groups because of their gender, ethnicity or sexual orientation. The consequences of hate speech are devastating, causing severe depression and even leading people to commit suicide. Identifying hate speech is challenging because the large number of daily publications makes it impossible to review every comment by hand. Moreover, hate speech is also spread through hoaxes, whose detection requires language and context understanding. With the aim of reducing the number of comments that must be reviewed by experts, or even of developing autonomous systems, the automatic identification of hate speech has gained academic relevance. However, the reliability of automatic approaches is still limited, especially in languages other than English, for which some state-of-the-art techniques have not been analyzed in detail. In this work, we examine which features are most effective in identifying hate speech in Spanish and how these features can be combined to develop more accurate systems. In addition, we characterize the language present in each type of hate speech by means of explainable linguistic features and compare our results with state-of-the-art approaches. Our research indicates that combining linguistic features and transformers by means of knowledge integration outperforms current solutions for hate speech identification in Spanish.
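The abstract does not detail how "knowledge integration" is implemented. For orientation only, the sketch contrasts two generic ways of combining feature sets: early fusion (concatenating a linguistic feature vector with an embedding, assumed here to be a transformer sentence embedding, then applying a logistic unit) and late fusion (averaging the probabilities of two separately trained classifiers). The weights and `alpha` are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def early_fusion(ling, emb, weights, bias):
    """Concatenate both feature sets and apply a single logistic unit."""
    x = ling + emb
    if len(weights) != len(x):
        raise ValueError("weights must match the concatenated feature length")
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

def late_fusion(p_ling, p_transformer, alpha=0.5):
    """Weighted average of two classifiers' hate-speech probabilities."""
    return alpha * p_ling + (1 - alpha) * p_transformer
```

In a trained system the weights would be learned; early fusion lets the model weigh interactions between feature types, whereas late fusion keeps the two models independent.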
In recent years, a substantial effort has been made to develop sophisticated methods for detecting figurative language, and more specifically, irony and sarcasm. There is, however, a lack of approaches and research works that analyze satirical texts. The recognition of satire by sentiment analysis and Natural Language Processing (NLP) applications is extremely important because satire can change the meaning of a statement in varied and complex ways. On this basis, we propose a method that employs a wide variety of psycholinguistic features to distinguish satirical from non-satirical text. We then trained a set of machine learning algorithms to classify unknown data. Finally, we conducted several experiments to identify the features that are most relevant for detecting satirical texts. We evaluated the effectiveness of our method on a corpus of satirical and non-satirical news collected from Mexican and Spanish Twitter accounts. Our proposal obtained encouraging results, with F-measures of 85.5% for Mexico and 84.0% for Spain. Moreover, the results of the experiment showed that there is no significant difference between Mexican and Spanish satire.
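The F-measures reported above are not specified as macro-averaged or per-class. As a small reference sketch, the function below computes the binary F-beta score for one positive class (here the hypothetical label `"satire"`); a macro-averaged F-measure would instead average this value over both classes.

```python
def f_measure(gold, predicted, positive="satire", beta=1.0):
    """Binary F-beta score for the given positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With `beta=1.0` this is the harmonic mean of precision and recall, the usual F1.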
Hope speech detection in Spanish
García-Baena, Daniel; García-Cumbreras, Miguel Ángel; Jiménez-Zafra, Salud María ...
Language resources and evaluation,
December 2023, Volume 57, Issue 4
Journal Article
Peer-reviewed
Open access
In recent years, systems have been developed to monitor online content and remove abusive, offensive or hateful content. Comments on online social media have been analyzed to find and stop the spread of negativity using methods such as hate speech detection, identification of offensive language or detection of abusive language. We define hope speech as the type of speech that is able to relax a hostile environment and that helps, gives suggestions to and inspires people in times of illness, stress, loneliness or depression. Detecting it automatically, in order to give greater diffusion to positive comments, can have a very significant effect in the fight against sexual or racial discrimination or in fostering less bellicose environments. In this article, we perform a complete study on hope speech, analyzing existing solutions and available resources. In addition, we have generated a quality resource, SpanishHopeEDI, a new Spanish Twitter dataset on the LGBT community, and we have conducted experiments that can serve as a baseline for further research.