This research highlights the importance of the Arabic lexicon in teaching and studying ta’bîr courses at Ma’had al-Imarat Bandung. This qualitative research relies on a field study and a descriptive analysis. This importance is due to three aspects: the first is that ta’bîr courses are among the materials prescribed in Arabic language learning institutes to develop students’ four language skills. The second is that lexicons are tools for teaching and learning the Arabic language, especially in the subject matter of ta’bîr. The third is that the Arabic lexicon is one of the primary sources people refer to in order to correct their linguistic knowledge. The results indicate that language acquisition is the most important aspect of that knowledge. Furthermore, acquisition of the Arabic language involves many factors, both linguistic and non-linguistic. Dictionaries are considered one of the primary tools of language acquisition in studying the subject matter of ta’bîr.
We present a simple and effective methodology for the generation of lexicons (word lists) that may be used in natural language scoring applications. In particular, in the finance industry, word lists have become ubiquitous for sentiment scoring. These have been derived from dictionaries such as the Harvard Inquirer and require manual curation. Here, we present an automated approach to the curation of lexicons, which makes automatic preparation of any word list immediate. We show that our automated word lists deliver comparable performance to traditional lexicons on machine learning classification tasks. This new approach will enable finance academics and practitioners to create and deploy new word lists in addition to the few traditional ones in a facile manner.
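The basic word-list sentiment scoring that such lexicons feed into can be sketched as follows; the tiny positive/negative lists and the example document are illustrative placeholders, not the automatically curated lists from the paper.

```python
# Minimal sketch of word-list (lexicon) sentiment scoring of the kind
# used in finance text analysis. The word lists are illustrative only.
NEGATIVE = {"loss", "decline", "impairment", "litigation"}
POSITIVE = {"growth", "profit", "improvement", "gain"}

def lexicon_score(text):
    """Return (positive hits - negative hits) / total tokens."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

print(lexicon_score("revenue growth offset the litigation loss"))  # -> -1/6
```

An automated curation method would replace the hand-written sets above with lists derived from data; the scoring step itself stays the same.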
•A LinearSVM model was able to extract e-mail sentiment with a mean AUC of 0.896.•The model could also predict sentiment for e-mail responses with a mean AUC of 0.805.•The results suggest possibilities for improved customer support management processes.
Customer support is important to corporate operations, as it involves dealing with both disgruntled and content customers who can have different requirements. As such, it is important to quickly extract the sentiment of support errands. In this study, we investigate sentiment analysis in customer support for a large Swedish telecom corporation. The data set consists of 168,010 e-mails divided into 69,900 conversation threads without any sentiment information available. Therefore, VADER sentiment is used together with a Swedish sentiment lexicon to provide initial labeling of the e-mails. The e-mail content and sentiment labels are then used to train two Support Vector Machine models to extract/classify the sentiment of e-mails. Further, the ability to predict the sentiment of not-yet-seen e-mail responses is investigated. Experimental results show that the LinearSVM model was able to extract sentiment with a mean F1-score of 0.834 and a mean AUC of 0.896. Moreover, the LinearSVM algorithm was also able to predict the sentiment of an e-mail one step ahead in the thread (based on the text of an already-sent e-mail) with a mean F1-score of 0.688 and a mean AUC of 0.805. The results indicate a predictable pattern in e-mail conversation that enables predicting the sentiment of a not-yet-seen e-mail. This can be used, e.g., to prepare particular actions for customers who are likely to have a negative response. It can also provide feedback on possible sentiment reactions to customer support e-mails.
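The bootstrap-labelling step described above, where a lexicon assigns initial sentiment labels to unlabelled e-mails before a supervised model is trained, can be sketched as below. The lexicon entries and thresholds are invented placeholders, not the actual VADER/Swedish lexicon used in the study.

```python
# Sketch of lexicon-based bootstrap labelling: score each unlabelled
# e-mail with an intensity lexicon, then threshold into class labels
# that can seed training of a supervised model (e.g. a LinearSVM).
# All lexicon entries and thresholds here are illustrative placeholders.
LEXICON = {"bra": 2.0, "tack": 1.0, "problem": -1.5, "fel": -2.0}  # word -> valence

def label_email(text, pos_thresh=0.5, neg_thresh=-0.5):
    score = sum(LEXICON.get(tok, 0.0) for tok in text.lower().split())
    if score >= pos_thresh:
        return "positive"
    if score <= neg_thresh:
        return "negative"
    return "neutral"

emails = ["tack for snabb hjalp bra service", "det har blivit fel igen"]
labels = [label_email(e) for e in emails]
print(labels)  # these weak labels stand in for missing gold annotations
```

The trained classifier can then generalize beyond the lexicon's vocabulary, which is what makes the two-stage setup useful when no labelled data exists.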
Due to the massive amount of data being generated on the platform, Twitter has been the subject of numerous sentiment analysis studies. Such social network services generate massive unstructured data streams which make working with them very challenging. The aim of this study is to reliably analyze the sentiment of trending tweets in the Twitter API data stream using a combination of different algorithms to achieve a consensus. The methods we implemented include a Support Vector Machine, Naive Bayes, TextBlob, and a lexicon approach. The hypothesis is that using these methods together would enable us to get more accurate results. Using a labeled dataset to test our model, the results show that the combination of all four algorithms performed best, with an overall accuracy of 68.29%. We conclude that our combination method of analysis is suitable and fast enough for our data stream and also accurate for analyzing sentiment.
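One simple way to combine several classifiers into a consensus, in the spirit of the approach above, is majority voting; the sketch below uses placeholder predictions rather than real SVM/Naive Bayes/TextBlob/lexicon outputs, and the tie-breaking rule is an assumption.

```python
from collections import Counter

# Sketch of a consensus (majority-vote) combination of several
# sentiment classifiers. Each inner list holds one classifier's
# per-tweet predictions; the values are illustrative placeholders.
def consensus(predictions_per_model):
    """Majority vote across models for each example; ties -> 'neu'."""
    combined = []
    for votes in zip(*predictions_per_model):
        top = Counter(votes).most_common()
        if len(top) > 1 and top[0][1] == top[1][1]:
            combined.append("neu")  # break ties conservatively
        else:
            combined.append(top[0][0])
    return combined

svm      = ["pos", "neg", "neu"]
nb       = ["pos", "neg", "pos"]
textblob = ["neg", "neg", "pos"]
lexicon  = ["pos", "pos", "pos"]
print(consensus([svm, nb, textblob, lexicon]))  # -> ['pos', 'neg', 'pos']
```

Weighted voting or confidence averaging are common refinements of the same idea when the individual models expose scores rather than hard labels.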
•Domain specific emotion lexicon (DSEL) is proposed for emotion feature extraction.•Novel feature extraction methods are introduced to exploit knowledge of our DSEL.•The proposed features significantly improve emotion classification performance.
General Purpose Emotion Lexicons (GPELs) that associate words with emotion categories remain a valuable resource for emotion analysis of text. However, the static and formal nature of their vocabularies makes them inadequate for extracting effective features for document representation in domains that are inherently dynamic in nature (e.g. social media). This calls for lexicons that are not only adaptive to the lexical variations in a domain but also provide finer-grained quantitative estimates to accurately capture word-emotion associations. In this paper, we extend prior work on domain specific emotion lexicon (DSEL) generation and apply it to emotion feature extraction. We demonstrate how our generative unigram mixture model (UMM) based DSEL, learnt by harnessing labelled (blogs, news headlines and incident reports) and weakly-labelled (tweets) emotion text, can be used to extract effective features for emotion classification. Our results confirm that the features derived using the proposed lexicon outperform those from state-of-the-art lexicons learnt using supervised Latent Dirichlet Allocation (sLDA) and Pointwise Mutual Information (PMI). Further, the proposed lexicon features also outperform state-of-the-art features derived using a combination of n-grams, part-of-speech information and sentiment lexicons.
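For context on the PMI baseline the authors compare against, a word-emotion association score via pointwise mutual information can be sketched like this; the tiny labelled corpus is an invented placeholder, and the UMM-based lexicon from the paper would replace these scores with model-based estimates.

```python
import math
from collections import Counter

# Sketch of PMI-based word-emotion association, a common baseline
# for building emotion lexicons from labelled text:
#   PMI(w, e) = log( P(w, e) / (P(w) * P(e)) )
# estimated from token counts over a (here, invented) corpus.
corpus = [
    ("so happy today", "joy"),
    ("happy news everyone", "joy"),
    ("terrible sad news", "sadness"),
    ("so sad", "sadness"),
]

word_emotion, word_count, emotion_count = Counter(), Counter(), Counter()
total = 0
for text, emotion in corpus:
    for w in text.split():
        word_emotion[(w, emotion)] += 1
        word_count[w] += 1
        emotion_count[emotion] += 1
        total += 1

def pmi(word, emotion):
    joint = word_emotion[(word, emotion)] / total
    if joint == 0:
        return float("-inf")
    marginals = (word_count[word] / total) * (emotion_count[emotion] / total)
    return math.log(joint / marginals)

print(pmi("happy", "joy") > pmi("sad", "joy"))  # "happy" associates with joy
```

A learned lexicon's advantage over PMI is that it can smooth over sparse counts and share strength across related words, which matters on short, noisy social media text.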
•A novel approach for multilingual sentiment analysis.•The uniqueness of the research is the use of three languages: Urdu, English, and Roman-Urdu.•A novel dictionary of multilingual sentiments.•Extreme lexicons with intensity weights are created and used to label the dataset.
Uncertainty in political, religious, and social issues causes extremism among people, which is reflected in their sentiments on social media. Although English is the most common language used to share views on social media, regional languages are also used by locals. Thus, it is necessary to incorporate views in such languages along with widely used languages to reveal better insights from the data. This research focuses on sentiment analysis of multilingual social media text to discover the intensity of extremist sentiment. Our study classifies the collected textual views into one of four categories, namely high extreme, low extreme, moderate, and neutral, based on their level of extremism. Initially, a multilingual lexicon with intensity weights is created. This lexicon is validated by domain experts and attains 88% accuracy in validation. Subsequently, Multinomial Naïve Bayes and Linear Support Vector Classifier algorithms are employed for classification. Overall, on the underlying multilingual dataset, the Linear Support Vector Classifier outperforms with an accuracy of 82%.
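The intensity-weighted lexicon labelling into four categories can be sketched as below; the Roman-Urdu/English entries and the category cut-offs are invented placeholders, not the expert-validated lexicon from the study.

```python
# Sketch of labelling text into four extremism-intensity categories
# using a multilingual lexicon with intensity weights. All entries
# and thresholds below are illustrative placeholders.
LEXICON = {
    "jang": 3.0,      # hypothetical Roman-Urdu entry
    "attack": 2.5,    # English
    "protest": 1.0,
    "peace": -1.0,
}

def categorize(text):
    score = sum(LEXICON.get(tok, 0.0) for tok in text.lower().split())
    if score >= 3.0:
        return "high extreme"
    if score >= 1.5:
        return "low extreme"
    if score > 0.0:
        return "moderate"
    return "neutral"

print(categorize("jang attack"))    # 5.5 -> "high extreme"
print(categorize("peace protest"))  # 0.0 -> "neutral"
```

In the study, labels produced this way serve as training data for the Multinomial Naïve Bayes and Linear SVC classifiers, which then generalize beyond the lexicon's vocabulary.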
We present a new approach for content analysis to quantify document tone. We find a significant relation between our measure of the tone of 10-Ks and market reaction for both negative and positive words. We also find that the appropriate choice of term weighting in content analysis is at least as important as, and perhaps more important than, a complete and accurate compilation of the word list. Furthermore, we show that our approach circumvents the need to subjectively partition words into positive and negative word lists. Our approach reliably quantifies the tone of IPO prospectuses as well, and we find that the document score is negatively related to IPO underpricing.
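Term weighting of the kind the authors argue matters can be illustrated with a standard tf-idf scheme, where a term's contribution is scaled by its rarity in the corpus rather than counted raw; the mini-corpus below is an invented placeholder, and the paper's own weighting scheme may differ.

```python
import math

# Sketch of tf-idf term weighting for document tone analysis: a
# term that appears in most documents contributes less than a rare,
# informative one, even at the same raw count.
docs = [
    "loss and decline in revenue",
    "strong growth in revenue",
    "litigation loss expected",
]

def tf_idf(term, doc, corpus):
    tokens = doc.split()
    tf = tokens.count(term) / len(tokens)
    df = sum(term in d.split() for d in corpus)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

# "revenue" appears in 2 of 3 docs, so it is down-weighted relative
# to "litigation", which appears in only 1.
print(tf_idf("litigation", docs[2], docs) > tf_idf("revenue", docs[0], docs))
```

Summing such weighted scores over a word list, instead of raw hit counts, is one concrete way the choice of term weighting can change a document's measured tone.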