Abstract
Automated character identification in movies and TV series has typically been carried out through face detection in video and the association of faces with characters’ names extracted from dialogues or cast lists. We propose a deep learning architecture that identifies characters from subtitles alone, specifically through the lexicon those characters employ. The identification task is formalized as a multi-class classification task. We apply our technique to the complete set of episodes of the Gomorrah TV series and achieve an average identification accuracy above 94% on the full set of characters.
To improve the performance of word-of-mouth sentiment classification, this article reevaluates objective sentiment words in the SentiWordNet sentiment lexicon.
Attention models are widely used in sentiment analysis and other classification tasks because some words matter more than others when training a classifier. However, most existing methods use either local context information, affective lexicons, or user preference information. In this work, we propose a novel attention model trained on cognition-grounded eye-tracking data. First, a reading-time prediction model is built using eye-tracking data as the dependent variable and other contextual features as independent variables. The predicted reading time is then used to build a cognition-grounded attention layer for neural sentiment analysis. Our model captures attention in context both over words at the sentence level and over sentences at the document level. Other attention mechanisms can also be incorporated to capture further aspects of attention, such as local attention and affective lexicons. Our results have two parts. The first compares the proposed cognition-grounded attention model with other state-of-the-art sentiment analysis models. The second compares our model with attention models based on other lexicon-based sentiment resources. Evaluations show that sentiment analysis using the cognition-grounded attention model significantly outperforms state-of-the-art sentiment analysis methods. Comparisons with affective lexicons also indicate that cognition-grounded eye-tracking data has advantages over other sentiment resources, as it considers both word and context information. This work offers insight into how cognition-grounded data can be integrated into natural language processing (NLP) tasks.
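The core idea above — turning predicted per-word reading times into an attention distribution over a sentence — can be sketched as follows. This is a minimal illustration with toy vectors, not the authors' architecture; the softmax-over-reading-times mapping and the sample values are assumptions for demonstration.

```python
import numpy as np

def cognition_attention(word_vecs, predicted_reading_times):
    """Weight word vectors by attention derived from predicted reading times.

    Assumption for this sketch: longer predicted reading time implies
    higher attention, mapped through a softmax.
    """
    t = np.asarray(predicted_reading_times, dtype=float)
    # softmax over reading times yields the attention distribution
    w = np.exp(t - t.max())
    w /= w.sum()
    # sentence representation = attention-weighted sum of word vectors
    sent = (w[:, None] * np.asarray(word_vecs, dtype=float)).sum(axis=0)
    return w, sent

# toy word embeddings and predicted reading times (ms) for a 3-word sentence
vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, sent_vec = cognition_attention(vecs, [120, 300, 80])
```

In a full model, the reading times would come from the trained eye-tracking regression rather than being supplied by hand, and the weighted sum would feed the document-level sentence attention.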
Distributional semantics algorithms, which learn vector space representations of words and phrases from large corpora, identify related terms based on contextual usage patterns. We hypothesize that distributional semantics can speed up lexicon expansion in a clinical domain, radiology, by unearthing synonyms from the corpus.
We apply word2vec, a distributional semantics software package, to the text of radiology notes to identify synonyms for RadLex, a structured lexicon of radiology terms. We stratify performance by term category, term frequency, number of tokens in the term, vector magnitude, and the context window used in vector building.
Ranking candidates by distributional similarity to a target term yields high curation efficiency: on a ranked list of 775 249 terms, more than 50% of synonyms occurred within the first 25 terms. Synonyms are easier to find if the target term is a phrase rather than a single word, if it occurs at least 100 times in the corpus, and if its vector magnitude is between 4 and 5. Synonyms are easier to identify for some RadLex categories, such as anatomical substances, than for others.
The unstructured text of clinical notes contains a wealth of information about human diseases and treatment patterns. However, searching and retrieving information from clinical notes often suffer due to variations in how similar concepts are described in the text. Biomedical lexicons address this challenge, but are expensive to produce and maintain. Distributional semantics algorithms can assist lexicon curation, saving researchers time and money.
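The synonym-ranking step in the radiology study above amounts to sorting candidate terms by cosine similarity to the target term's vector. A minimal sketch, using toy two-dimensional vectors in place of real word2vec output (the terms and vectors here are hypothetical):

```python
import numpy as np

def rank_synonym_candidates(target_vec, candidates):
    """Rank candidate terms by cosine similarity to the target term's vector."""
    tv = np.asarray(target_vec, dtype=float)
    tv /= np.linalg.norm(tv)
    scored = []
    for term, vec in candidates.items():
        v = np.asarray(vec, dtype=float)
        scored.append((term, float(tv @ (v / np.linalg.norm(v)))))
    # highest-similarity candidates first, so curators review the top of the list
    return sorted(scored, key=lambda p: p[1], reverse=True)

# toy vectors standing in for word2vec output on radiology notes
cands = {"renal": [0.9, 0.1], "kidney": [0.95, 0.05], "femur": [0.1, 0.9]}
ranking = rank_synonym_candidates([1.0, 0.0], cands)
```

In practice the vectors would come from a word2vec model trained on the note corpus (e.g. gensim's `Word2Vec`), and a curator would scan only the top of the ranked list, which is where the >50%-within-25-terms efficiency figure comes from.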
Complex networks are often used to analyze written texts and reports by rendering them as semantic networks, forming a lexicon of words or key terms. Many existing methods for constructing lexicons are based on counting word co-occurrences, which has the advantage of simplicity and ease of application. Here, we use a quantum semantics approach to generalize such methods, allowing us to model the entanglement of terms and words. We show how quantum semantics can be applied to reveal disciplinary differences in the use of key terms by analyzing 12 scholarly texts that represent the positions of various disciplinary schools of conceptual change research on the same topic (conceptual change). In addition, we examine how closely the lexicons corresponding to different positions can be brought into agreement by suitable tuning of the entanglement factors. In comparing the lexicons, we invoke complex-network analysis based on exponential matrix transformation and use an information-theoretic relative entropy (the Jensen–Shannon divergence) to operationalize the differences between lexicons. The results suggest that quantum semantics is a viable way to model disciplinary differences between lexicons and how they can be tuned for better agreement.
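The operationalization of lexicon differences mentioned above is the Jensen–Shannon divergence between two term-weight distributions. A minimal sketch with toy profiles (the term weights are invented for illustration; the base-2 logarithm bounds the divergence in [0, 1]):

```python
import numpy as np

def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two term-weight distributions.

    JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2.
    Base-2 logs bound the result in [0, 1].
    """
    p = np.asarray(p, dtype=float); p /= p.sum()
    q = np.asarray(q, dtype=float); q /= q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # 0 * log(0) is treated as 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# toy term-weight profiles of two disciplinary lexicons over a shared vocabulary
d = jensen_shannon([4, 3, 2, 1], [1, 2, 3, 4])
```

Identical distributions give a divergence of 0 and disjoint ones give 1, which makes the measure convenient for comparing lexicons before and after tuning the entanglement factors.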
One of the biggest issues in the Indian economy in 2017 was the implementation of the Goods and Services Tax (GST), and social networks witnessed many contrasting and conflicting opinions about this new taxation system. Inspired by such a large-scale tax reform, we developed an experimental approach to analyze public sentiment on Twitter based on popular words either directly or indirectly related to GST. We collected almost 200k tweets solely about GST from June 2017 to December 2017 in two phases. To ensure the relevance of the crawled tweets to GST, we prepared a topic-sentiment relevance model. Furthermore, we employed several state-of-the-art lexicons to identify sentiment words and assigned a polarity rating to each tweet. To extract relevant words linked with GST only implicitly, we propose a new polarity-popularity framework; these popular words were also rated for sentiment. Next, we trained an LSTM model on both types of rated words to predict sentiment on GST tweets and obtained an overall accuracy of 84.51%. We observed that system performance improved once knowledge of indirectly related GST words was incorporated during training.
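The lexicon-based polarity-rating step described above can be sketched as a simple token-matching scorer. The mini-lexicon and scores below are invented stand-ins for the full sentiment lexicons the study uses; real pipelines also handle negation, hashtags, and emoji:

```python
# hypothetical mini-lexicon standing in for SentiWordNet-style resources
LEXICON = {"good": 1.0, "great": 1.5, "bad": -1.0, "confusing": -0.8}

def rate_tweet(tweet):
    """Assign a polarity label by summing lexicon scores of matched tokens."""
    tokens = tweet.lower().split()
    score = sum(LEXICON.get(tok, 0.0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

label = rate_tweet("GST filing portal is confusing but the reform is good")
```

Ratings produced this way for both directly and indirectly related GST words would then serve as training signal for the LSTM classifier.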
A context-aware approach based on machine learning and lexical analysis identifies ambiguous terms and stores them in contextualized sentiment lexicons, which ground the terms to concepts corresponding to their polarity.
Simple Knowledge Organization System (SKOS) provides a data model and vocabulary for expressing Knowledge Organization Systems (KOSs) such as thesauri and classification schemes in Semantic Web applications. This paper presents the main components of SKOS and their formal expression in Web Ontology Language (OWL), providing an extensive account of the design decisions taken by the Semantic Web Deployment (SWD) Working Group of the World Wide Web Consortium (W3C), which between 2006 and 2009 brought SKOS to the status of W3C Recommendation. The paper explains key design principles such as “minimal ontological commitment” and systematically cites the requirements and issues that influenced the design of SKOS components.
By reconstructing the discussion around alternative features and design options and presenting the rationale for design decisions, the paper aims to provide insight into how SKOS turned out as it did, and why. Assuming that SKOS, like any successful technology, may eventually be subject to revision and improvement, the critical account offered here may help future editors approach such a task with deeper understanding.
Rich online consumer reviews (OCR) can be mined for valuable insights that benefit both brands and future buyers. Recently, aspect-based sentiment classification has shown excellent results for fine-grained sentiment analysis of OCR. However, only a few studies so far rely both on explicitly deriving sentiment using syntactic features and on capturing implicit contextual word relations for aspect-based sentiment classification. In this paper, we propose a novel method, Hybrid Attribute Based Sentiment Classification (HABSC), which derives the sentiment orientation of OCR by capturing implicit word relations and incorporating domain-specific knowledge. First, we detect the most frequent bigrams and trigrams in the corpus, followed by POS tagging to retain aspect descriptions and opinion words. Then, we represent each document with TF-IDF (term frequency–inverse document frequency) weights and automatically extract the optimal number of topics in the given corpus. All adjectives and adverbs are labelled using domain-specific knowledge and pre-existing lexicons. Lastly, we determine the sentiment orientation of each review under the assumption that each review is a mixture of weighted, sentiment-labelled attributes. We test the efficiency of our method on datasets from two domains: hotel reviews from TripAdvisor.com and mobile phone reviews from Amazon.com. Results show that the classification accuracy of HABSC significantly exceeds that of various state-of-the-art methods, including aspect-based sentiment classification and supervised classification using distributed word and paragraph vectors. Our method also requires less computation time than distributed vectorization schemes.
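The TF-IDF representation step in the HABSC pipeline can be sketched as below. This is a minimal illustration on toy review snippets; the real pipeline additionally performs n-gram detection, POS filtering, and topic extraction before this weighting:

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF weights per document.

    tf = term count / document length; idf = log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency counts each doc once
    n = len(docs)
    out = []
    for toks in tokenized:
        tf = Counter(toks)
        out.append({t: (c / len(toks)) * math.log(n / df[t])
                    for t, c in tf.items()})
    return out

# toy review snippets standing in for the TripAdvisor/Amazon corpora
weights = tfidf(["great hotel clean room", "bad hotel noisy room", "great phone"])
```

Terms that appear in many reviews (such as "hotel" here) are down-weighted relative to more discriminative terms, which is what makes the weighted-attribute mixture in the final classification step informative.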