The literature in sentiment analysis has widely assumed that semantic relationships between words cannot be effectively exploited to produce satisfactory sentiment lexicon expansions. This assumption stems from the fact that words considered to be “close” in a semantic space (e.g., word embeddings) may present completely opposite polarities, which might suggest that sentiment information in such spaces is either too faint or not readily exploitable. Our main contribution in this paper is a rigorous and robust challenge to this assumption: by proposing a set of theoretical hypotheses and corroborating them with strong experimental evidence, we demonstrate that semantic relationships can be effectively used for good lexicon expansion. Based on these results, our second contribution is a novel, simple, yet effective lexicon-expansion strategy based on semantic relationships extracted from word embeddings. This strategy is able to substantially enhance the lexicons while overcoming the major problem of lexicon coverage. We present an extensive experimental evaluation of sentence-level sentiment analysis, comparing our approach to sixteen state-of-the-art (SOTA) lexicon-based methods and five lexicon-expansion methods over twenty datasets. Results show that in the vast majority of cases our approach outperforms the alternatives, achieving coverage of almost 100% and gains of about 26% over the best baselines. Moreover, our unsupervised approach performed competitively against SOTA supervised sentiment analysis methods, mainly in scenarios with scarce information. Finally, in a cross-dataset comparison, our approach proved to be as competitive as (i.e., statistically tied with) state-of-the-art supervised solutions such as pre-trained transformers (BERT), even without relying on any training (labeled) data. Indeed, on small datasets or datasets with scarce information (short messages), our solution outperformed the supervised ones by large margins.
• We demonstrate that semantic relationships can be effective for lexicon expansion.
• We propose a novel method that explores distances between word embeddings.
• Our unsupervised method enhances lexicon coverage while keeping high precision.
• In our experiments, we beat all lexicon-expansion baselines by large margins.
• Our approach is competitive with pre-trained transformers (BERT) without any training.
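The core idea of embedding-distance lexicon expansion can be sketched as follows. Note that this is a minimal illustration of the general technique, not the paper's actual algorithm: the toy vectors, seed lexicon, and similarity threshold are all hypothetical.

```python
import numpy as np

# Hypothetical toy word embeddings (4-d vectors, for illustration only).
embeddings = {
    "good":      np.array([0.9, 0.1, 0.2, 0.0]),
    "great":     np.array([0.85, 0.15, 0.25, 0.05]),
    "excellent": np.array([0.8, 0.2, 0.3, 0.1]),
    "bad":       np.array([-0.9, 0.1, 0.2, 0.0]),
    "awful":     np.array([-0.85, 0.2, 0.25, 0.05]),
    "table":     np.array([0.0, 0.9, -0.3, 0.4]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def expand_lexicon(seed, embeddings, threshold=0.95):
    """Assign each out-of-lexicon word the polarity of its most similar
    seed word, but only when that neighbor is close enough (threshold)."""
    expanded = dict(seed)
    for word, vec in embeddings.items():
        if word in seed:
            continue
        nearest = max(seed, key=lambda s: cosine(vec, embeddings[s]))
        if cosine(vec, embeddings[nearest]) >= threshold:
            expanded[word] = seed[nearest]
    return expanded

seed = {"good": +1, "bad": -1}
expanded = expand_lexicon(seed, embeddings)
```

With these toy vectors, "great" and "excellent" inherit the positive polarity of "good", "awful" inherits the negative polarity of "bad", and the semantically unrelated "table" stays out of the lexicon, which is how such a strategy can grow coverage while controlling precision.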
Sentiment analysis of text data, such as reviews, can help users and merchants make more favorable decisions. It is difficult to use popular supervised learning methods to complete the sentiment classification task because labeling data manually is time-consuming and laborious. Unsupervised sentiment classification methods are mostly based on sentiment lexicons. Existing sentiment lexicons are simply not capable of domain-specific sentiment classification, so a domain sentiment lexicon still needs to be constructed. There are still many problems with advanced domain sentiment lexicon construction methods; e.g., they rely heavily on labeled data and suffer from poor accuracy. We propose a labeled-data extension idea to reduce the dependence of supervised learning methods on labeled data. To solve the problems of domain sentiment lexicon construction, we propose a novel framework based on multi-source information fusion (MSIF) for learning. We extract four kinds of emotional information: lexicon emotional information, emotional word co-occurrence information, emotional word polarity information, and polarity relationship information of emotional word pairs. When extracting the co-occurrence information, a novel method based on the data extension idea is proposed to enhance its accuracy and coverage. To accelerate the solution of the fusion model, an optimization method based on the ADMM algorithm is applied. Experimental results on five Amazon product review datasets show that the sentiment lexicon constructed by the proposed method can significantly improve the performance of review sentiment classification compared with the current popular baseline and state-of-the-art methods.
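The lexicon-based unsupervised classification these methods build on can be illustrated with a minimal sketch (the toy lexicon and scoring rule here are hypothetical simplifications, not the MSIF method itself):

```python
# A hypothetical toy sentiment lexicon mapping words to polarity scores.
lexicon = {"great": 1.0, "love": 1.0, "poor": -1.0, "broken": -1.0}

def classify(sentence, lexicon):
    """Score a sentence by summing the polarities of its lexicon words;
    the sign of the total decides the predicted class."""
    score = sum(lexicon.get(tok, 0.0) for tok in sentence.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Domain lexicon construction matters precisely because a word like "unpredictable" is positive for a movie plot but negative for car brakes, so a single general-purpose table like the one above cannot serve every domain.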
Within the last couple of years, Sentiment Analysis in Arabic has gained considerable interest from the research community. In this respect, the objective of this paper is to provide a review of the major works that have dealt with this research area in this language. A thorough investigation of the available literature revealed that the works were mainly concentrated on dealing with specific Sentiment Analysis tasks. To this end, they used three different approaches, namely supervised, unsupervised, and hybrid. The results that these studies achieved are interesting but divergent. This divergence is relatively due to the type of approach opted for, the task being analysed, as well as the specificities and intricacies of the Arabic variety under study.
During difficult tasks, conflict can benefit performance on a subsequent trial. One theory for such performance adjustments is that people monitor for conflict and reactively engage cognitive control. This hypothesis has been challenged because tasks that control for associative learning do not show such cognitive control effects. The current study experimentally controlled associative learning by presenting a novel stimulus on every trial of a picture-speech conflict task and found that performance adjustments still occur. Thirty-one healthy young adults listened to and repeated words presented in background noise while viewing pictures that were congruent or incongruent (i.e., phonological neighbors) with the word. Following conflict, participants had higher word recognition (+17% points) on incongruent but not congruent trials. This result was not attributable to posterror effects nor a speed-accuracy trade-off. An analysis of erroneous responses showed that participants made more phonologically related errors than nonrelated errors only on incongruent trials, demonstrating elevated phonological conflict when the picture was a neighbor of the target word. Additionally, postconflict improvements appear to be due to better resolution of phonological conflict in the mental lexicon rather than decreased attention to the picture or increased attention to the speech signal. Our findings provide new evidence for conflict monitoring and suggest that cognitive control helps resolve phonological conflict during speech recognition.
The examination of how words are learned can offer valuable insights into the nature of lexical representations. For example, a common assessment of novel word learning is based on its ability to interfere with other words; given that words are known to compete with each other (Luce and Pisoni, 1998; Dahan et al., 2001), we can use the capacity of a novel word to interfere with the activation of other lexical representations as a measure of the degree to which it is integrated into the mental lexicon (Leach and Samuel, 2007). This measure allows us to assess novel word learning in L1 or L2, but also the degree to which representations from the two lexica interact with each other (Marian and Spivey, 2003). Despite the somewhat independent lines of research on L1 and L2 word learning, common patterns emerge across the two literatures (Lindsay and Gaskell, 2010; Palma and Titone, 2020). In both cases, lexicalization appears to follow a similar trajectory. In L1, newly encoded words often fail at first to engage in competition with known words, but they do so later, after they have been better integrated into the mental lexicon (Gaskell and Dumay, 2003; Dumay and Gaskell, 2012; Bakker et al., 2014). Similarly, L2 words generally have a facilitatory effect, which can, however, become inhibitory in the case of more robust (high-frequency) lexical representations. Despite the similar pattern, L1 lexicalization is described in terms of inter-lexical connections (Leach and Samuel, 2007), leading to more automatic processing (McMurray et al., 2016); whereas in L2 word learning, lack of lexical inhibition is attributed to less robust (i.e., fuzzy) L2 lexical representations. Here, I point to these similarities and use them to argue that a common mechanism may underlie similar patterns across the two literatures.
This paper provides an in-depth investigation of the possibility of systematically using flexemes – i.e., lexical units characterized in terms of form, as opposed to lexemes, characterized in terms of meaning – to model overabundance – i.e., the availability of more than one form in the same paradigm cell. The starting point is a preliminary evaluation of the advantages and disadvantages of using flexemes to account for different overabundance phenomena, showing that flexemes are a good way to capture the systematicity of overabundance, either across lexemes or across cells. Consequently, it is suggested that flexemes can be an interesting technical solution for the creation of a lexicon of Latin verbs that not only documents all the competing wordforms available as principal parts, but also captures the systematic relationship that sometimes holds between variants filling different cells. A principled method to identify such systematicity is then described in detail. It is argued that a constructive approach based on the identity of stems and/or inflection class is not fully adequate for the data at hand. Therefore, the proposed procedure adopts an abstractive, word-based perspective that only relies on alternation patterns between unsegmented wordforms. Practical and theoretical implications of the work are finally discussed, particularly regarding the usefulness of a formal approach to the identification of lexical units and paradigm cells.
Although a growing number of second language acquisition (SLA) studies take linguistic complexity as a dependent variable, the term is still poorly defined and often used with different meanings, thus posing serious problems for research synthesis and knowledge accumulation. This article proposes a simple, coherent view of the construct, which is defined in a purely structural way, i.e. the complexity directly arising from the number of linguistic elements and their interrelationships. Issues of cognitive cost (difficulty) or developmental dynamics (acquisition) are explicitly excluded from this theoretical definition and its operationalization. The article discusses how the complexity of an interlanguage system can be assessed based on the limited samples with which SLA researchers usually work. For the areas of morphology, syntax and the lexicon, some measures are proposed that are coherent with the purely structural view advocated, and issues related to their operationalization are critically scrutinized.
Detecting the sentiment of sentences in online reviews is still a challenging task. Traditional machine learning methods often use bag-of-words representations, which cannot properly capture complex linguistic phenomena in sentiment analysis. Recently, recursive autoencoder (RAE) methods have been proposed for sentence-level sentiment analysis. They use word embeddings to represent each word, and learn compositional vector representations of phrases and sentences with recursive autoencoders. Although RAE methods outperform other state-of-the-art sentiment prediction approaches on commonly used datasets, they tend to generate very deep parse trees and need a large amount of labeled data for each node when learning compositional vector representations. Furthermore, RAE methods mainly combine adjacent words in sequence with a greedy strategy, which makes capturing semantic relations between distant words difficult. To solve these issues, we propose a semi-supervised method that incorporates the HowNet lexicon to train phrase recursive autoencoders (we call it CHL-PRAE). CHL-PRAE first constructs the phrase recursive autoencoder (PRAE) model. The model then calculates the sentiment orientation of each node with the HowNet lexicon, which acts as sentiment labels when we train the softmax classifier of PRAE. Furthermore, our CHL-PRAE model conducts bidirectional training to capture global information. Compared with RAE and supervised methods such as support vector machine (SVM) and naïve Bayes on English and Chinese datasets, the experimental results show that CHL-PRAE provides the best performance for sentence-level sentiment analysis.
In past phonology literature, diacritics, brackets and other extra-phonological objects have been employed to identify morpheme boundaries and to differentiate words from affixes. In the present study, we argue that all of these extra-phonological items bring arbitrariness to phonological theory, since phonology is only concerned with identifying phonological objects. In this respect, the present study proposes a new account in order to identify word boundaries in phonology and to explain phonological processes that show sensitivity to morphological boundaries without referring to any extra-phonological objects. Accordingly, we argue for a novel template model and propose that bases (e.g. words), productive suffixes and prefixes are listed in the lexicon with their own unique phonological templates: ONO, NO and ON, respectively. This means that their morphological categories are recognizable when they come to phonology. Also, the morphological boundedness of prefixes and suffixes is visible in our template model: the absence of a final onset in prefixes (ON) and the absence of an initial onset in suffixes (NO) render them phonologically bound to a base because only bases exhibit an (O...O) structure. Accordingly, these morphemes come to phonology with their own templates, and phonological operations (government, licensing, etc.) apply to them when necessary. Also, we put forward two new parameters, the Final Onset Parameter and the Initial Onset Parameter, in order to explain the word-final/word-initial differences among languages. We argue that phonological processes, the phonology-morphology interface and their relation to the lexicon are non-arbitrarily explainable in our model thanks to the templates and novel parameters.