Sentiment analysis in Arabic: A review of the literature
Boudad, Naaima; Faizi, Rdouan; Oulad Haj Thami, Rachid
Ain Shams Engineering Journal, December 2018, Volume 9, Issue 4
Journal Article
Peer reviewed
Open access
Within the last couple of years, Sentiment Analysis in Arabic has gained considerable interest from the research community. In this respect, the objective of this paper is to provide a review of the major works that have dealt with this research area in this language. A thorough investigation of the available literature revealed that these works mainly concentrated on specific Sentiment Analysis tasks. To this end, they used three different approaches, namely supervised, unsupervised and hybrid. The results that these studies achieved are interesting but divergent. This divergence is largely due to the type of approach opted for, the task being analysed, as well as the specificities and intricacies of the Arabic variety under study.
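The unsupervised approach mentioned above is typically lexicon-based. As a minimal, hypothetical sketch (the tiny English polarity lexicon below is invented for illustration; real Arabic systems rely on curated polarity resources), a text's sentiment can be predicted from the summed polarity of its words:

```python
# Minimal sketch of a lexicon-based (unsupervised) sentiment scorer.
# The polarity lexicon is hypothetical and for illustration only.

POLARITY = {"good": 1, "great": 1, "excellent": 1,
            "bad": -1, "poor": -1, "terrible": -1}

def score(text):
    """Sum the polarity of known words; the sign gives the predicted label."""
    total = sum(POLARITY.get(token, 0) for token in text.lower().split())
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

Supervised and hybrid systems replace or augment such a fixed lexicon with classifiers trained on labeled data.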
During difficult tasks, conflict can benefit performance on a subsequent trial. One theory for such performance adjustments is that people monitor for conflict and reactively engage cognitive control. This hypothesis has been challenged because tasks that control for associative learning do not show such cognitive control effects. The current study experimentally controlled associative learning by presenting a novel stimulus on every trial of a picture-speech conflict task and found that performance adjustments still occur. Thirty-one healthy young adults listened to and repeated words presented in background noise while viewing pictures that were congruent or incongruent (i.e., phonological neighbors) with the word. Following conflict, participants had higher word recognition (+17% points) on incongruent but not congruent trials. This result was not attributable to posterror effects or a speed-accuracy trade-off. An analysis of erroneous responses showed that participants made more phonologically related errors than nonrelated errors only on incongruent trials, demonstrating elevated phonological conflict when the picture was a neighbor of the target word. Additionally, postconflict improvements appear to be due to better resolution of phonological conflict in the mental lexicon rather than decreased attention to the picture or increased attention to the speech signal. Our findings provide new evidence for conflict monitoring and suggest that cognitive control helps resolve phonological conflict during speech recognition.
Background
Core lexicon (CL) analysis is a time-efficient and potentially reliable measure that captures discourse production abilities. For people with aphasia, CL scores have demonstrated correlations with aphasia severity, as well as with other discourse and linguistic measures. CL analysis has also been found to be clinician-friendly and clinically sensitive enough to capture longitudinal changes in aphasia. To our knowledge, CL has never been investigated in individuals with neurologically progressive disease.
Aims
As a preliminary study, we investigated (1) whether CL scores correlate with dementia severity, (2) whether CL scores correlate with measures of discourse quality, and (3) whether CL scores correlate with other measures of lexical/semantic access.
Methods & Procedures
Twelve participants with a cognitive impairment associated with dementia of the Alzheimer's type (DAT) completed several measures of language and cognitive ability, as well as provided a language sample from the wordless picture book, Picnic.
Results & Conclusion
The results are informative, as they provide insight into the characteristics of CL and support its potential use in individuals with neurologically progressive disease. CL scores did correlate with dementia severity and with several measures of language ability, indicating that they may offer a useful measure of language abilities in DAT, although more research is needed.
WHAT THIS PAPER ADDS
What is already known on the subject
Core lexicon (CL) analysis is an assessment measure of discourse ability, most closely related to informativeness or productivity, used in aphasiology that is easier to use and less time consuming than previous measures of informativeness, such as correct information units or type‐token ratio (TTR). For people with aphasia, CL analysis correlates with aphasia severity, measures of informativeness, as well as other measures of discourse quality. It has also been shown to be faster and more reliable between scorers than other informativeness measures.
What this study adds
Core lexicon analysis is a simple, online method for assessing the informativeness of a discourse sample without the need to record or transcribe the language sample. CL is receiving considerable attention in aphasiology, correlating with aphasia severity, measures of productivity and lexical access, and measures of informativeness. However, CL analysis has not previously been investigated in dementia. This study provides the first evidence that CL analysis may be a useful measure for determining dementia severity and language quality in people with dementia.
What are the clinical implications of this work?
Core lexicon analysis may provide clinicians and researchers with an easy method for assessing the discourse of people with a cognitive impairment associated with dementia of the Alzheimer's type. This will improve initial assessment, as well as improve ongoing language assessment that may provide clues into their functional ability to communicate effectively.
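To illustrate the idea behind CL analysis, here is a minimal, hypothetical sketch: the score is simply how many items from a predefined core word list appear in a discourse sample. The CORE_LEXICON set below is invented for illustration; in practice a published checklist (e.g. one developed for the Picnic stimulus) and lemmatized tokens would be used:

```python
# Hypothetical sketch of a core lexicon (CL) score: the proportion of a
# predefined core word list that appears in a discourse sample.
# CORE_LEXICON is invented for illustration, not a published checklist.

CORE_LEXICON = {"picnic", "basket", "family", "dog", "eat", "run"}

def core_lexicon_score(sample):
    """Return how many core items occur in the sample, and the proportion."""
    tokens = set(sample.lower().split())
    hits = CORE_LEXICON & tokens
    return len(hits), len(hits) / len(CORE_LEXICON)
```

Note that exact string matching misses inflected forms ("ran" does not match "run"), which is why real CL scoring conventions specify how variants are counted.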
Detecting the sentiment of sentences in online reviews is still a challenging task. Traditional machine learning methods often use bag-of-words representations which cannot properly capture complex linguistic phenomena in sentiment analysis. Recently, recursive autoencoder (RAE) methods have been proposed for sentence-level sentiment analysis. They use word embeddings to represent each word, and learn compositional vector representations of phrases and sentences with recursive autoencoders. Although RAE methods outperform other state-of-the-art sentiment prediction approaches on commonly used datasets, they tend to generate very deep parse trees, and need a large amount of labeled data for each node during the process of learning compositional vector representations. Furthermore, RAE methods mainly combine adjacent words in sequence with a greedy strategy, which makes capturing semantic relations between distant words difficult. To solve these issues, we propose a semi-supervised method which combines the HowNet lexicon to train phrase recursive autoencoders (we call it CHL-PRAE). CHL-PRAE constructs the phrase recursive autoencoder (PRAE) model first. Then the model calculates the sentiment orientation of each node with the HowNet lexicon, which acts as sentiment labels, when we train the softmax classifier of PRAE. Furthermore, our CHL-PRAE model conducts bidirectional training to capture global information. Compared with RAE and some supervised methods such as support vector machines (SVM) and naïve Bayes on English and Chinese datasets, the experimental results show that CHL-PRAE provides the best performance for sentence-level sentiment analysis.
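The core operation shared by RAE-style models can be sketched as a single composition step: two child vectors are encoded into a parent vector and decoded back, and the reconstruction error guides which adjacent pair to merge next. The sketch below uses random, untrained weights purely for illustration; it is not the CHL-PRAE implementation:

```python
import numpy as np

# One composition step of a recursive autoencoder (RAE), sketched with
# random, untrained weights. In a real RAE these weights are learned and
# the pair with the lowest reconstruction error is merged greedily.

rng = np.random.default_rng(0)
d = 4                                            # embedding dimension
W_enc = rng.normal(scale=0.1, size=(d, 2 * d))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(2 * d, d))   # decoder weights

def compose(c1, c2):
    """Encode two child vectors into a parent vector."""
    return np.tanh(W_enc @ np.concatenate([c1, c2]))

def reconstruction_error(c1, c2):
    """Squared error of decoding the parent back into its children."""
    parent = compose(c1, c2)
    reconstruction = W_dec @ parent
    target = np.concatenate([c1, c2])
    return float(np.sum((reconstruction - target) ** 2))
```

The greedy merging of adjacent pairs by lowest reconstruction error is exactly what makes it hard for plain RAEs to relate distant words, the limitation the abstract's bidirectional training is meant to address.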
The examination of how words are learned can offer valuable insights into the nature of lexical representations. For example, a common assessment of novel word learning is based on its ability to interfere with other words; given that words are known to compete with each other (Luce and Pisoni, 1998; Dahan et al., 2001), we can use the capacity of a novel word to interfere with the activation of other lexical representations as a measure of the degree to which it is integrated into the mental lexicon (Leach and Samuel, 2007). This measure allows us to assess novel word learning in L1 or L2, but also the degree to which representations from the two lexica interact with each other (Marian and Spivey, 2003). Despite the somewhat independent lines of research on L1 and L2 word learning, common patterns emerge across the two literatures (Lindsay and Gaskell, 2010; Palma and Titone, 2020). In both cases, lexicalization appears to follow a similar trajectory. In L1, newly encoded words often fail at first to engage in competition with known words, but they do so later, after they have been better integrated into the mental lexicon (Gaskell and Dumay, 2003; Dumay and Gaskell, 2012; Bakker et al., 2014). Similarly, L2 words generally have a facilitatory effect, which can, however, become inhibitory in the case of more robust (high-frequency) lexical representations. Despite the similar pattern, L1 lexicalization is described in terms of inter-lexical connections (Leach and Samuel, 2007), leading to more automatic processing (McMurray et al., 2016); whereas in L2 word learning, the lack of lexical inhibition is attributed to less robust (i.e., fuzzy) L2 lexical representations. Here, I point to these similarities and use them to argue that a common mechanism may underlie the similar patterns across the two literatures.
This paper provides an in-depth investigation of the possibility of systematically using flexemes – i.e., lexical units characterized in terms of form, as opposed to lexemes, characterized in terms of meaning – to model overabundance – i.e., the availability of more than one form in the same paradigm cell. The starting point is a preliminary evaluation of the advantages and disadvantages of using flexemes to account for different overabundance phenomena, showing that flexemes are a good way to capture the systematicity of overabundance, either across lexemes or across cells. Consequently, it is suggested that flexemes can be an interesting technical solution for the creation of a lexicon of Latin verbs that not only documents all the competing wordforms available as principal parts, but also captures the systematic relationship that sometimes holds between variants filling different cells. A principled method to identify such systematicity is then described in detail. It is argued that a constructive approach based on the identity of stems and/or inflection class is not fully adequate for the data at hand. Therefore, the proposed procedure adopts an abstractive, word-based perspective that only relies on alternation patterns between unsegmented wordforms. Practical and theoretical implications of the work are finally discussed, particularly regarding the usefulness of a formal approach to the identification of lexical units and paradigm cells.
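An abstractive, word-based comparison of unsegmented wordforms can be sketched as extracting the longest shared prefix and suffix and treating the residue as the alternation. The Latin perfect variants in the usage example below are chosen purely for illustration:

```python
# Sketch of an abstractive comparison of two competing wordforms:
# no stems or endings are assumed; only the surface alternation between
# the unsegmented forms is extracted.

def alternation_pattern(form_a, form_b):
    """Return (shared prefix, alternant of a, alternant of b, shared suffix)."""
    # Longest common prefix.
    i = 0
    while i < min(len(form_a), len(form_b)) and form_a[i] == form_b[i]:
        i += 1
    # Longest common suffix of what remains after the prefix.
    j = 0
    while (j < min(len(form_a), len(form_b)) - i
           and form_a[len(form_a) - 1 - j] == form_b[len(form_b) - 1 - j]):
        j += 1
    return (form_a[:i],
            form_a[i:len(form_a) - j],
            form_b[i:len(form_b) - j],
            form_a[len(form_a) - j:])
```

For example, comparing the overabundant perfects lavi and lavavi yields the pattern ("lav", "", "av", "i"), i.e., the two variants differ only in an internal "av". Recurring patterns of this kind across lexemes are what signal systematic overabundance.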
Research on the relationship between contentious action and news production has often focused on the coverage and framing of specific events, not on the careers of keywords of the protest lexicon itself. However, these keywords play a central role in the negotiation of common understandings of social problems, the legitimation of claims and tactics, and even the shared imaginary of grassroots politics within communities of readers. This article seeks to contribute to this second avenue of media research by studying the use of the concept 'activism' and its associated subject, 'activists'. I ask how this word, which was a negative term for most of the twentieth century until the introduction and popularization of its modern sense in the 1960s, became a keyword of modern political participation by the public. A conceptual history grounded in the insights of distributional semantics and semantic field theory, this article studies patterns of use of 'activist' and 'activism' in two major British quality newspapers, The Guardian and The Times. This comparative approach aims to identify both historical and media-internal factors that contributed to activism becoming a meaningful category in news reporting. Coverage is compared for three episodes of heightened civic contention: the student protests of 1967–1969; Eastern European human rights activism around the Helsinki Accords, 1975–1977; and the industrial strikes of the 1980s, particularly the period around the miners' strike, 1984–1986.
• Operationalizes the sociological concept "protest lexicon" to study newspaper communication.
• Theorizes the role of keywords in the negotiation of mass media messaging and legitimation.
• Provides a conceptual history and reputational career of the keyword "activism".
• Combines quantitative and qualitative discourse analysis methodology.
In past phonology literature, diacritics, brackets and other extra-phonological objects have been employed to identify morpheme boundaries and to differentiate words from affixes. In the present study, we argue that all of these extra-phonological items bring arbitrariness to phonological theory since phonology is only concerned with identifying phonological objects. In this respect, the present study proposes a new account in order to identify word boundaries in phonology and to explain phonological processes which show sensitivity to morphological boundaries without referring to any extra-phonological objects. Accordingly, we argue for a novel template model and propose that bases (e.g. words), productive suffixes and prefixes are listed in the lexicon with their own unique phonological templates, ONO, NO and ON, respectively. This means that their morphological categories are recognizable when they come to phonology. Also, the morphological boundness of prefixes and suffixes is visible in our template model: The absence of a final onset in prefixes (ON) and the absence of an initial onset in suffixes (NO) render them phonologically bound to a base because only bases exhibit an (O...O) structure. Accordingly, these morphemes come to phonology with their own templates, and phonological operations (government, licensing, etc.) apply to them when necessary. Also, we put forward two new parameters, the Final Onset Parameter and the Initial Onset Parameter, in order to explain the word-final/word-initial differences among languages. We argue that phonological processes, the phonology-morphology interface and their relation to the lexicon are non-arbitrarily explainable in our model thanks to the templates and novel parameters.
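The boundness claim above can be sketched as a toy check: a morpheme whose template lacks an initial or final onset (O) cannot stand alone. The string labels below are a deliberate simplification of the proposed ONO/ON/NO templates:

```python
# Toy sketch of the template model's boundness criterion: only a template
# that both begins and ends with an onset (O) can stand alone as a base.
# Templates are represented as plain strings for illustration.

TEMPLATES = {"base": "ONO", "prefix": "ON", "suffix": "NO"}

def is_bound(template):
    """A morpheme is bound unless its template starts AND ends with an onset."""
    return not (template.startswith("O") and template.endswith("O"))
```

On this check, prefixes (ON, no final onset) and suffixes (NO, no initial onset) come out bound, while bases (ONO) do not, mirroring the (O...O) criterion in the abstract.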
This paper presents a new linguistic resource for the generation of paraphrases in Portuguese, based on the lexicon-grammar framework. The resource components include: (i) a lexicon-grammar based dictionary of 2100 predicate nouns co-occurring with the support verb ser de 'be of', as in ser de uma ajuda inestimável 'be of invaluable help'; (ii) a lexicon-grammar based dictionary of 6000 predicate nouns co-occurring with the support verb fazer 'do' or 'make', as in fazer uma comparação 'make a comparison'; and (iii) a lexicon-grammar based dictionary of about 5000 human intransitive adjectives co-occurring with the copula verbs ser and/or estar 'be', as in ser simpático 'be kind' or estar entusiasmado 'be enthusiastic'. A set of local grammars exploits the properties described in these linguistic resources, enabling a variety of text transformation tasks for paraphrasing applications. The paper highlights the complementary and synergistic components and the integration efforts, and presents some preliminary evaluation results on the inclusion of these resources in the eSPERTo paraphrase generation system.