Sentiment analysis research in the Malaysian context is currently hampered by the lack of available sentiment lexicons. This paper addresses that gap in order to improve the accuracy of sentiment analysis. Following a detailed review of existing approaches, a new bilingual sentiment lexicon known as MELex (Malay-English Lexicon) was constructed. Constructing MELex involves three activities: seed word selection, polarity assignment, and synonym expansion. Our approach differs from previous work in that MELex can analyze text in the two most widely used languages in Malaysia, Malay and English, with an achieved accuracy of 90%. It was evaluated through experimentation and a case study in which affordable housing projects in Malaysia were selected as case projects. This finding indicates that MELex is able to analyze public sentiment in the Malaysian context. The novel aspects of this paper are twofold: first, it introduces a new technique for assigning polarity scores, and second, it improves classification performance on mixed-language content.
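The three construction activities named in the abstract (seed word selection, polarity assignment, synonym expansion) can be illustrated with a minimal sketch. This is not the MELex procedure itself: the seed words, polarity values, synonym lists, and the function names `expand_lexicon` and `score_text` are all hypothetical toy choices made for illustration.

```python
from typing import Dict, List

def expand_lexicon(seeds: Dict[str, int],
                   synonyms: Dict[str, List[str]]) -> Dict[str, int]:
    """Copy each seed's polarity to its synonyms; existing entries are kept."""
    lexicon = dict(seeds)
    for word, polarity in seeds.items():
        for syn in synonyms.get(word, []):
            # setdefault keeps a seed's own polarity if a synonym is also a seed
            lexicon.setdefault(syn, polarity)
    return lexicon

def score_text(text: str, lexicon: Dict[str, int]) -> int:
    """Sum the polarities of known tokens; unknown tokens count as 0."""
    return sum(lexicon.get(tok, 0) for tok in text.lower().split())

# Hypothetical toy seeds (English and Malay), NOT taken from MELex.
SEEDS = {"good": 1, "bad": -1, "bagus": 1, "buruk": -1}
# Hypothetical synonym lists standing in for a thesaurus lookup.
SYNONYMS = {"good": ["great"], "bad": ["poor"],
            "bagus": ["baik"], "buruk": ["teruk"]}

LEXICON = expand_lexicon(SEEDS, SYNONYMS)
```

Because Malay and English entries share one dictionary, a mixed-language sentence is scored in a single pass, for example `score_text("rumah ini bagus but the price is bad", LEXICON)`.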
Distributional semantics is a usage-based model of meaning, based on the assumption that the statistical distribution of linguistic items in context plays a key role in characterizing their semantic behavior. Distributional models build semantic representations by extracting co-occurrences from corpora and have become a mainstream research paradigm in computational linguistics. In this review, I present the state of the art in distributional semantics, focusing on its assets and limits as a model of meaning and as a method for semantic analysis.
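As a rough illustration of how co-occurrences extracted from a corpus can serve as semantic representations, the sketch below counts context words in a fixed window and compares the resulting count vectors by cosine similarity. The three-sentence corpus, the window size, and the function names are assumptions made for this example; real distributional models use far larger corpora and usually reweight or compress the counts.

```python
from collections import Counter, defaultdict
from math import sqrt

def cooccurrence_vectors(sentences, window=2):
    """Map each token to a Counter of the tokens seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        toks = sent.lower().split()
        for i, tok in enumerate(toks):
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[tok][toks[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy corpus, for illustration only.
CORPUS = ["the cat drinks milk", "the dog drinks water", "the car needs fuel"]
VECS = cooccurrence_vectors(CORPUS)
```

In this toy corpus, "cat" and "dog" share more contexts than "cat" and "car", so `cosine(VECS["cat"], VECS["dog"])` is larger than `cosine(VECS["cat"], VECS["car"])`.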
In natural language, multiple meanings often share a single word form, a phenomenon known as colexification. Some sets of meanings are more frequently colexified across languages than others, but the source of this variation is not well understood. We propose that cross-linguistic variation in colexification frequency is non-arbitrary and reflects a general principle of cognitive economy: More commonly colexified meanings across languages are those that require less cognitive effort to relate. To evaluate our proposal, we examine patterns of colexification of varying frequency from about 250 languages. We predict these colexification data based on independent measures of conceptual relatedness drawn from large-scale psychological and linguistic resources. Our results show that meanings that are more frequently colexified across these languages tend to be more strongly associated by speakers of English, suggesting that conceptual associativity provides an important constraint on the development of the lexicon. Our work extends research on polysemy and the evolution of word meanings by grounding cross-linguistic regularities in colexification in basic principles of human cognition.
Twenty-seven representative Hunan fuzhuan brick teas were collected to develop a terminology lexicon and a quantitative descriptive analysis (QDA) method suitable for the sensory evaluation of Hunan fuzhuan brick tea infusion. Ten trained panelists developed a terminology lexicon comprised of eleven aroma and six taste attributes and evaluated the intensities of sensory attributes of each sample by conducting the QDA method. The QDA results showed that seventeen attributes listed in the final lexicon can be used to evaluate the quality of Hunan fuzhuan brick tea infusion properly, among which five aroma attributes, overall aroma, smoky, floral, fermented, and sweet (fruit), and one taste attribute, bitter, were the characteristic attributes to distinguish the differences in the sample qualities. Another panel made up of four professional cuppers evaluated samples by the cupping method to analyze the applicability and accuracy of the lexicon and the QDA method. The results showed that both the cupping method and QDA can be effectively used to evaluate Hunan fuzhuan brick tea quality, and their evaluation results showed high consistency and mutual complementation. This information will be beneficial for developing a sensory evaluation method and quality control for Hunan fuzhuan brick tea.
• An objective lexicon for the evaluation of Hunan fuzhuan brick tea was established.
• A lexicon of 11 aroma and 6 taste attributes was developed to evaluate Hunan fuzhuan brick tea.
• Five aroma attributes and one taste attribute were the characteristic attributes for distinguishing quality.
• The cupping method and QDA showed high consistency and mutual complementation.
This research aims to document the linguistic categories of agriculture from the perspective of a society's relationship with plants and their environment, and to elaborate the local wisdom with which that society views plants and their environment, given that agriculture is part of its livelihood. The ethnobiological lexicon data and other information about plants in Javanese were compiled from articles, books, and dictionaries, along with several native speakers of Javanese who reside in the southern part of DIY. Verbal data were obtained by the controlled elicitation method and analyzed with the component analysis method as well as introspection. The results present an inventory of agricultural terms, encompassing words, abbreviations, and coinages, whether monomorphemic or polymorphemic. These agricultural terms are used by the society to categorize plants and their environment, and the people continue to hold the local wisdom perspective in high regard amid the emergence of agricultural renewal. This research contributes to the study of language in agricultural activities in Javanese society, especially in Yogyakarta, from an ethnic perspective, particularly regarding the formation of terms that have to some extent become a new identity adopted into Javanese. It is hoped that the results of this study can help the wider community use the agricultural lexicon in keeping with the times without neglecting the preservation of the agricultural lexicon that is part of its local wisdom.
Keywords: agriculture, local wisdom, lexicon.
Network analyses of the phonological mental lexicon show that words are clustered into communities and that phonologically dissimilar words can be connected to each other through distant paths. Here we investigate whether behavioral traces of the large-scale structure of the phonological lexicon can be obtained. Participants listened to pairs of spoken words and made phonological similarity judgments for word pairs with varying path lengths and community membership. Path length represented the number of steps needed to traverse from one word to another in the phonological network. Word pairs were either from the same phonological community or from different communities. Results indicated that participants were sensitive to the large-scale structure of the phonological lexicon. Word pairs residing in the same community were more likely to be rated as similar sounding than word pairs from different communities. Word pairs with longer path lengths were less likely to be rated as similar sounding than word pairs with shorter path lengths. Computational simulations suggested that the behavioral findings could be accounted for via a spreading activation mechanism implemented on the phonological network. Taken together, our results provide converging evidence that people are sensitive to the large-scale structure of the phonological language network and have implications for our understanding of the nature of phonological similarity representations in the mental lexicon.
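Path length as described above, the number of steps needed to get from one word to another in the network, can be computed with a breadth-first search. The sketch below is only an illustration: the edge list is a hypothetical toy network using spellings as stand-ins for phonological forms, and the rule that neighbors differ by a single sound is an assumption not stated in the abstract.

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map from a list of (word, word) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def path_length(graph, start, goal):
    """Shortest number of steps from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

# Hypothetical toy network: each edge links two words assumed to differ by one sound.
EDGES = [("cat", "bat"), ("bat", "bad"), ("bad", "bed"), ("cat", "cot")]
GRAPH = build_graph(EDGES)
```

Here `path_length(GRAPH, "cat", "bed")` walks cat, bat, bad, bed, while a word with no connection to the rest of the network yields `None`.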
The literature in sentiment analysis has widely assumed that semantic relationships between words cannot be effectively exploited to produce satisfactory sentiment lexicon expansions. This assumption stems from the fact that words considered to be “close” in a semantic space (e.g., word embeddings) may present completely opposite polarities, which might suggest that sentiment information in such spaces is either too faint, or at least not readily exploitable. Our main contribution in this paper is a rigorous and robust challenge to this assumption: by proposing a set of theoretical hypotheses and corroborating them with strong experimental evidence, we demonstrate that semantic relationships can be effectively used for good lexicon expansion. Based on these results, our second contribution is a novel, simple, and yet effective lexicon-expansion strategy based on semantic relationships extracted from word embeddings. This strategy is able to substantially enhance the lexicons, whilst overcoming the major problem of lexicon coverage. We present an extensive experimental evaluation of sentence-level sentiment analysis, comparing our approach to sixteen state-of-the-art (SOTA) lexicon-based and five lexicon expansion methods, over twenty datasets. Results show that in the vast majority of cases our approach outperforms the alternatives, achieving coverage of almost 100% and gains of about 26% against the best baselines. Moreover, our unsupervised approach performed competitively against SOTA supervised sentiment analysis methods, mainly in scenarios with scarce information. Finally, in a cross-dataset comparison, our approach turned out to be as competitive as (i.e., statistically tied with) state-of-the-art supervised solutions such as pre-trained transformers (BERT), even without relying on any training (labeled) data. Indeed, in small datasets or in datasets with scarce information (short messages), our solution outperformed the supervised ones by large margins.
• We demonstrate that semantic relationships can be effective for lexicon expansion.
• We propose a novel method that explores distances between word embeddings.
• Our unsupervised method enhances lexicon coverage while keeping high precision.
• In our experiments, we beat all lexicon-expansion baselines by large margins.
• Our approach is competitive with pre-trained transformers (BERT) without any training.
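The general idea of expanding a lexicon through distances between word embeddings can be sketched as a nearest-seed vote. This is a generic illustration and not the strategy proposed in the paper: the two-dimensional vectors, the seed polarities, the parameters `k` and `min_sim`, and the function names are all hypothetical.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def expand(seed_polarity, embeddings, k=2, min_sim=0.5):
    """Label non-seed words by a similarity-weighted vote of their k nearest seeds."""
    expanded = dict(seed_polarity)
    for word, vec in embeddings.items():
        if word in seed_polarity:
            continue
        nearest = sorted(
            ((cosine(vec, embeddings[s]), p)
             for s, p in seed_polarity.items() if s in embeddings),
            reverse=True,
        )[:k]
        # Ignore seeds that are not similar enough to be informative.
        score = sum(sim * pol for sim, pol in nearest if sim >= min_sim)
        if score:
            expanded[word] = 1 if score > 0 else -1
    return expanded

# Hypothetical toy embeddings and seeds, for illustration only.
EMBEDDINGS = {
    "good": (1.0, 0.1), "great": (0.9, 0.2),
    "bad": (0.1, 1.0), "awful": (0.2, 0.9),
    "table": (-1.0, -1.0),
}
SEEDS = {"good": 1, "bad": -1}
EXPANDED = expand(SEEDS, EMBEDDINGS)
```

The `min_sim` threshold is one simple way to guard against the concern raised in the abstract, namely that words close in embedding space can carry opposite polarities; words with no sufficiently similar seed, such as "table" here, are left out of the expanded lexicon.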
The use of marks to show the presence of colloquial lexicon in dictionaries is not without controversy. In recent decades, numerous studies have appeared that demonstrate the importance of unifying criteria to avoid confusion and ambiguities among users. While this may be relevant for any dictionary, it becomes imperative for didactic dictionaries, which undoubtedly require greater systematization and clarity. In this research, the treatment of colloquialisms in two lexicographical works of this type, Diccionario del estudiante (DE) and Diccionario para la enseñanza de la lengua española (DIPELE), is contrasted and analysed to identify the criteria used from the lexicographical and lexicological points of view.
Sentiment analysis of text data, such as reviews, can help users and merchants make better decisions. It is difficult to use the popular supervised learning methods for sentiment classification because labeling data manually is time-consuming and laborious. Unsupervised sentiment classification methods are mostly based on sentiment lexicons. Existing sentiment lexicons are not adequate for domain sentiment classification, so a domain sentiment lexicon still needs to be constructed. Advanced domain sentiment lexicon construction methods still have many problems, e.g., heavy reliance on labeled data and poor accuracy. We propose a labeled-data extension idea to reduce the dependence of supervised learning methods on labeled data. To solve the problems of domain sentiment lexicon construction, we propose a novel learning framework based on multi-source information fusion (MSIF). We extract four kinds of emotional information: lexicon emotional information, emotional word co-occurrence information, emotional word polarity information, and polarity relationship information of emotional word pairs. When extracting the co-occurrence information, a novel method based on the data extension idea is proposed to enhance its accuracy and coverage. To accelerate the solution of the fusion model, an optimization method based on the ADMM algorithm is applied. Experimental results on five Amazon product review datasets show that the sentiment dictionary constructed by the proposed method can significantly improve the performance of review sentiment classification compared with current popular baselines and state-of-the-art methods.