The Romanian WordNet (RoWN) database is an impressive collection of Romanian nouns, verbs, adjectives, and adverbs. It can be seen as a network of nodes in which words that share the same meaning in a given context are grouped together, and within the mesh of that network lie the other words that can in turn become a node for another word in a different context. In this study we propose an approach to the problem of aligning the word senses of RoWordNet with those of the Thesaurus Dictionary of the Romanian Language in electronic format (eDTLR) by exploiting the collections of definitions and examples in the two linguistic thesauri. The approach is based on a statistical model, and four heuristics are proposed for solving this task.
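The abstract does not spell out its four heuristics, but a minimal sketch of the kind of definition-overlap heuristic such an alignment could rely on is shown below; the record format, function names, and scoring rule are hypothetical illustrations, not the paper's statistical model.

# Hedged sketch: align senses of one lemma across two resources by overlap of
# their definitions plus examples. Record shapes and names are hypothetical.
def tokens(text):
    return set(text.lower().split())

def align_senses(rown_senses, edtlr_senses):
    """rown_senses / edtlr_senses: lists of (sense_id, definition_and_examples)."""
    pairs = []
    for rid, rtext in rown_senses:
        # pick the eDTLR sense whose gloss shares the most tokens with the RoWN gloss
        best_id, best_text = max(edtlr_senses,
                                 key=lambda s: len(tokens(rtext) & tokens(s[1])))
        pairs.append((rid, best_id, len(tokens(rtext) & tokens(best_text))))
    return pairs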
Various applications in computational linguistics and artificial intelligence rely on high-performing word sense disambiguation techniques to solve challenging tasks such as information retrieval, machine translation, question answering, and document clustering. While text comprehension is intuitive for humans, machines face tremendous challenges in processing and interpreting a human’s natural language. This paper presents a novel knowledge-based word sense disambiguation algorithm, namely Sequential Contextual Similarity Matrix Multiplication (SCSMM). The SCSMM algorithm combines semantic similarity, heuristic knowledge, and document context to respectively exploit the merits of local sense-based context between consecutive terms, human knowledge about terms, and a document’s main topic in disambiguating terms. Unlike other algorithms, the SCSMM algorithm guarantees the capture of the maximum sentence context while maintaining the terms’ order within the sentence. The proposed algorithm outperformed all other algorithms when disambiguating nouns on the combined gold standard datasets, while demonstrating comparable results to current state-of-the-art word sense disambiguation systems when dealing with each dataset separately. Furthermore, the paper discusses the impact of granularity level, ambiguity rate, sentence size, and part of speech distribution on the performance of the proposed algorithm.
•Semantic similarity affects the overall performance of knowledge-based Word Sense Disambiguation (WSD) systems.
•With semantic similarity, sense heuristics, and document context, we designed a novel knowledge-based word sense disambiguation algorithm.
•The Sequential Contextual Similarity Matrix Multiplication (SCSMM) algorithm captures the maximum sentence context while maintaining the words’ order.
•The SCSMM algorithm outperforms current WSD systems when disambiguating nouns.
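The abstract and highlights only name the chained-matrix idea, so the following is a minimal illustration of what sequentially multiplying sense-to-sense similarity matrices for consecutive terms could look like; the choice of Wu-Palmer similarity and the sense-selection step are my own assumptions, and the authors' heuristic knowledge and document-context components are not reproduced.

# Hedged sketch of chained similarity-matrix multiplication over consecutive terms.
import numpy as np
from nltk.corpus import wordnet as wn

def sim_matrix(word_a, word_b):
    # Pairwise Wu-Palmer similarity between all noun senses of two words.
    sa, sb = wn.synsets(word_a, 'n'), wn.synsets(word_b, 'n')
    m = np.zeros((len(sa), len(sb)))
    for i, x in enumerate(sa):
        for j, y in enumerate(sb):
            m[i, j] = x.wup_similarity(y) or 0.0
    return m

def chain_context(words):
    # Multiply consecutive matrices left to right, preserving word order.
    product = sim_matrix(words[0], words[1])
    for a, b in zip(words[1:-1], words[2:]):
        product = product @ sim_matrix(a, b)
    # rows index senses of the first word, columns senses of the last word
    return product

print(chain_context(["bank", "money", "interest"]).round(3))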
Human similarity and relatedness judgements between concepts underlie most of our cognitive capabilities, such as categorisation, memory, decision-making and reasoning. For this reason, the proposal of methods for estimating the degree of similarity and relatedness between words and concepts has been a very active line of research in the fields of artificial intelligence, information retrieval and natural language processing, among others. The main approaches proposed in the literature can be categorised into two large families: (1) ontology-based semantic similarity measures (OM) and (2) distributional measures, whose most recent and successful methods are based on Word Embedding (WE) models. However, the lack of a deep analysis of both families of methods slows down the advance of this line of research and its applications. This work introduces the largest, most detailed and reproducible experimental survey of OM measures and WE models reported in the literature, based on the evaluation of both families of methods on the same software platform, with the aim of elucidating the state of the problem. We show that WE models which combine distributional and ontology-based information obtain the best results, and, in addition, we show for the first time that a simple average of the two best-performing WE models with other ontology-based measures or WE models is able to improve the state of the art by a large margin. We also provide a very detailed reproducibility protocol, together with a collection of software tools and datasets, as supplementary material to allow the exact replication of our results.
•A large reproducible survey of ontology-based similarity measures and word embeddings.
•Embeddings using ontologies get the best overall results on word similarity and relatedness.
•Best-performing WordNet-based similarity measures use IC models and path-based features.
•Linear combinations of best-performing word embeddings improve the state of the art.
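The "simple average" combination reported above can be illustrated as follows; which models are averaged and how the embedding cosine is rescaled are my own illustrative choices rather than the survey's exact protocol.

# Hedged sketch: average the scores of several pre-trained embedding models
# (e.g. gensim KeyedVectors loaded elsewhere) with a WordNet-based measure.
from nltk.corpus import wordnet as wn

def wordnet_sim(w1, w2):
    # maximum Wu-Palmer similarity over the noun senses of both words
    best = 0.0
    for a in wn.synsets(w1, 'n'):
        for b in wn.synsets(w2, 'n'):
            best = max(best, a.wup_similarity(b) or 0.0)
    return best

def averaged_sim(w1, w2, embedding_models):
    # embedding_models: list of gensim KeyedVectors; cosine in [-1, 1] is
    # rescaled to [0, 1] before averaging with the ontology-based score.
    scores = [(m.similarity(w1, w2) + 1) / 2 for m in embedding_models]
    scores.append(wordnet_sim(w1, w2))
    return sum(scores) / len(scores)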
Internet users perceive a multilingual web but cannot fully exploit it because they communicate in their regional languages; retrieving documents written in a language different from that of the query is called Cross-Lingual Information Retrieval (CLIR). In CLIR, a translation technique is used to translate the user queries into the target document’s language. Conventional translation techniques are based on either a manual dictionary or a parallel corpus, whereas the trending Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) techniques are trained on a parallel corpus. According to the literature, NMT is not yet mature for Hindi-English translation, and SMT performs better than NMT. SMT provides a static translation due to the limited vocabulary of the available parallel corpus and may not provide translations for missing or unseen words, whereas the web provides a dynamic interface where multiple users update information at the same time. The web may provide translations for missing or unseen words, and it is therefore used effectively for technically developed languages such as English, German, Spanish, Russian, and Chinese. In this article, translation techniques based on different web resources, such as Wikipedia, Hindi WordNet and IndoWordNet, ConceptNet, and online dictionaries, are proposed and applied to Hindi-English CLIR. The Wikipedia-based translation approach incorporates three modules (exactly matched, partially matched, and disambiguation) to address the issues of wrong inter-wiki links, partially matched terms, and ambiguous articles. The Hindi WordNet and IndoWordNet attribute “English synset” and the ConceptNet attributes “Related term” and “Synonymy” are used for obtaining translations. Further, WordNet path similarity is used to disambiguate translations. Various online dictionaries are available, but they return both relevant and irrelevant translations. The proposed approaches are compared with SMT, and the Wikipedia-based approach achieves a mean average precision approximately similar to that of SMT.
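The path-similarity disambiguation step mentioned above can be sketched with NLTK's WordNet interface; the candidate lists and the scoring rule below are illustrative assumptions, not the paper's exact procedure.

# Hedged sketch: among candidate English translations of one Hindi query term,
# keep the candidate whose WordNet senses lie closest (by path similarity) to
# the translations already chosen for the other query terms.
from nltk.corpus import wordnet as wn

def best_translation(candidates, context_words):
    def score(word):
        total = 0.0
        for ctx in context_words:
            best = 0.0
            for a in wn.synsets(word):
                for b in wn.synsets(ctx):
                    best = max(best, a.path_similarity(b) or 0.0)
            total += best
        return total
    return max(candidates, key=score)

# e.g. choosing between two dictionary translations given the rest of the query
print(best_translation(["bridge", "dam"], ["river", "water", "construction"]))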
X-Similarity Comparison by using Wordnet. Kasim, Shahreen; Omar, Nurul Aswa; Mohammad Akbar, Nurul Suhaida. JOIV: International Journal on Informatics Visualization, vol. 1, issue 4-2, November 2017.
The Semantic Web is an extension of the current web that represents information more meaningfully for humans and computers. It enables the description of contents and services in machine-readable form. It also enables the annotation, discovery, publishing, advertising and composition of services to be programmed. The Semantic Web is built on ontologies, which are regarded as its backbone; in this way, machine-readable content of the current web is transformed into machine-understandable content. Moreover, an ontology provides a common vocabulary and a grammar for publishing data, and can provide a semantic description of data that can be used to preserve the ontology and keep it ready for inference. Many feature-based methods are used for semantic similarity. This research presents a single-ontology, feature-based X-Similarity method.
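For orientation, a minimal single-ontology sketch in the spirit of a feature-based X-Similarity measure is shown below, assuming the commonly cited formulation based on set overlap of synonym sets, glosses, and neighbourhoods; the thresholds and exact combination used in the paper are not reproduced, and the choice of WordNet relations here is my own.

# Hedged sketch: Jaccard overlap of synonym sets, glosses, and hypernym
# neighbourhoods between two WordNet synsets.
from nltk.corpus import wordnet as wn

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def x_similarity(s1, s2):
    syn = jaccard([l.name() for l in s1.lemmas()], [l.name() for l in s2.lemmas()])
    if syn > 0:                      # overlapping synonym sets: maximal similarity
        return 1.0
    gloss = jaccard(s1.definition().lower().split(), s2.definition().lower().split())
    hyper = jaccard([h.name() for h in s1.hypernyms()], [h.name() for h in s2.hypernyms()])
    return max(gloss, hyper)         # otherwise fall back to description/neighbourhood overlap

print(x_similarity(wn.synset('car.n.01'), wn.synset('truck.n.01')))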
•Semantic networks could be used to quantify convergence and divergence in design thinking.
•Successful ideas exhibit divergence of semantic similarity and increased information content in time.
•Client feedback enhances information content and divergence of successful ideas.
•Information content and semantic similarity could be monitored for enhancement of user creativity.
Human creativity generates novel ideas to solve real-world problems. This thereby grants us the power to transform the surrounding world and extend our human attributes beyond what is currently possible. Creative ideas are not just new and unexpected, but are also successful in providing solutions that are useful, efficient and valuable. Thus, creativity optimizes the use of available resources and increases wealth. The origin of human creativity, however, is poorly understood, and semantic measures that could predict the success of generated ideas are currently unknown. Here, we analyze a dataset of design problem-solving conversations in real-world settings by using 49 semantic measures based on WordNet 3.1 and demonstrate that a divergence of semantic similarity, an increased information content, and a decreased polysemy predict the success of generated ideas. The first feedback from clients also enhances information content and leads to a divergence of successful ideas in creative problem solving. These results advance cognitive science by identifying real-world processes in human problem solving that are relevant to the success of produced solutions and provide tools for real-time monitoring of problem solving, student training and skill acquisition. A selected subset of information content (IC Sánchez–Batet) and semantic similarity (Lin/Sánchez–Batet) measures, which are both statistically powerful and computationally fast, could support the development of technologies for computer-assisted enhancements of human creativity or for the implementation of creativity in machines endowed with general artificial intelligence.
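The IC Sánchez-Batet and Lin/Sánchez-Batet measures named above can be sketched as follows, assuming the intrinsic IC of Sánchez et al. (2011) and Lin's (1998) similarity formula; the paper's exact WordNet 3.1 setup and any smoothing it applies are not reproduced here.

# Hedged sketch: intrinsic IC (Sanchez-Batet) and Lin similarity over WordNet nouns.
import math
from nltk.corpus import wordnet as wn

# Total number of leaf noun synsets, a constant of the taxonomy.
MAX_LEAVES = sum(1 for s in wn.all_synsets('n') if not s.hyponyms())

def ic(synset):
    # IC(c) = -log( (|leaves(c)| / |subsumers(c)| + 1) / (max_leaves + 1) )
    leaves = sum(1 for d in synset.closure(lambda s: s.hyponyms()) if not d.hyponyms())
    subsumers = len({synset} | set(synset.closure(lambda s: s.hypernyms())))
    return -math.log((leaves / subsumers + 1.0) / (MAX_LEAVES + 1.0))

def lin(s1, s2):
    # Lin (1998): sim = 2 * IC(LCS) / (IC(c1) + IC(c2))
    lcs = s1.lowest_common_hypernyms(s2)
    denom = ic(s1) + ic(s2)
    return 0.0 if not lcs or denom == 0 else 2.0 * ic(lcs[0]) / denom

print(lin(wn.synset('idea.n.01'), wn.synset('concept.n.01')))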
•Proposing a novel approach to combine textual and graphical embedding methods.
•Directly and independently using the WordNet structure to train a synset embedding.
•Analyzing the efficiency of the proposed synset embedding in the word similarity task.
•Discussing the effect of different parameters on the performance of the model.
•Proposing a weighting strategy to value various WordNet relations.
Thanks to the advances made in recent years, embedding methods have significantly increased the accuracy of text and graph processing methods. Embedding methods provide a compact vector representation of the basic elements (words, synsets, nodes, ...) of the underlying system that encodes the semantic information between those elements. Because of the polysemous nature of words, in some NLP tasks the use of sense/synset embeddings is better than word embeddings. However, the introduction of embeddings for synsets has received less attention in the literature. Existing synset embedding methods require complex calculations to derive synset embeddings from word embeddings, or rely on a predefined pairwise synset similarity. In this paper, considering the graphical structure of WordNet and the high-level knowledge encoded in it, we create a synset embedding directly from the WordNet graph and its synset relations. Node2Vec graph embedding is used to map the nodes of this graph to a vector space. We evaluate the performance of different graph structures (e.g. weighted/unweighted, directed/undirected graphs). Moreover, we propose a weighting strategy to weight different synset relation types in the resulting WordNet graph. Experimental evaluation of the proposed synset embedding on the task of measuring lexical semantic similarity shows that its mean squared error on the MEM and WordSim353 datasets is 0.065 and 0.035, respectively, which is better than the mean squared error of Word2Vec on these datasets (0.073 and 0.045, respectively). Furthermore, we use the Pearson and Spearman correlations to compare the performance of the proposed synset embedding method with state-of-the-art methods. The obtained results show the efficiency of the proposed method on various datasets: the Spearman correlation on SimLex999 is improved by 0.02, while the Pearson correlation on WordSim353 is improved by 0.14.
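The overall pipeline described in this abstract (WordNet graph, random walks, skip-gram training) can be sketched as follows; uniform walks approximate Node2Vec with p = q = 1, and the relation set, weights, walk parameters, and dimensions below are illustrative choices rather than the paper's configuration.

# Hedged sketch: synset embeddings from random walks over a WordNet relation graph.
import random
import networkx as nx
from gensim.models import Word2Vec
from nltk.corpus import wordnet as wn

# Build an undirected, unweighted graph over synsets from a few relation types.
g = nx.Graph()
for s in wn.all_synsets():
    for rel in (s.hypernyms(), s.hyponyms(), s.member_holonyms(), s.part_meronyms()):
        for t in rel:
            g.add_edge(s.name(), t.name())

def random_walk(start, length=20):
    walk = [start]
    for _ in range(length - 1):
        nbrs = list(g.neighbors(walk[-1]))
        if not nbrs:
            break
        walk.append(random.choice(nbrs))
    return walk

# A few walks per node, then skip-gram training on the walk "sentences".
walks = [random_walk(n) for n in g.nodes() for _ in range(5)]
model = Word2Vec(walks, vector_size=100, window=5, sg=1, min_count=1, workers=4)
print(model.wv.most_similar('dog.n.01', topn=5))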