A new methodology based on language models retrieves product features and opinions from a collection of free-text customer reviews about a product or service. The proposal relies on a language-modeling framework that can be applied to reviews in any domain and language, provided that a minimal knowledge source of sentiments or opinions is available (that is, a minimal seed set of opinion words).
In this paper, we introduce a new methodology for modeling product aspects from a collection of free-text customer reviews. The proposal relies on a language-modeling framework and is domain-independent. It combines a kernel-based model of opinion words with a stochastic translation model between words to approximate the aspect model of products. We also present a ranking-based methodology to model the sentiments expressed about the aspects. Experiments on several collections of customer reviews show encouraging results in modeling product aspects and their sentiments, even from individual customer reviews.
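The kernel-based model of opinion words can be illustrated with a minimal sketch. It rests on a simplifying assumption that does not come from the paper. Each word occurrence receives a Gaussian-kernel weight from every seed opinion word in the same review, based on token distance. The normalized accumulated weights then serve as a rough distribution over candidate aspect words. The function `kernel_aspect_scores`, the bandwidth `sigma`, and the toy reviews are hypothetical, and the stochastic translation model is not shown.

```python
import math
from collections import defaultdict


def kernel_aspect_scores(reviews, seed_opinions, sigma=2.0, stopwords=frozenset()):
    """Hypothetical sketch: score candidate aspect words by Gaussian-kernel
    proximity to seed opinion words (an assumed simplification, not the paper's model)."""
    scores = defaultdict(float)
    for review in reviews:
        tokens = review.lower().split()
        seed_positions = [i for i, tok in enumerate(tokens) if tok in seed_opinions]
        for i, tok in enumerate(tokens):
            if tok in seed_opinions or tok in stopwords:
                continue  # opinion words and function words are not aspect candidates
            for j in seed_positions:
                d = i - j  # signed token distance to a seed opinion word
                scores[tok] += math.exp(-(d * d) / (2 * sigma * sigma))
    total = sum(scores.values())
    # Normalize so the scores form a distribution over candidate aspect words.
    return {w: s / total for w, s in scores.items()} if total else {}


reviews = [
    "the battery is great",
    "great screen but poor battery life",
    "the screen is bright",
]
scores = kernel_aspect_scores(reviews, {"great", "poor"}, stopwords={"the", "is", "but"})
print(sorted(scores, key=scores.get, reverse=True))
```

Words that occur close to several opinion words accumulate more weight. Here, "battery" outranks "screen" because it appears near seed words in two reviews.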
Clustering methods have been extensively used in many information processing tasks to capture unknown object categories. However, clustering has rarely been used as a sense labeling method for Word Sense Disambiguation (WSD), that is, as a way to identify groups of semantically related word senses that can be successfully used in a disambiguation process. In this paper, we present an unsupervised disambiguation method based on word sense clustering that also reveals implicit relationships among these word senses that are not asserted in WordNet. We also investigate in depth the role of clustering and its contribution to WSD. Experimental results demonstrate the usefulness of clustering for unsupervised WSD. Keywords: Word Sense Disambiguation, Clustering
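A toy sketch of clustering-based disambiguation follows. It substitutes simple techniques that are not claimed to be the paper's method:

- Senses are represented by hypothetical bags of gloss words.
- Senses are grouped by a greedy Jaccard-overlap clustering.
- A context is resolved by Lesk-style word overlap, first against each cluster and then against the senses inside the winning cluster.

The glosses, the `threshold` value, and all function names are illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard overlap between two sets of gloss words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_senses(glosses, threshold=0.2):
    """Greedy, order-dependent clustering: a sense joins the first cluster that
    contains a sense whose gloss overlap reaches the threshold."""
    clusters = []
    for sense, words in glosses.items():
        for cluster in clusters:
            if any(jaccard(words, glosses[other]) >= threshold for other in cluster):
                cluster.append(sense)
                break
        else:
            clusters.append([sense])
    return clusters


def disambiguate(context, glosses, clusters):
    """Lesk-style overlap: choose the best cluster, then the best sense inside it."""
    ctx = set(context.lower().split())

    def pooled(cluster):
        return set().union(*(glosses[s] for s in cluster))

    best_cluster = max(clusters, key=lambda c: len(ctx & pooled(c)))
    return max(best_cluster, key=lambda s: len(ctx & glosses[s]))


# Hypothetical glosses for the senses of "bank".
glosses = {
    "bank#1": {"financial", "institution", "money", "deposits"},
    "bank#2": {"building", "financial", "institution", "money"},
    "bank#3": {"sloping", "land", "river", "water"},
    "bank#4": {"slope", "river", "edge", "land"},
}
clusters = cluster_senses(glosses)
print(clusters)
print(disambiguate("we walked along the river to the land near the water", glosses, clusters))
```

Pooling gloss words per cluster lets a context match a group of related senses even when it matches no single sense strongly. This mirrors, very loosely, the idea of using sense groups in the disambiguation step.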
Ontologies are frequently used in information retrieval, where their main applications are query expansion, semantic indexing of documents, and the organization of search results. Ontologies provide lexical items, allow conceptual normalization, and provide different types of relations. However, how to optimize an ontology for information retrieval tasks is still unclear. In this paper, we use an ontology query model to analyze how useful ontologies are for document search. Moreover, we propose an algorithm to refine ontologies for information retrieval tasks, with preliminary positive results.
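To make the query-expansion use case concrete, here is a minimal sketch that assumes a hypothetical in-memory ontology. In it, each concept carries lexical items (labels) and links to narrower concepts. Query terms are normalized to concepts through their labels. Each matched concept then contributes its synonyms plus the labels of narrower concepts, up to a chosen `depth`. The `Concept` class, `expand_query`, and the sample medical concepts are assumptions for illustration. They are not the ontology query model or the refinement algorithm from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    """Hypothetical ontology concept: lexical items plus narrower-concept links."""
    labels: List[str]  # preferred label first, then synonyms (lowercase)
    narrower: List[str] = field(default_factory=list)  # ids of narrower concepts


def expand_query(terms: List[str], ontology: Dict[str, Concept], depth: int = 1) -> List[str]:
    # Conceptual normalization: map every lexical item to its concept id.
    label_index = {label: cid for cid, c in ontology.items() for label in c.labels}
    expanded: List[str] = []

    def add(term: str) -> None:
        if term not in expanded:  # keep the first occurrence, preserve order
            expanded.append(term)

    for term in (t.lower() for t in terms):
        add(term)
        cid = label_index.get(term)
        if cid is None:
            continue  # term not covered by the ontology
        frontier = [cid]
        for _ in range(depth + 1):  # level 0 is the matched concept itself
            next_frontier: List[str] = []
            for c in frontier:
                for label in ontology[c].labels:
                    add(label)
                next_frontier.extend(ontology[c].narrower)
            frontier = next_frontier
    return expanded


ontology = {
    "C1": Concept(["heart attack", "myocardial infarction"], ["C2"]),
    "C2": Concept(["silent myocardial infarction"]),
    "C3": Concept(["aspirin", "acetylsalicylic acid"]),
}
print(expand_query(["Heart attack", "treatment"], ontology))
```

Bounding the walk by `depth` keeps the expansion from drifting too far down the hierarchy. It also guarantees termination even if the narrower links contain a cycle.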
In this paper, we present a framework for obtaining structurally complex condensed representations of document sets, which serve as a basis for summarization, answering complex questions, and similar tasks. This framework includes a method for extracting a ranked list of facts, that is, triples of the form (entity, relation, entity), which relies on extraction patterns based on dependency parsing and on language modeling. It also includes methods for constructing a bipartite graph that encodes the information contained in the set of facts and for determining an appropriate traversal order on that structure. We evaluate the components of our framework on a subcollection extracted from MEDLINE, obtaining promising results.
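The bipartite fact/entity structure and a traversal order can be sketched as follows. The sketch assumes the facts are already ranked best-first; the language-model ranking itself is not shown. The ordering heuristic is a hypothetical greedy choice:

1. Start with the top-ranked fact.
2. Always visit the best-ranked unvisited fact that shares an entity with a visited fact.
3. Jump to the best remaining fact only when no connected fact is left.

Function names and the sample triples are illustrative.

```python
from collections import defaultdict


def build_bipartite(facts):
    """Entity side of a fact/entity bipartite graph: entity -> indices of facts
    mentioning it. `facts` are (entity, relation, entity) triples ranked best-first."""
    entity_to_facts = defaultdict(set)
    for i, (subj, _rel, obj) in enumerate(facts):
        entity_to_facts[subj].add(i)
        entity_to_facts[obj].add(i)
    return entity_to_facts


def traversal_order(facts):
    """Hypothetical greedy ordering: prefer the best-ranked unvisited fact that shares
    an entity with an already visited fact; otherwise restart at the best remaining one."""
    entity_to_facts = build_bipartite(facts)
    order, visited = [], set()
    while len(order) < len(facts):
        connected = {
            j
            for i in order
            for entity in (facts[i][0], facts[i][2])
            for j in entity_to_facts[entity]
        } - visited
        candidates = connected or (set(range(len(facts))) - visited)
        nxt = min(candidates)  # lower index means higher rank
        order.append(nxt)
        visited.add(nxt)
    return [facts[i] for i in order]


facts = [
    ("aspirin", "inhibits", "COX-1"),
    ("ibuprofen", "treats", "fever"),
    ("COX-1", "produces", "thromboxane"),
    ("aspirin", "reduces", "fever"),
]
for fact in traversal_order(facts):
    print(fact)
```

Following shared entities keeps consecutive facts topically connected. This is one plausible reason to traverse the graph rather than simply emit facts in rank order.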
Topic discovery based on text mining techniques
Pons-Porrata, Aurora; Berlanga-Llavori, Rafael; Ruiz-Shulcloper, José
Information Processing & Management, 05/2007, Volume 43, Issue 3
Journal Article, Peer reviewed
In this paper, we present a topic discovery system aimed at revealing the implicit knowledge present in news streams. This knowledge is expressed as a hierarchy of topics and subtopics, where each topic contains the set of documents related to it and a summary extracted from these documents. These summaries are useful for browsing and selecting topics of interest from the generated hierarchies. Our proposal consists of a new incremental hierarchical clustering algorithm that combines partitional and agglomerative approaches, taking advantage of the main benefits of each. Finally, we propose a new summarization method based on Testor Theory to build the topic summaries. Experimental results on the TDT2 collection demonstrate its usefulness and effectiveness not only as a topic detection system, but also as a classification and summarization tool.
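The combination of partitional and agglomerative clustering can be illustrated with a generic sketch. It is an assumed simplification, not the paper's algorithm. Each arriving document is assigned to the most similar existing cluster if its cosine similarity to that cluster's centroid reaches `assign_threshold`; this is the incremental, partitional step. Otherwise the document starts a new cluster. A separate agglomerative pass then merges the closest clusters to form a higher hierarchy level. The class name, thresholds, and bag-of-words representation are assumptions, and the Testor Theory summarization is not covered.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


class IncrementalClusterer:
    """Hypothetical sketch: incremental partitional assignment for the first level,
    agglomerative merging to build the level above it."""

    def __init__(self, assign_threshold=0.5):
        self.assign_threshold = assign_threshold
        self.clusters = []  # each cluster is a list of term-frequency vectors

    def add(self, doc):
        vec = Counter(doc.lower().split())
        best, best_sim = None, 0.0
        for cluster in self.clusters:
            sim = cosine(vec, centroid(cluster))
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= self.assign_threshold:
            best.append(vec)  # join the most similar existing cluster
        else:
            self.clusters.append([vec])  # start a new cluster

    def merge_level(self, merge_threshold=0.2):
        """Repeatedly merge the two most similar clusters (by centroid cosine)
        while their similarity is at least merge_threshold; returns a new level."""
        groups = [list(c) for c in self.clusters]
        while len(groups) > 1:
            sim, i, j = max(
                (cosine(centroid(groups[a]), centroid(groups[b])), a, b)
                for a in range(len(groups))
                for b in range(a + 1, len(groups))
            )
            if sim < merge_threshold:
                break
            groups[i].extend(groups.pop(j))  # j > i, so index i stays valid
        return groups


clusterer = IncrementalClusterer()
for doc in [
    "election results vote count",
    "vote count election fraud",
    "football match goal score",
    "match score football league",
]:
    clusterer.add(doc)
print([len(c) for c in clusterer.clusters])
print(len(clusterer.merge_level()))
```

Lowering `merge_threshold` produces coarser upper levels. Calling `merge_level` with several thresholds is one simple way to obtain a multi-level topic hierarchy from the same first-level clusters.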
This paper explores how to use terminological resources for ontology engineering. There are currently several biomedical ontologies describing overlapping domains, but there is no clear correspondence between concepts that are supposed to be equivalent or merely similar. These resources are very valuable, but integrating and further developing them is expensive. Terminologies can support ontology development at several stages of the ontology lifecycle, e.g. ontology integration. In this paper, we investigate the use of terminological resources during the ontology lifecycle. We claim that the proper creation and use of a shared thesaurus is a cornerstone for the successful application of Semantic Web technology in the life sciences. Moreover, we have applied our approach to a real scenario, the Health-e-Child (HeC) project, and evaluated the impact of filtering and reorganizing several resources. As a result, we have created a reference thesaurus for this project, named HeCTh.
In this paper, we introduce a new clustering algorithm for discovering and describing the topics in a text collection. Our proposal relies on both the most probable term pairs generated from the collection and an estimate of the topic homogeneity associated with these pairs. Topics and their descriptions are generated from those term pairs whose support sets are homogeneous enough to represent collection topics. Experimental results on three benchmark text collections demonstrate the effectiveness and utility of this approach.
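A minimal sketch of the term-pair idea follows. It relies on assumptions not stated in the abstract:

- The support set of a term pair is the set of documents containing both terms.
- Pair probability is estimated by document frequency, so the most probable pairs are those with the largest support.
- Homogeneity is the mean pairwise cosine similarity of the support documents.

Pairs whose homogeneity reaches `min_homogeneity` are kept as candidate topic descriptions. Grouping overlapping pairs into final topics is omitted. All names and thresholds are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def homogeneity(vectors):
    """Mean pairwise cosine similarity of the documents in a support set."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 1.0


def topic_pairs(docs, top_k=10, min_support=2, min_homogeneity=0.6):
    vecs = [Counter(doc.lower().split()) for doc in docs]
    support = {}  # term pair -> indices of documents containing both terms
    for i, vec in enumerate(vecs):
        for pair in combinations(sorted(vec), 2):
            support.setdefault(pair, []).append(i)
    # P(pair) is taken as document frequency / collection size, so rank by support size.
    probable = sorted(
        (p for p, docs_with_pair in support.items() if len(docs_with_pair) >= min_support),
        key=lambda p: (-len(support[p]), p),
    )[:top_k]
    topics = []
    for pair in probable:
        h = homogeneity([vecs[i] for i in support[pair]])
        if h >= min_homogeneity:  # support set coherent enough to describe a topic
            topics.append((pair, support[pair], round(h, 3)))
    return topics


docs = [
    "stock market prices fall",
    "stock market rally prices",
    "rain storm flood warning",
    "storm flood damage rain",
    "rain delays stock market opening",
]
for pair, supp, h in topic_pairs(docs):
    print(pair, supp, h)
```

In this toy collection, ("market", "stock") has the largest support set. However, that support set mixes a document about weather delays with the financial ones, so its homogeneity falls below the threshold and the pair is not used as a topic description.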