Abstract
Gene Ontology (GO) is widely used in the biological domain. It is the most comprehensive ontology providing formal representation of gene functions (GO concepts) and relations between them. However, unintentional quality defects (e.g. missing or erroneous relations) may exist in GO due to the large number of GO concepts and the complexity of GO structures. Such quality defects would impact the results of GO-based analyses and applications. In this work, we introduce a novel evidence-based lexical pattern approach for quality assurance of GO relations. We leverage two layers of evidence to suggest potentially missing relations in GO as follows. We first utilize related concept pairs (i.e. existing relations) in GO to extract relationship-specific lexical patterns, which serve as the first layer of evidence to automatically suggest potentially missing relations between unrelated concept pairs. For each suggested missing relation, we further identify two other existing relations as the second layer of evidence; these resemble the difference between the missing relation and the existing relation on which the suggestion is based. Applied to the 15 December 2021 release of GO, this approach suggested a total of 866 potentially missing relations. Local domain experts evaluated the entire set of potentially missing relations and identified 821 as missing relations and 45 as indicating erroneous existing relations. We submitted these findings to the GO consortium for further validation and received encouraging feedback. These results indicate that our evidence-based approach can be utilized to uncover missing relations and erroneous existing relations in GO.
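As a rough illustration of the first layer of evidence only, the sketch below derives a lexical pattern from each related concept pair and reuses well-supported patterns to flag unrelated pairs. It is a minimal sketch under stated assumptions, not the procedure used in the work: the pattern definition (the words left after removing the shared prefix and suffix), the function names, the `min_support` threshold, and the toy labels (styled after GO term names, not taken from the release) are all hypothetical.

```python
from collections import Counter


def lexical_pattern(child, parent):
    """Return the words that differ between two labels once their
    common prefix and common suffix are removed (assumed pattern form)."""
    c, p = child.split(), parent.split()
    shortest = min(len(c), len(p))
    i = 0  # length of the common prefix
    while i < shortest and c[i] == p[i]:
        i += 1
    j = 0  # length of the common suffix, not overlapping the prefix
    while j < shortest - i and c[len(c) - 1 - j] == p[len(p) - 1 - j]:
        j += 1
    return (" ".join(c[i:len(c) - j]), " ".join(p[i:len(p) - j]))


def learn_patterns(related_pairs):
    """Count how many existing relations support each pattern."""
    return Counter(lexical_pattern(c, p) for c, p in related_pairs)


def suggest(unrelated_pairs, patterns, min_support=2):
    """Keep unrelated pairs whose pattern is supported often enough."""
    return [pair for pair in unrelated_pairs
            if patterns[lexical_pattern(*pair)] >= min_support]


# Toy data: existing is-a relations and currently unrelated pairs.
existing = [
    ("positive regulation of cell growth", "regulation of cell growth"),
    ("positive regulation of transport", "regulation of transport"),
]
candidates = [
    ("positive regulation of cell death", "regulation of cell death"),
    ("cell growth", "transport"),
]

learned = learn_patterns(existing)
suggestions = suggest(candidates, learned, min_support=2)
print(suggestions)
```

The second layer of evidence described above (two further existing relations that mirror the same difference) is not modelled here.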
Detecting adverse drug reactions (ADRs) is an important task that has direct implications for how a drug is used. If we can detect previously unknown ADRs as quickly as possible, then this information can be provided to regulators, pharmaceutical companies, and health care organizations, thereby potentially reducing drug-related morbidity and saving the lives of many patients. A promising approach for detecting ADRs is to use social media platforms such as Twitter and Facebook. A high level of correlation between a drug name and an event may be an indication of a potential adverse reaction associated with that drug. Although numerous association measures have been proposed by the signal detection community for identifying ADRs, these measures are limited in that they detect correlations but often ignore causality.
This study aimed to propose a causality measure that can detect an adverse reaction that is caused by a drug rather than merely being a correlated signal.
To the best of our knowledge, this was the first causality-sensitive approach for detecting ADRs from social media. Specifically, the relationship between a drug and an event was represented using a set of automatically extracted lexical patterns. We then learned the weights for the extracted lexical patterns that indicate their reliability for expressing an adverse reaction of a given drug.
Our proposed method obtains an ADR detection accuracy of 74% on a large-scale manually annotated dataset of tweets, covering a standard set of drugs and adverse reactions.
By using lexical patterns, we can accurately detect the causality between drugs and adverse reaction-related events.
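To make the idea of weighting lexical patterns concrete, the sketch below fits one weight per pattern with a plain logistic regression trained by stochastic gradient ascent. This is a stand-in technique chosen for illustration, not the causality measure proposed in the study; the pattern strings, labels, and hyperparameters are invented toy values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_pattern_weights(examples, epochs=200, lr=0.5):
    """Fit one weight per lexical pattern (plus a bias) with logistic
    regression; each example is (set of patterns, label in {0, 1})."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for patterns, label in examples:
            z = bias + sum(weights.get(p, 0.0) for p in patterns)
            err = label - sigmoid(z)  # gradient of the log-likelihood
            bias += lr * err
            for p in patterns:
                weights[p] = weights.get(p, 0.0) + lr * err
    return weights, bias


def score(patterns, weights, bias):
    """Estimated probability that the drug-event pair is an ADR."""
    return sigmoid(bias + sum(weights.get(p, 0.0) for p in patterns))


# Toy training data: patterns observed between a drug and an event,
# labelled 1 if annotators judged the pair an adverse reaction.
examples = [
    ({"DRUG gave me EVENT"}, 1),
    ({"DRUG caused EVENT"}, 1),
    ({"took DRUG for EVENT"}, 0),
    ({"DRUG for my EVENT"}, 0),
]

weights, bias = train_pattern_weights(examples)
print(sorted(weights.items(), key=lambda kv: -kv[1]))
```

In this toy setting, patterns that appear only with adverse reactions end up with positive weights, while patterns that typically express an indication ("took DRUG for EVENT") end up with negative ones.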
Knowledge graphs (KGs) are useful data structures for the integration, retrieval, dissemination, and inference of information in various information domains. One of the main challenges in building KGs is the extraction of named entities (nodes) and their relations (edges), particularly when processing unstructured text, as it has no semantic descriptions. Generating KGs from texts written in Spanish represents a research challenge, as the existing structures, models, and strategies designed for other languages are not compatible with this scenario. This paper proposes a method to design and construct KGs from unstructured text in Spanish. We defined lexical patterns to extract named entities and (non-)taxonomic, equivalence, and composition relations. Next, named entities are linked and enriched with DBpedia resources through a strategy based on SPARQL queries. Finally, OWL properties are defined from the predicate relations for creating Resource Description Framework (RDF) triples. We evaluated the performance of the proposed method to determine the degree of elements extracted from the input text and to assess their quality through standard information retrieval measures. The evaluation revealed the feasibility of the proposed method to extract RDF triples from datasets in general and computer science domains. Competitive results were observed when comparing our method with an existing approach from the literature.
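The pattern-to-triple step can be pictured with a small sketch. It assumes two hand-written Spanish patterns ("X es un/una Y" and "X forma parte de Y"), a made-up `http://example.org/` namespace, and invented predicate names; none of these come from the paper, and the DBpedia linking and OWL property definition steps are left out.

```python
import re

EX = "http://example.org/"  # hypothetical namespace for illustration

# Hypothetical lexical patterns: (compiled regex, predicate name).
PATTERNS = [
    (re.compile(r"(\w+) es una? (\w+)"), "esUn"),  # taxonomic
    (re.compile(r"(\w+) forma parte de (?:la |el |los |las )?(\w+)"),
     "parteDe"),  # composition
]


def extract_triples(text):
    """Return (subject, predicate, object) tuples matched in the text."""
    triples = []
    for regex, predicate in PATTERNS:
        for subj, obj in regex.findall(text):
            triples.append((subj, predicate, obj))
    return triples


def to_ntriples(triples):
    """Serialise the tuples as N-Triples lines under the EX namespace."""
    return [f"<{EX}{s}> <{EX}{p}> <{EX}{o}> ." for s, p, o in triples]


text = "Madrid es una ciudad. El teclado forma parte de la computadora."
triples = extract_triples(text)
for line in to_ntriples(triples):
    print(line)
```

Real input would need tokenisation, multi-word entities, and accent-aware normalisation of the IRIs, which this sketch ignores.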
In this paper we show that the Yaqui language distinguishes four lexical patterns: i) motion and path; ii) motion and manner; iii) motion, path, and cause; and iv) motion, path, and figure. We also determine that the occurrence of a lexical pattern has implications for the syntactic structure of the motion event: events with lexical patterns i), ii), and iv) have an intransitive syntactic structure, in which the figure corresponds to the subject of the clause, whereas events with lexical pattern iii) have a transitive syntactic structure, in which the figure corresponds to the direct object.
Activities and questions to assess knowledge in English for medical purposes were designed to incorporate terminology, academic vocabulary, and grammar items in computer-based tests for bachelor students in Dentistry at Medical University - Varna. During the two-semester study course, identification of key words, core lexical patterns, and specific collocations, together with emphasis on their recurrent use, were the strategies selected for student retention of specialized language and improved learning outcomes. Medical terms as single-word lexical units with straightforward definitions are easy to learn compared to collocations and multi-word terms that include polysemous adjectives. In the present paper we concentrate on the formation of structures containing attributive adjectives, which can be a problematic and error-generating area for second language learners. The aim of the ongoing research study is to extract current lists of the sub-technical vocabulary and the terminological units in specialized medical domains as linguistic resources. The collection of assessment materials into a test bank for specific educational purposes is a customizable electronic resource, imported into the University platform to facilitate the compilation and creation of new tests.
By implementing corpus linguistic tools into test design, the instructor aims at providing an authentic e-assessment environment based on the idea of key words in context, concordances, and lexical patterns as per the contents of the selected textbooks and teaching materials during the course. The paper highlights some strategic issues about creating test resources in EMP such as the adherence to a set of selected linguistic items and grammatical structures based on their frequency in the domain.
This paper describes an automatic approach to identify lexical patterns that represent semantic relationships between concepts in an on-line encyclopedia. These patterns can then be applied to extend existing ontologies or semantic networks with new relations. The experiments have been performed with the Simple English Wikipedia and WordNet 1.7. A new algorithm has been devised for automatically generalising the lexical patterns found in the encyclopedia entries. We have found general patterns for the hyperonymy, hyponymy, holonymy and meronymy relations and, using them, we have extracted more than 2600 new relationships that did not appear in WordNet originally. The precision of these relationships depends on the degree of generality chosen for the patterns and the type of relation, being around 60–70% for the best combinations proposed.
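One simple way to picture pattern generalisation is to align two patterns word by word and replace the parts where they disagree with a wildcard. The sketch below does this with Python's standard `difflib.SequenceMatcher`; it is an assumed stand-in, not the algorithm devised in the paper, and the `HYPONYM`/`HYPERNYM` placeholders and example patterns are invented.

```python
from difflib import SequenceMatcher


def generalise(pattern_a, pattern_b):
    """Merge two lexical patterns: keep aligned words that agree and
    collapse every stretch where they differ into a single '*'."""
    a, b = pattern_a.split(), pattern_b.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    merged = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            merged.extend(a[i1:i2])
        elif not merged or merged[-1] != "*":
            merged.append("*")  # replace / insert / delete become a wildcard
    return " ".join(merged)


general = generalise("HYPONYM is a kind of HYPERNYM",
                     "HYPONYM is a type of HYPERNYM")
print(general)
```

The more wildcards a merged pattern contains, the more sentences it matches and the more false relations it is likely to extract, which is the generality/precision trade-off the abstract reports.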
E-commerce organizations are growing exponentially over time in terms of both business and data. Many organizations rely on their websites to attract new customers and retain existing ones. To achieve this goal, web log files, which record customers' access patterns, can be used. By applying traditional web usage mining techniques in an enhanced manner, valuable patterns and hidden knowledge can be discovered. This paper focuses on providing real-time dynamic recommendations to all visitors of a website, irrespective of whether they are registered or unregistered. An action-based rational recommendation technique is proposed that makes use of lexical patterns to generate item recommendations. The effectiveness of the proposed system is evaluated by collecting real-time e-commerce data and comparing the system with user-based and product-based techniques. The results indicate that the proposed system yields good accuracy and reduces the limitations of traditional recommendation systems.
This article focuses on a corpus of fifty-five British, Irish, American and Australian press articles that have been published online. The aim is to examine how emotion is dealt with, especially in headlines, as compared to the first lines of the article itself and the brief summary that can be found on the Internet: what lexical tools do journalists generally use in each of those utterances, and do they differ as time goes by? The research is hence carried out from both a synchronic and a diachronic point of view, so as to see whether, for the same topic, there are any differences between newspapers depending on the political stance they reflect, the kind of readership they have, or the overall context. Some attention is given to the cases in which emotion results from a sensation that gives rise to a perception, so as to analyze how it is expressed while considering the new idioms that may occur around this triad. The links between sensitivity and corporeality are thus scrutinized to see whether there are any recurring lexical patterns, such as linguistic metaphors, and any cultural variants, whose origins then have to be established. Within the framework of conceptual metaphor theory, the purpose is also to trace the limit between emotion and emotionalism or sensationalism inside a rhetoric sometimes based on excess, whenever the emotion expressed in the article is supposed to be aroused in readers, whether it be empathy or rejection. To do so, the emphasis is placed on negative topics such as hurricanes and typhoons, which involve a collective emotion that might differ from an individual one and whose impact depends on the registers of speech and the styles used by journalists as they deal with natural phenomena that are both unique and recurring events.
This study extends research into the use of English as a lingua franca in the European context by investigating the most frequent word combinations in English documents issued by EU institutions. As there is little research on the use of the English language within the European Union for ESP pedagogic purposes, the aim of this study, which forms part of a larger-scale analysis, is to explore the structures and functions of lexical bundles in English EU texts and to draw conclusions regarding their relevance for language courses on English for EU purposes. Findings suggest that the structural and functional classification of EU lexical bundles shows similarities with the language of university textbooks and academic prose in general. However, written English EU discourse uses lexical bundles at higher frequencies, which suggests that a fairly large proportion of EU texts is made up of formulaic patterns. The pedagogical implications of this study highlight the importance of explicit instruction in this type of word combination in courses on English for EU purposes.
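Lexical bundles are commonly operationalised as word sequences that recur above a frequency threshold and across several texts. The sketch below counts four-word sequences under that assumption; the thresholds, the tokenisation rule, and the two example sentences are illustrative choices, not the settings or data of the study.

```python
import re
from collections import Counter


def lexical_bundles(texts, n=4, min_freq=2, min_texts=2):
    """Return n-word sequences that occur at least min_freq times
    overall and in at least min_texts different texts."""
    freq = Counter()    # total occurrences of each n-gram
    spread = Counter()  # number of texts containing each n-gram
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        grams = [" ".join(tokens[i:i + n])
                 for i in range(len(tokens) - n + 1)]
        freq.update(grams)
        spread.update(set(grams))
    return {g: c for g, c in freq.items()
            if c >= min_freq and spread[g] >= min_texts}


# Invented sentences in the style of institutional prose.
texts = [
    "In accordance with the procedure laid down in Article 5.",
    "The measures were adopted in accordance with the Treaty.",
]
bundles = lexical_bundles(texts)
print(bundles)
```

On a real corpus the frequency threshold is usually normalised per million words, so the raw counts used here would need rescaling.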