This study refines temporal relation extraction from medical documents by introducing a bimodal architecture. The goal is to deepen our understanding of narrative processes in the medical domain, particularly through the analysis of the extensive reports and notes that document patient experiences.
Our approach develops a bimodal architecture that integrates information from both text documents and knowledge graphs, infusing common knowledge about events into the temporal relation extraction process. We tested it on diverse clinical datasets that emulate real-world scenarios where the extraction of temporal relationships is essential.
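A minimal sketch of one way such a bimodal fusion could work, assuming the simplest late-fusion design: concatenate a text encoder's output embedding with a knowledge-graph embedding for the event pair and classify the temporal relation. The dimensions, weights, and label set below are illustrative assumptions, not the paper's actual architecture.

```python
import math

def fuse_bimodal(text_emb, kg_emb, weights, bias):
    """Concatenate a text embedding with a knowledge-graph embedding and
    score temporal relation labels via a linear layer plus softmax."""
    joint = list(text_emb) + list(kg_emb)
    logits = [sum(w_i * x for w_i, x in zip(row, joint)) + b
              for row, b in zip(weights, bias)]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative setup: 2-d text embedding, 2-d KG embedding, 3 relation labels.
LABELS = ["BEFORE", "AFTER", "OVERLAP"]
weights = [[0.5, -0.2, 0.1, 0.0],
           [-0.3, 0.4, 0.0, 0.2],
           [0.1, 0.1, -0.5, 0.3]]
probs = fuse_bimodal([1.0, 0.5], [0.2, -0.1], weights, [0.0, 0.0, 0.0])
predicted = LABELS[probs.index(max(probs))]
```

In a real system the text embedding would come from a pre-trained language model and the KG embedding from a graph-embedding method; the point of the sketch is only the fusion step.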
The performance of our proposed bimodal architecture was thoroughly evaluated across multiple clinical datasets. Comparative analyses demonstrated its superiority over existing methods that rely solely on textual information for temporal relation extraction. Notably, the model remained effective even when no additional knowledge-graph information was provided.
Combining textual data with knowledge-graph information in our bimodal architecture marks a notable advance in temporal relation extraction, addressing the need for a deeper understanding of narrative processes in medical contexts.
In conclusion, our study introduces a bimodal architecture that combines text and knowledge-graph data and achieves superior performance in temporal relation extraction from medical documents. This advancement holds significant promise for improving the comprehension of patients' healthcare journeys and for extracting temporal relationships from complex medical narratives.
Coreference resolution is one of the three key tasks of information extraction from text, alongside named entity recognition and relation extraction. Its goal is to group all entity mentions across an entire document so that each group represents a single entity. Methods for this task have been developed for some widely spoken languages for a long time, but none have existed for Slovene. In this paper we present a new, manually annotated corpus for coreference resolution in Slovene – the coref149 corpus. For automatic coreference resolution we adapted the SkipCor system, which we originally built for English. On the Slovene material, SkipCor achieved a CoNLL 2012 score of 76%. We also analyzed the influence of individual feature types and examined the most frequent errors. During this work we developed a software library with a web interface through which all of the described analyses can be run and their performance directly compared. The results are promising and comparable to those for other, more widespread languages. We have thus shown that automatic coreference resolution in Slovene can be successful; in the future, a larger and higher-quality corpus should be built that addresses all coreference-related particularities of Slovene, enabling the construction of effective methods for automatically solving coreference problems.
Textual documents serve as representations of discussions on a variety of subjects. These discussions can vary in length and may encompass a range of events or factual information. Present trends in constructing knowledge bases primarily emphasize fact-based common-sense reasoning, often overlooking the temporal dimension of events. Given the widespread presence of time-related information, addressing this temporal aspect could potentially enhance the quality of common-sense reasoning within existing knowledge graphs. In this comprehensive survey, we aim to identify and evaluate the key tasks involved in constructing temporal knowledge graphs centered around events. These tasks can be categorized into three main components: (a) event extraction, (b) the extraction of temporal relationships and attributes, and (c) the creation of event-based knowledge graphs and timelines. Our systematic review focuses on the examination of available datasets and language technologies for addressing these tasks. An in-depth comparison of various approaches reveals that the most promising results are achieved by state-of-the-art models leveraging large pre-trained language models. Despite the existence of multiple datasets, a noticeable gap exists in the availability of annotated data that could facilitate the development of comprehensive end-to-end models. Drawing on our findings, we discuss and propose four future directions for research in this domain: (a) the integration of pre-existing knowledge, (b) the development of end-to-end systems for constructing event-centric knowledge graphs, (c) the enhancement of knowledge graphs with event-centric information, and (d) the prediction of absolute temporal attributes.
Report on the 34th European Summer School in Logic, Language and Information (ESSLLI), held from 31 July to 11 August 2023 at the Faculty of Computer and Information Science in Ljubljana.
The rapid growth of social media, news sites, and blogs increases the opportunity to express and share opinions on the Internet, and researchers from different fields take advantage of this nearly limitless data. Thus, in the past decade, opinion mining or sentiment analysis has become an important research discipline. In this paper, we focus on target-level sentiment analysis, wherein the task is to predict the sentiment concerning specific (multiple) entities that appear as coreference mentions throughout a document. We created a new annotated dataset of Slovene news articles, additionally annotated with the named entities and coreferences that are the basis for the proposed task. Using an entity-document representation, we compared the task with traditional sentiment analysis, evaluating traditional machine learning and deep neural network approaches. Judging by the performance of existing approaches, the proposed task is a challenging problem. The results show that the best performance is achieved with a customised BERT adapter (a minor improvement over a standard text-classification adapter). We outperformed existing aspect-based state-of-the-art approaches by 13%, reaching up to 77% accuracy and a 73% F1 score.
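One plausible reading of the "entity-document representation" is a per-target view of the document in which every coreference mention of the target entity is wrapped in marker tokens before classification. The sketch below illustrates only that marking step; the marker string, the function name, and the example sentence are illustrative assumptions, and the actual system uses a BERT adapter rather than anything shown here.

```python
def mark_entity(tokens, mention_spans, marker="[TGT]"):
    """Wrap every coreference mention of the target entity in marker tokens,
    producing one entity-specific view of the document per target entity.

    mention_spans are half-open [start, end) spans over token indices."""
    starts = {s for s, _ in mention_spans}
    ends = {e for _, e in mention_spans}
    out = []
    for i, tok in enumerate(tokens):
        if i in starts:
            out.append(marker)
        out.append(tok)
        if i + 1 in ends:
            out.append(marker)
    return out

tokens = "The minister denied the claims but critics praised her rival".split()
# Coreference mentions of the target entity: "minister" (1,2) and "her" (8,9).
views = mark_entity(tokens, [(1, 2), (8, 9)])
```

Each marked view can then be fed to an ordinary document classifier, turning target-level sentiment into one classification per (entity, document) pair.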
We investigated how Natural Language Processing (NLP) algorithms could automatically grade answers to open-ended inference questions in web-based eBooks. This is a component of research on making reading more motivating to children and increasing their comprehension. We obtained and graded a set of answers to open-ended questions embedded in a fiction novel written in English. Computer science students used a subset of the graded answers to develop algorithms designed to grade new answers to the questions. The algorithms utilized the story text, existing graded answers for a given question, and publicly accessible databases in grading new responses. A computer science professor used another subset of the graded answers to evaluate the students' NLP algorithms and to select the best one. The results showed that the best algorithm correctly graded approximately 85% of the real-world answers as correct, partly correct, or wrong. The best NLP algorithm was then trained with questions and graded answers from a series of new text narratives in another language, Slovenian. The resulting model was successfully used in fourth-grade language arts classes to provide feedback on student answers to open-ended questions in eBooks.
In the recent, and ongoing, Covid-19 pandemic, remote or online K-12 schooling became the norm. Even if the pandemic tails off somewhat, remote K-12 schooling will likely remain more frequent than it was before the pandemic. A mainstay technique of online learning, at least at the college and graduate level, has been the online discussion. Since it affords the potential for meaningful learner-learner and instructor-learner interaction, which are vital for distance learning, it is worth considering online discussions for K-12 remote schooling. One challenge with online learning in general, and online discussions in particular, is that effective moderation, which is vital for discussions to be nurturing and effective learning situations, is notoriously labor-intensive for teachers/instructors. Further, since younger learners are more likely to drift off topic, particularly in small-group online discussions, automated early-warning systems are helpful. The current study collected small-group, "book club", discussion data from fourth graders reading web-based eBooks in Slovenian primary schools, qualitatively coded the data, and analyzed postings using computer-based natural language processing to predict when students went off-topic. One indicator that postings are on-topic is book relevance, i.e. that the posting is relevant to eBook content. The computer algorithm correctly predicted the book relevance of postings 90 percent of the time, suggesting that automated computer algorithms could assist teachers with moderating online discussions by providing real-time notifications of problems. Further, this study provided a proof of concept that small-group online discussions in web-based eBooks can be practical and educationally meaningful in fourth-grade classes.
Knowledge graphs are commonly represented by ontology-based databases, where tracking the provenance of ontological changes and ensuring ontology consistency is important. In this work, we propose a transaction manager for ontology-based database manipulation that combines blockchain and Semantic Web technologies. Semantic Web technologies enable efficient querying and modification of the data, whereas the blockchain provides secure storage and tracking of changes, and enables a decentralized setup and data restoration. We evaluate our solution by measuring cost and time: it introduces some overhead for updates, whereas querying works at the same speed as the underlying ontology database.
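The division of labor described above (reads hit the store directly, writes are additionally recorded on a tamper-evident chain) can be illustrated with a toy hash chain. This is only a conceptual sketch under simplifying assumptions: the class, its dictionary "store", and the block layout are invented for illustration, and the actual system uses a real blockchain and an ontology database rather than anything shown here.

```python
import hashlib
import json

class OntologyTxManager:
    """Toy transaction manager: each update to the ontology store is
    recorded as a block in a hash chain (provenance + tamper evidence),
    while reads go straight to the store at normal speed."""

    def __init__(self):
        self.store = {}  # stand-in for the ontology database (triples)
        self.chain = [{"prev": "0" * 64, "update": None}]  # genesis block

    def _hash(self, block):
        payload = json.dumps(block, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def update(self, triple):
        subj, pred, obj = triple
        self.store[(subj, pred)] = obj
        # Link the new block to the hash of the previous one.
        self.chain.append({"prev": self._hash(self.chain[-1]),
                           "update": triple})

    def query(self, subj, pred):
        # No chain traversal: query cost is unchanged by the blockchain.
        return self.store.get((subj, pred))

    def verify(self):
        """Recompute every link; any edit to a past block breaks the chain."""
        return all(self.chain[i]["prev"] == self._hash(self.chain[i - 1])
                   for i in range(1, len(self.chain)))

tm = OntologyTxManager()
tm.update(("ex:Alice", "ex:knows", "ex:Bob"))
tm.update(("ex:Bob", "ex:knows", "ex:Carol"))
```

Tampering with any recorded update invalidates every later link, which is what makes the change history auditable.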
Relation extraction is an essential procedure in literature mining. It focuses on extracting semantic relations between parts of text, called mentions. Biomedical literature includes an enormous amount of textual descriptions of biological entities, their interactions, and the results of related experiments. To capture these relations in an explicit, computer-readable format, they were at first curated manually into databases. Manual curation was later replaced with automatic or semi-automatic tools with natural language processing capabilities. The current challenge is the development of information extraction procedures that can directly infer more complex relational structures, such as gene regulatory networks.
We develop a computational approach for extraction of gene regulatory networks from textual data. Our method is designed as a sieve-based system and uses linear-chain conditional random fields and rules for relation extraction. With this method we successfully extracted the sporulation gene regulation network in the bacterium Bacillus subtilis for the information extraction challenge at the BioNLP 2013 conference. To enable extraction of distant relations using first-order models, we transform the data into skip-mention sequences. We infer multiple models, each of which is able to extract different relationship types. Following the shared task, we conducted additional analysis using different system settings that resulted in reducing the reconstruction error of bacterial sporulation network from 0.73 to 0.68, measured as the slot error rate between the predicted and the reference network. We observe that all relation extraction sieves contribute to the predictive performance of the proposed approach. Also, features constructed by considering mention words and their prefixes and suffixes are the most important features for higher accuracy of extraction. Analysis of distances between different mention types in the text shows that our choice of transforming data into skip-mention sequences is appropriate for detecting relations between distant mentions.
Linear-chain conditional random fields, along with appropriate data transformations, can be efficiently used to extract relations. The sieve-based architecture simplifies the system, as new sieves can be easily added or removed and each sieve can use the results of the previous ones. Furthermore, sieves with conditional random fields can be trained on arbitrary text data and are hence applicable to a broad range of relation extraction tasks and data domains.
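The skip-mention transformation mentioned above can be sketched as follows. The idea is that for skip k, keeping every (k+1)-th mention makes mentions that are k apart adjacent, so a first-order (linear-chain) model can score the relation between them. The function name, parameters, and gene names are illustrative, and this is a simplified reconstruction rather than the paper's exact procedure.

```python
def skip_mention_sequences(mentions, max_skip=2):
    """Transform a linearly ordered list of mentions into skip-mention
    sequences: for each skip k in 0..max_skip and each starting offset,
    keep every (k+1)-th mention, so mentions k apart become neighbors
    that a linear-chain CRF can label."""
    sequences = []
    for k in range(max_skip + 1):
        step = k + 1
        for offset in range(step):
            seq = mentions[offset::step]
            if len(seq) > 1:  # a single mention carries no relation
                sequences.append((k, seq))
    return sequences

# Illustrative mentions, in document order (B. subtilis sporulation genes).
mentions = ["sigK", "sigE", "spoIIID", "yabP"]
seqs = skip_mention_sequences(mentions, max_skip=1)
```

Each resulting sequence is then labeled independently, and one model can be trained per skip distance and relation type.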
Coreference resolution tries to identify all expressions (called mentions) in an observed text that refer to the same entity. Besides entity extraction and relation extraction, it represents one of the three complementary tasks in Information Extraction. In this paper we describe a novel coreference resolution system, SkipCor, that reformulates the problem as a sequence labeling task. None of the existing supervised, unsupervised, pairwise or sequence-based models is similar to our approach, which uses only linear-chain conditional random fields and supports high scalability with fast model training and inference, and straightforward parallelization. We evaluate the proposed system against the ACE 2004, CoNLL 2012 and SemEval 2010 benchmark datasets. SkipCor clearly outperforms two baseline systems that detect coreferentiality using the same features as SkipCor. The obtained results are at least comparable to the current state-of-the-art in coreference resolution.
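Once a sequence labeler has decided which neighboring mentions are coreferent, the pairwise decisions still have to be merged into entity clusters. A standard way to do that final step is a union-find pass, sketched below; the function, the example mentions, and the pair list are illustrative assumptions, not SkipCor's actual post-processing code.

```python
def cluster_mentions(n_mentions, coreferent_pairs):
    """Merge mentions (numbered 0..n_mentions-1) into entity clusters
    from pairwise coreference decisions, using union-find."""
    parent = list(range(n_mentions))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for a, b in coreferent_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for i in range(n_mentions):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Mentions in order: 0 "Barack Obama", 1 "the president", 2 "Michelle", 3 "he".
# The labeler judged pairs (0,1) and (1,3) coreferent.
groups = cluster_mentions(4, [(0, 1), (1, 3)])
```

Transitivity comes for free: (0,1) and (1,3) together place mentions 0, 1 and 3 in one entity cluster, leaving mention 2 on its own.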