Background: The UK MS Register (UKMSR) collects multiple sclerosis (MS) clinical data from UK NHS specialist treatment centres. Relevant clinical data for research is difficult to access and time-consuming to obtain; clinical letters remain the canonical record for patient interaction. We obtained outpatient letters from 11 NHS sites and used natural language processing (NLP) techniques to harvest data.
Aim: Capture a minimally relevant clinical dataset from outpatient letters.
Method: We defined and implemented a ruleset using General Architecture for Text Engineering (GATE), seeking forename, surname, DOB, gender, NHSNumber, clinic-date, postcode, MSType and EDSS. In a validation task, 100 randomly selected letters were manually reviewed by domain experts and results were compared with those of the algorithm for accuracy.
Results: 2436 letters from 771 individuals were analysed. F-measures were: forename (99.0%), surname (99.0%), DOB (100.0%), gender (94.7%), NHSNumber (100.0%), clinic-date (94.4%), postcode (99.5%), MSType (95.7%), EDSS (97.4%). Overall, 115 new EDSS scores were obtained.
Conclusion: NLP can be used to accurately extract information from free text in outpatient letters. This process can be built into a routine ingestion pipeline so that information can regularly be added to routinely collected clinical data, enhancing the data used for research at the UKMSR.
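The rule-based extraction and F-measure validation described in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual GATE rules, so Python regular expressions are swapped in here purely for illustration; the patterns `EDSS_RE` and `NHS_RE` and the helper names are hypothetical. The NHS number validation uses the standard Modulus 11 check digit.

```python
import re

# Hypothetical patterns: the study's real GATE ruleset is not shown in the abstract.
EDSS_RE = re.compile(r"\bEDSS\b[^0-9]{0,20}(\d{1,2}(?:\.[05])?)", re.IGNORECASE)
NHS_RE = re.compile(r"\b(\d{3})[ -]?(\d{3})[ -]?(\d{4})\b")


def nhs_checksum_ok(digits: str) -> bool:
    """Validate a 10-digit NHS number with the Modulus 11 check digit."""
    # Weights 10 down to 2 are applied to the first nine digits.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check digit of 10 means the number is invalid.
    return check != 10 and check == int(digits[9])


def extract(letter: str) -> dict:
    """Pull an EDSS score and a checksum-valid NHS number out of free text."""
    out = {"EDSS": None, "NHSNumber": None}
    match = EDSS_RE.search(letter)
    if match and 0.0 <= float(match.group(1)) <= 10.0:
        out["EDSS"] = float(match.group(1))
    for candidate in NHS_RE.finditer(letter):
        digits = "".join(candidate.groups())
        if nhs_checksum_ok(digits):
            out["NHSNumber"] = digits
            break
    return out


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1) from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

In a validation task like the one described, the counts passed to `f_measure` would come from comparing extracted values against the manually reviewed letters, one field at a time.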
•We survey methods of representing clinical text using neural networks.
•We provide a “how-to” guide for training these representations on clinical text.
•We describe word models, corpora, evaluation methods, and applications.
Representing words as numerical vectors based on the contexts in which they appear has become the de facto method of analyzing text with machine learning. In this paper, we provide a guide for training these representations on clinical text data, using a survey of relevant research. Specifically, we discuss different types of word representations, clinical text corpora, available pre-trained clinical word vector embeddings, intrinsic and extrinsic evaluation, applications, and limitations of these approaches. This work can be used as a blueprint for clinicians and healthcare workers who may want to incorporate clinical text features in their own models and applications.
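The idea of representing a word by the contexts in which it appears can be shown with a small count-based sketch. This is not the neural embedding training the survey covers; it is a simpler co-occurrence illustration of the same distributional principle, and the function names, window size, and example sentences are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a sparse vector of counts of its neighbouring words."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Made-up sentences for illustration only.
sentences = [
    "patient denies chest pain",
    "patient denies chest pressure",
    "mri shows new lesions",
]
vecs = cooccurrence_vectors(sentences)
similar = cosine(vecs["pain"], vecs["pressure"])
dissimilar = cosine(vecs["pain"], vecs["lesions"])
```

Words that share contexts ("pain" and "pressure") end up with similar vectors, while words with disjoint contexts do not. Neural methods learn dense vectors that capture the same signal.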
Over the last several years, the field of natural language processing has been propelled forward by an explosion in the use of deep learning models. This article provides a brief introduction to the field and a quick overview of deep learning architectures and methods. It then sifts through the plethora of recent studies and summarizes a large assortment of relevant contributions. Analyzed research areas include several core linguistic processing issues in addition to many applications of computational linguistics. A discussion of the current state of the art is then provided along with recommendations for future research in the field.
Figurative language generation (FLG) is the task of reformulating a given text to include a desired figure of speech, such as a hyperbole, a simile, and several others, while still being faithful to the original context. This is a fundamental, yet challenging task in Natural Language Processing (NLP), which has recently received increased attention due to the promising performance brought by pre-trained language models. Our survey provides a systematic overview of the development of FLG, mostly in English, starting with the description of some common figures of speech, their corresponding generation tasks, and datasets. We then focus on various modelling approaches and assessment strategies, leading us to discuss some challenges in this field and to suggest some potential directions for future research. To the best of our knowledge, this is the first survey that summarizes the progress of FLG including the most recent developments in NLP. We also organize corresponding resources, e.g., article lists and datasets, and make them accessible in an open repository. We hope this survey can help researchers in NLP and related fields to easily track the academic frontier, providing them with a landscape and a roadmap of this area.
Attention in Natural Language Processing
Galassi, Andrea; Lippi, Marco; Torroni, Paolo
IEEE Transactions on Neural Networks and Learning Systems, 10/2021, Volume 32, Issue 10
Journal Article
Open access
Attention is an increasingly popular mechanism used in a wide range of neural architectures. The mechanism itself has been realized in a variety of formats. However, because of the fast-paced advances in this domain, a systematic overview of attention is still missing. In this article, we define a unified model for attention architectures in natural language processing, with a focus on those designed to work with vector representations of the textual data. We propose a taxonomy of attention models according to four dimensions: the representation of the input, the compatibility function, the distribution function, and the multiplicity of the input and/or output. We present examples of how prior information can be exploited in attention models and discuss ongoing research efforts and open challenges in the area, providing the first extensive categorization of the vast body of literature in this exciting domain.
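Two of the taxonomy dimensions named above, the compatibility function and the distribution function, can be made concrete with a minimal sketch. The abstract does not specify any particular choice, so the scaled dot product and softmax used here are assumptions picked because they are common; the function names are hypothetical, and both components are passed in as parameters to show that they can be swapped independently.

```python
import math


def dot_compatibility(query, keys):
    """Compatibility function: one scaled dot-product score per key."""
    scale = math.sqrt(len(query))
    return [sum(q * k for q, k in zip(query, key)) / scale for key in keys]


def softmax_distribution(scores):
    """Distribution function: turn scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values,
           compatibility=dot_compatibility,
           distribution=softmax_distribution):
    """Return the attention-weighted sum of values and the weights used."""
    weights = distribution(compatibility(query, keys))
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights


# Two identical keys receive equal weight, so the context is the mean of the values.
context, weights = attend(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[2.0, 0.0], [0.0, 2.0]],
)
```

Replacing `dot_compatibility` with an additive scorer, or `softmax_distribution` with a sparse alternative, changes one dimension of the design while leaving the rest of `attend` untouched.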
Explainable Natural Language Processing by Anders Søgaard (University of Copenhagen). Morgan & Claypool (Synthesis Lectures on Human Language Technologies, edited by Graeme Hirst, volume 51), 2021, xvi + 107 pp; paperback, ISBN: 9781636392134; ebook, ISBN: 9781636392141; hardcover, ISBN: 9781636392158; DOI: 10.2200/S01118ED1V01Y202107HLT051.
Hate speech in social media is a complex phenomenon, whose detection has recently gained significant traction in the Natural Language Processing community, as attested by several recent review works. Annotated corpora and benchmarks are key resources, considering the vast number of supervised approaches that have been proposed. Lexica play an important role as well for the development of hate speech detection systems. In this review, we systematically analyze the resources made available by the community at large, including their development methodology, topical focus, language coverage, and other factors. The results of our analysis highlight a heterogeneous, growing landscape, marked by several issues and avenues for improvement.