This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, in genome-wide association studies which have contributed to discovery of a head and neck cancer mutation association. Second, medical records analysis which has significantly increased the statistical power of treatment/outcome models in the UK's largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors' own group) who work in text processing for biomedicine and other areas. GATE is available online under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.
Objective: The objective of this study is to determine demographic and diagnostic distributions of physical pain recorded in clinical notes of a mental health electronic health records database by using natural language processing and to examine the overlap in recorded physical pain between primary and secondary care. Design, setting and participants: The data were extracted from an anonymised version of the electronic health records of a large secondary mental healthcare provider serving a catchment of 1.3 million residents in south London. These included patients under active referral, aged 18+ at the index date of 1 July 2018 and having at least one clinical document (≥30 characters) between 1 July 2017 and 1 July 2019. This cohort was compared with linked primary care records from one of the four local government areas. Outcome: The primary outcome of interest was the presence of recorded physical pain within the clinical notes of the patients, not including psychological or metaphorical pain. Results: A total of 27 211 patients were retrieved. Of these, 52% (14 202) had narrative text containing relevant mentions of physical pain. Older patients (OR 1.17, 95% CI 1.15 to 1.19), females (OR 1.42, 95% CI 1.35 to 1.49), patients of Asian (OR 1.30, 95% CI 1.16 to 1.45) or black (OR 1.49, 95% CI 1.40 to 1.59) ethnicity, and those living in deprived neighbourhoods (OR 1.64, 95% CI 1.55 to 1.73) showed higher odds of recorded pain. Patients with severe mental illnesses were found to be less likely to have pain recorded (OR 0.43, 95% CI 0.41 to 0.46, p<0.001). 17% of the cohort from secondary care also had records from primary care. Conclusion: The findings of this study show sociodemographic and diagnostic differences in recorded pain. Specifically, lower documentation across certain groups indicates the need for better screening protocols and training on recognising varied pain presentations.
Additionally, targeting improved detection of pain for minority and disadvantaged groups by care providers can promote health equity.
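For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained, the following is a minimal sketch for a single 2x2 table using the Wald interval on the log scale. It is an unadjusted illustration only: the abstract does not state how its ORs were estimated (they may come from adjusted regression models), the function name `odds_ratio_ci` is hypothetical, and the counts are invented rather than taken from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration; NOT data from the study.
or_value, ci_low, ci_high = odds_ratio_ci(a=60, b=40, c=45, d=55)
print(f"OR {or_value:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

An interval that excludes 1 (as in the reported results) indicates that the difference in odds is unlikely to be due to chance alone at the 5% level.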
Introduction Social media platforms such as Twitter and Weibo facilitate both positive and negative communication, including cyberbullying. Empirical evidence has revealed that cyberbullying increases when public crises occur, that such behavior is gendered, and that social media user account verification may deter it. However, the association of gender and verification status with cyberbullying is underexplored. This study aims to address this gap by examining how Weibo users’ gender, verification status, and expression of affect and anger in posts influence cyberbullying attitudes. Specifically, it investigates how these factors differ between posts pro- and anti-cyberbullying of COVID-19 cases during the pandemic. Methods This study utilized social role theory, the Barlett and Gentile Cyberbullying Model, and general strain theory as theoretical frameworks. We applied text classification techniques to identify pro-cyberbullying and anti-cyberbullying posts on Weibo. Subsequently, we used a standardized mean difference method to compare the emotional content of these posts. Our analysis focused on the prevalence of affective and anger-related expressions, particularly examining variations across gender and verification status of the users. Results Our text classification identified distinct pro-cyberbullying and anti-cyberbullying posts. The standardized mean difference analysis revealed that pro-cyberbullying posts contained significantly more emotional content compared to anti-cyberbullying posts. Further, within the pro-cyberbullying category, posts by verified female users exhibited a higher frequency of anger-related words than those by other users.
Discussion The findings from this study can enhance researchers’ algorithms for identifying cyberbullying attitudes, refine the characterization of cyberbullying behavior using real-world social media data through the integration of the mentioned theories, and help government bodies improve their cyberbullying monitoring especially in the context of public health crises.
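The standardized mean difference mentioned in the Methods can be illustrated with a short sketch. The abstract does not say which variant was used, so this example assumes Cohen's d with a pooled standard deviation; the function name and the per-post anger-word counts are hypothetical and are not data from the study.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Cohen's d: difference in group means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Pool the sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; the difference cannot be standardized")
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical anger-word counts per post; NOT data from the study.
pro_posts = [4, 6, 8]
anti_posts = [1, 3, 5]
d = standardized_mean_difference(pro_posts, anti_posts)
print(f"standardized mean difference = {d:.2f}")
```

A positive value here means the first group (pro-cyberbullying posts in this toy example) has the higher mean, expressed in units of the pooled standard deviation.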
Objectives: We sought to use natural language processing to develop a suite of language models to capture key symptoms of severe mental illness (SMI) from clinical text, to facilitate the secondary use of mental healthcare data in research. Design: Development and validation of information extraction applications for ascertaining symptoms of SMI in routine mental health records using the Clinical Record Interactive Search (CRIS) data resource; description of their distribution in a corpus of discharge summaries. Setting: Electronic records from a large mental healthcare provider serving a geographic catchment of 1.2 million residents in four boroughs of south London, UK. Participants: The distribution of derived symptoms was described in 23 128 discharge summaries from 7962 patients who had received an SMI diagnosis, and 13 496 discharge summaries from 7575 patients who had received a non-SMI diagnosis. Outcome measures: Fifty SMI symptoms were identified by a team of psychiatrists for extraction based on salience and linguistic consistency in records, broadly categorised under positive, negative, disorganisation, manic and catatonic subgroups. Text models for each symptom were generated using the TextHunter tool and the CRIS database. Results: We extracted data for 46 symptoms with a median F1 score of 0.88. Four symptom models performed poorly and were excluded. From the corpus of discharge summaries, it was possible to extract symptomatology in 87% of patients with SMI and 60% of patients with a non-SMI diagnosis. Conclusions: This work demonstrates the possibility of automatically extracting a broad range of SMI symptoms from English text discharge summaries for patients with an SMI diagnosis. Descriptive data also indicated that most symptoms cut across diagnoses, rather than being restricted to particular groups.
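The median F1 score reported above summarises per-symptom model performance. As a minimal sketch of how such a figure can be derived, the example below computes precision, recall and F1 from true-positive, false-positive and false-negative counts, then takes the median across models. The symptom names and counts are hypothetical placeholders, not values from the study, and the abstract does not describe the exact evaluation procedure.

```python
from statistics import median


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion counts, using 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (tp, fp, fn) counts per symptom model; NOT data from the study.
counts = {
    "symptom_a": (90, 10, 10),
    "symptom_b": (80, 20, 0),
    "symptom_c": (50, 50, 50),
}
f1_scores = {name: precision_recall_f1(*c)[2] for name, c in counts.items()}
median_f1 = median(f1_scores.values())
print(f"median F1 across {len(f1_scores)} models: {median_f1:.2f}")
```

Reporting the median rather than the mean keeps a few poorly performing models from dominating the summary figure.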
Traditional health information systems are generally devised to support clinical data collection at the point of care. However, as the significance of the modern information economy expands in scope and permeates the healthcare domain, there is an increasing urgency for healthcare organisations to offer information systems that address the expectations of clinicians, researchers and the business intelligence community alike. Amongst other emergent requirements, the principal unmet need might be defined as the 3R principle (right data, right place, right time) to address deficiencies in organisational data flow while retaining the strict information governance policies that apply within the UK National Health Service (NHS). Here, we describe our work on creating and deploying a low cost structured and unstructured information retrieval and extraction architecture within King's College Hospital, the management of governance concerns and the associated use cases and cost saving opportunities that such components present.
To date, our CogStack architecture has processed over 300 million lines of clinical data, making it available for internal service improvement projects at King's College London. On generated data designed to simulate real world clinical text, our de-identification algorithm achieved up to 94% precision and up to 96% recall.
We describe a toolkit which we feel is of huge value to the UK (and beyond) healthcare community. It is the only open source, easily deployable solution designed for the UK healthcare environment, in a landscape populated by expensive proprietary systems. Solutions such as these provide a crucial foundation for the genomic revolution in medicine.
Objective: This paper evaluates the application of a natural language processing (NLP) model for extracting clinical text referring to interpersonal violence using electronic health records (EHRs) from a large mental healthcare provider. Design: A multidisciplinary team iteratively developed guidelines for annotating clinical text referring to violence. Keywords were used to generate a dataset which was annotated (ie, classified as affirmed, negated or irrelevant) for: presence of violence, patient status (ie, as perpetrator, witness and/or victim of violence) and violence type (domestic, physical and/or sexual). A pretrained transformer model, BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), was fine-tuned on the annotated dataset and evaluated using 10-fold cross-validation. Setting: We used the Clinical Records Interactive Search (CRIS) database, comprising over 500 000 de-identified EHRs of patients within the South London and Maudsley NHS Foundation Trust, a specialist mental healthcare provider serving an urban catchment area. Participants: Searches of CRIS were carried out based on 17 predefined keywords. Randomly selected text fragments were taken from the results for each keyword, amounting to 3771 text fragments from the records of 2832 patients. Outcome measures: We estimated precision, recall and F1 score for each NLP model. We examined sociodemographic and clinical variables in patients giving rise to the text data, and frequencies for each annotated violence characteristic. Results: Binary classification models were developed for six labels (violence presence, perpetrator, victim, domestic, physical and sexual). Among annotations affirmed for the presence of any violence, 78% (1724) referred to physical violence, 61% (1350) referred to patients as perpetrator and 33% (731) to domestic violence.
NLP models’ precision ranged from 89% (perpetrator) to 98% (sexual); recall ranged from 89% (victim, perpetrator) to 97% (sexual). Conclusions: State of the art NLP models can extract and classify clinical text on violence from EHRs at acceptable levels of scale, efficiency and accuracy.
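The 10-fold cross-validation described in the Design can be sketched as a simple index-splitting routine. This is an illustration under assumptions: it uses a plain shuffled split, whereas the study may have stratified by label or grouped fragments by patient (the abstract does not say). The function name and seed are hypothetical; the fragment count of 3771 is the one reported above.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) so that each item is tested exactly once."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for held_out, test_indices in enumerate(folds):
        train_indices = [
            idx for f, fold in enumerate(folds) if f != held_out for idx in fold
        ]
        yield train_indices, test_indices


# 3771 annotated text fragments, as reported in the abstract.
splits = list(k_fold_indices(3771, k=10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} training, {len(test)} held out")
```

In each round the model would be fine-tuned on the training indices and scored on the held-out fold, with precision, recall and F1 then aggregated over the ten rounds. When several fragments come from the same patient, grouping folds by patient is a common way to avoid leakage between training and test data.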
A major hallmark of Parkinson's disease (PD) is the fatal destruction of dopaminergic neurons within the substantia nigra. This event is preceded by the formation of Lewy bodies, which are cytoplasmic inclusions composed of α-synuclein protein aggregates. A triad of α-synuclein aggregation, iron accumulation, and mitochondrial dysfunction plagues nigral neurons, yet the events underlying iron accumulation are poorly understood. Elevated intracellular iron concentrations up-regulate expression of ferritin, an iron storage protein that provides cytoprotection against redox stress. The lysosomal degradation pathway, autophagy, can release iron from ferritin stores to facilitate its trafficking in a process termed ferritinophagy. Aggregated α-synuclein inhibits SNARE protein complexes and destabilizes microtubules, effectively halting vesicular trafficking systems, including that of autophagy. The scope of this review is to describe the physiological and pathological relationship between iron regulation and α-synuclein, providing a detailed understanding of iron metabolism within nigral neurons. The underlying mechanisms of autophagy and ferritinophagy are explored in the context of PD, identifying potential therapeutic targets for future investigation.
Individualising mental healthcare at times when a patient is most at risk of suicide involves shifting research emphasis from static risk factors to those that may be modifiable with interventions. Currently, risk assessment is based on a range of extensively reported stable risk factors, but critical to dynamic suicide risk assessment is an understanding of each individual patient's health trajectory over time. The use of electronic health records (EHRs) and analysis using machine learning has the potential to accelerate progress in developing early warning indicators.
EHR data from the South London and Maudsley NHS Foundation Trust (SLaM) which provides secondary mental healthcare for 1.8 million people living in four South London boroughs.
To determine whether the time window proximal to a hospitalised suicide attempt can be discriminated from a distal period of lower risk by analysing the documentation and mental health clinical free text data from EHRs and (i) investigate whether the rate at which EHR documents are recorded per patient is associated with a suicide attempt; (ii) compare document-level word usage between documents proximal and distal to a suicide attempt; and (iii) compare n-gram frequency related to third-person pronoun use proximal and distal to a suicide attempt using machine learning.
The Clinical Record Interactive Search (CRIS) system allowed access to de-identified information from the EHRs. CRIS has been linked with Hospital Episode Statistics (HES) data for Admitted Patient Care. We analysed document and event data for patients who had at some point between 1 April 2006 and 31 March 2013 been hospitalised with a HES ICD-10 code related to attempted suicide (X60-X84; Y10-Y34; Y87.0/Y87.2).
n = 8,247 patients were identified as having made a hospitalised suicide attempt. Of these, n = 3,167 (39.8%) had at least one document available in their EHR prior to their first suicide attempt, and n = 1,424 (45.0%) of these patients had been "monitored" by mental healthcare services in the past 30 days. From 60 days prior to a first suicide attempt, there was a rapid increase in the monitoring level (document recording of the past 30 days), from 35.1 to 45.0%. Documents containing words related to prescribed medications/drugs/overdose/poisoning/addiction had the highest odds of being a risk indicator used proximal to a suicide attempt (OR 1.88; precision 0.91 and recall 0.93), and documents with words citing a care plan were associated with the lowest risk for a suicide attempt (OR 0.22; precision 1.00 and recall 1.00). Function words, word sequence, and pronouns were most common in all three representations (uni-, bi-, and tri-gram).
EHR documentation frequency and language use can be used to distinguish periods distal from and proximal to a suicide attempt. However, in our study 55.0% of patients with documentation prior to their first suicide attempt did not have a record in the preceding 30 days, meaning that a high number of patients are not seen by services at their most vulnerable point.
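The "monitoring level" described above (whether a patient has any document recorded in the 30 days before a reference date) can be sketched as follows. This is an illustrative reading of the definition, not the authors' implementation: the function name, the data layout (a mapping from patient ID to document dates) and all dates are hypothetical.

```python
from datetime import date, timedelta


def monitoring_level(documents_by_patient, index_dates, window_days=30):
    """Fraction of patients with at least one document in the window before their index date."""
    if not index_dates:
        return 0.0
    monitored = 0
    for patient_id, index_date in index_dates.items():
        window_start = index_date - timedelta(days=window_days)
        document_dates = documents_by_patient.get(patient_id, [])
        # Count the patient once if any document falls in [window_start, index_date).
        if any(window_start <= d < index_date for d in document_dates):
            monitored += 1
    return monitored / len(index_dates)


# Hypothetical patients and dates; NOT data from the study.
index_dates = {
    "p1": date(2010, 6, 1),
    "p2": date(2011, 3, 15),
    "p3": date(2012, 1, 10),
    "p4": date(2012, 9, 1),
}
documents_by_patient = {
    "p1": [date(2010, 5, 20)],                      # inside the 30-day window
    "p2": [date(2010, 12, 1)],                      # too early
    "p3": [date(2011, 12, 25), date(2011, 6, 1)],   # one document inside the window
    "p4": [],                                       # no documents
}
level = monitoring_level(documents_by_patient, index_dates)
print(f"monitoring level: {level:.1%}")
```

Shifting the index date backwards (for example to 60 days before the attempt) and recomputing would give the kind of trend over time that the results describe.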
Objectives: To identify negative symptoms in the clinical records of a large sample of patients with schizophrenia using natural language processing and assess their relationship with clinical outcomes. Design: Observational study using an anonymised electronic health record case register. Setting: South London and Maudsley NHS Trust (SLaM), a large provider of inpatient and community mental healthcare in the UK. Participants: 7678 patients with schizophrenia receiving care during 2011. Main outcome measures: Hospital admission, readmission and duration of admission. Results: 10 different negative symptoms were ascertained with precision statistics above 0.80. 41% of patients had 2 or more negative symptoms. Negative symptoms were associated with younger age, male gender and single marital status, and with increased likelihood of hospital admission (OR 1.24, 95% CI 1.10 to 1.39), longer duration of admission (β-coefficient 20.5 days, 7.6–33.5), and increased likelihood of readmission following discharge (OR 1.58, 1.28 to 1.95). Conclusions: Negative symptoms were common and associated with adverse clinical outcomes, consistent with evidence that these symptoms account for much of the disability associated with schizophrenia. Natural language processing provides a means of conducting research in large representative samples of patients, using data recorded during routine clinical practice.
Research suggests that an increased risk of physical comorbidities might have a key role in the association between severe mental illness (SMI) and disability. We examined the association between physical multimorbidity and disability in individuals with SMI.
Data were extracted from the Clinical Record Interactive Search (CRIS) system at South London and Maudsley Biomedical Research Centre. Our sample (n = 13,933) consisted of individuals who had received a primary or secondary SMI diagnosis between 2007 and 2018 and had available data for the Health of the Nation Outcome Scales (HoNOS) as a disability measure. Physical comorbidities were defined using Chapters II-XIV of the International Classification of Diseases (ICD-10).
More than 60% of the sample had complex multimorbidity. The most commonly affected organ systems were neurological (34.7%), dermatological (15.4%), and circulatory (14.8%). All specific comorbidities (ICD-10 Chapters) were associated with higher levels of disability, as measured by HoNOS total scores. Individuals with musculoskeletal, skin/dermatological, respiratory, endocrine, neurological, hematological, or circulatory disorders had significant difficulties in more than five HoNOS domains, while individuals with other comorbidities had fewer domains affected.
Individuals with SMI and musculoskeletal, skin/dermatological, respiratory, endocrine, neurological, hematological, or circulatory disorders are at higher risk of disability compared to those who do not have those comorbidities. Individuals with SMI and physical comorbidities are at greater risk of reporting difficulties associated with activities of daily living, hallucinations, and cognitive functioning. Therefore, these should be targeted for prevention and intervention programs.