Objective:
This paper evaluates the application of a natural language processing (NLP) model for extracting clinical text referring to interpersonal violence using electronic health records (EHRs) from a large mental healthcare provider.
Design:
A multidisciplinary team iteratively developed guidelines for annotating clinical text referring to violence. Keywords were used to generate a dataset which was annotated (ie, classified as affirmed, negated or irrelevant) for: presence of violence, patient status (ie, as perpetrator, witness and/or victim of violence) and violence type (domestic, physical and/or sexual). An NLP approach using a pretrained transformer model, BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining) was fine-tuned on the annotated dataset and evaluated using 10-fold cross-validation.
Setting:
We used the Clinical Records Interactive Search (CRIS) database, comprising over 500 000 de-identified EHRs of patients within the South London and Maudsley NHS Foundation Trust, a specialist mental healthcare provider serving an urban catchment area.
Participants:
Searches of CRIS were carried out based on 17 predefined keywords. Randomly selected text fragments were taken from the results for each keyword, amounting to 3771 text fragments from the records of 2832 patients.
Outcome measures:
We estimated precision, recall and F1 score for each NLP model. We examined sociodemographic and clinical variables in patients giving rise to the text data, and frequencies for each annotated violence characteristic.
Results:
Binary classification models were developed for six labels (violence presence, perpetrator, victim, domestic, physical and sexual). Among annotations affirmed for the presence of any violence, 78% (1724) referred to physical violence, 61% (1350) referred to patients as perpetrator and 33% (731) to domestic violence. NLP models’ precision ranged from 89% (perpetrator) to 98% (sexual); recall ranged from 89% (victim, perpetrator) to 97% (sexual).
Conclusions:
State of the art NLP models can extract and classify clinical text on violence from EHRs at acceptable levels of scale, efficiency and accuracy.
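The outcome measures named in this abstract (precision, recall and F1 score) can be computed for a single binary label from its confusion counts. The following is a minimal sketch only: the `Counts` container, the function name and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for one binary label (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def precision_recall_f1(c: Counts) -> tuple[float, float, float]:
    """Return (precision, recall, F1); each is 0.0 when its denominator is zero."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative counts only; these are not figures from the study.
example = Counts(tp=90, fp=10, fn=10)
p, r, f1 = precision_recall_f1(example)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

In a 10-fold cross-validation setting, one plausible approach is to compute these values on each held-out fold and then average them, although the abstract does not say how the folds were aggregated.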
Research suggests that an increased risk of physical comorbidities might have a key role in the association between severe mental illness (SMI) and disability. We examined the association between physical multimorbidity and disability in individuals with SMI.
Data were extracted from the Clinical Record Interactive Search (CRIS) system at the South London and Maudsley Biomedical Research Centre. Our sample (n = 13,933) consisted of individuals who had received a primary or secondary SMI diagnosis between 2007 and 2018 and had available data for the Health of the Nation Outcome Scales (HoNOS) as a measure of disability. Physical comorbidities were defined using Chapters II-XIV of the International Classification of Diseases, 10th edition (ICD-10).
More than 60% of the sample had complex multimorbidity. The most commonly affected organ systems were neurological (34.7%), dermatological (15.4%), and circulatory (14.8%). All specific comorbidities (ICD-10 chapters) were associated with higher levels of disability, measured as HoNOS total scores. Individuals with musculoskeletal, skin/dermatological, respiratory, endocrine, neurological, hematological, or circulatory disorders had significant difficulties in more than five HoNOS domains, while those with other comorbidities had fewer domains affected.
Individuals with SMI and musculoskeletal, skin/dermatological, respiratory, endocrine, neurological, hematological, or circulatory disorders are at higher risk of disability compared to those who do not have those comorbidities. Individuals with SMI and physical comorbidities are at greater risk of reporting difficulties associated with activities of daily living, hallucinations, and cognitive functioning. Therefore, these should be targeted for prevention and intervention programs.
Objectives:
The first aim of this study was to design and develop a valid and replicable strategy to extract, from clinical notes, physical health conditions which are common in mental health services. Then, we examined the prevalence of these conditions in individuals with severe mental illness (SMI) and compared their individual and combined prevalence in individuals with bipolar (BD) and schizophrenia spectrum disorders (SSD).
Design:
Observational study.
Setting:
Secondary mental healthcare services from South London.
Participants:
Our maximal sample comprised 17 500 individuals aged 15 years or older who had received a primary or secondary SMI diagnosis (International Classification of Diseases, 10th edition, F20-31) between 2007 and 2018.
Measures:
We designed and implemented a data extraction strategy for 21 common physical comorbidities using a natural language processing pipeline, MedCAT. Associations were investigated with sex, age at SMI diagnosis, ethnicity and social deprivation for the whole cohort and the BD and SSD subgroups. Linear regression models were used to examine associations with disability measured by the Health of the Nation Outcome Scales.
Results:
Physical health data were extracted, achieving F1 scores above 0.90 for all conditions. The 10 most prevalent conditions were diabetes, hypertension, asthma, arthritis, epilepsy, cerebrovascular accident, eczema, migraine, ischaemic heart disease and chronic obstructive pulmonary disease. The most prevalent combination in this population included diabetes, hypertension and asthma, regardless of their SMI diagnoses.
Conclusions:
Our data extraction strategy was found to be adequate to extract physical health data from clinical notes, which is essential for future multimorbidity research using text records. We found that around 40% of our cohort had multimorbidity, of whom 20% had complex multimorbidity (two or more physical conditions besides SMI). Sex, age, ethnicity and social deprivation were found to be key to understanding their heterogeneity and their differential contribution to disability levels in this population. These outputs have direct implications for researchers and clinicians.
Development of a Lexicon for Pain
Chaturvedi, Jaya; Mascio, Aurelie; Velupillai, Sumithra U ...
Frontiers in Digital Health, 12/2021, Volume 3
Journal Article
Peer reviewed
Open access
Pain has been an area of growing interest in the past decade and is known to be associated with mental health issues. Due to the ambiguous nature of how pain is described in text, it presents a unique natural language processing (NLP) challenge. Understanding how pain is described in text and utilizing this knowledge to improve NLP tasks would be of substantial clinical importance. Little work has previously been done in this space. For this reason, and in order to develop an English lexicon for use in NLP applications, an exploration of pain concepts within free text was conducted. The exploratory text sources included two hospital databases, a social media platform (Twitter), and an online community (Reddit). This exploration helped select appropriate sources and inform the construction of a pain lexicon. The terms within the final lexicon were derived from three sources: literature, ontologies, and word embedding models. This lexicon was validated by two clinicians and compared with an existing 26-term pain sub-ontology and MeSH (Medical Subject Headings) terms. The final validated lexicon consists of 382 terms and will be used in downstream NLP tasks by helping select appropriate pain-related documents from electronic health record (EHR) databases, as well as pre-annotating these words to help in development of an NLP application for classification of mentions of pain within the documents. The lexicon and the code used to generate the embedding models have been made publicly available.
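The two downstream uses described in this abstract, selecting pain-related documents and pre-annotating lexicon terms, can be sketched with a simple pattern match. This is an illustrative sketch only: the four-term `PAIN_TERMS` list, the function names and the example documents are hypothetical stand-ins, not the published 382-term lexicon or the authors' code.

```python
import re

# Hypothetical mini-lexicon for illustration; the published lexicon has 382 terms.
PAIN_TERMS = ["pain", "ache", "headache", "sore"]


def compile_lexicon(terms):
    """Build one case-insensitive regex that matches any term as a whole word."""
    # Longest terms first so longer entries are preferred over shorter ones.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def pre_annotate(text, lexicon_re):
    """Return (start, end, matched_text) spans for every lexicon hit."""
    return [(m.start(), m.end(), m.group(0)) for m in lexicon_re.finditer(text)]


LEXICON_RE = compile_lexicon(PAIN_TERMS)

# Invented example documents, not real clinical text.
docs = [
    "Patient reports chest pain and a mild headache.",
    "No concerns raised today.",
]

# Document selection: keep only documents with at least one lexicon hit.
selected = [d for d in docs if LEXICON_RE.search(d)]

# Pre-annotation: character spans to show to annotators or a classifier.
spans = pre_annotate(docs[0], LEXICON_RE)
```

Whole-word matching keeps "ache" from firing inside "headache", but it cannot tell affirmed from negated or metaphorical mentions; that is the classification step the abstract leaves to the later NLP application.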
Cognitive impairments are a neglected aspect of schizophrenia despite being a major factor of poor functional outcome. They are usually measured using various rating scales; however, these necessitate trained practitioners and are rarely routinely applied in clinical settings. Recent advances in natural language processing techniques allow us to extract such information from unstructured portions of text at a large scale and in a cost-effective manner. We aimed to identify cognitive problems in the clinical records of a large sample of patients with schizophrenia, and assess their association with clinical outcomes.
We developed a natural language processing based application identifying cognitive dysfunctions from the free text of medical records, and assessed its performance against a rating scale widely used in the United Kingdom, the cognitive component of the Health of the Nation Outcome Scales (HoNOS). Furthermore, we analyzed cognitive trajectories over the course of patient treatment, and evaluated their relationship with various socio-demographic factors and clinical outcomes.
We found a high prevalence of cognitive impairments in patients with schizophrenia, and a strong correlation with several socio-demographic factors (gender, education, ethnicity, marital status, and employment) as well as adverse clinical outcomes. Results obtained from the free text were broadly in line with those obtained using the HoNOS subscale, and shed light on additional associations, notably related to attention and social impairments for patients with higher education.
Our findings demonstrate that cognitive problems are common in patients with schizophrenia, can be reliably extracted from clinical records using natural language processing, and are associated with adverse clinical outcomes. Harvesting the free text from medical records provides larger coverage than neurocognitive batteries or rating scales, and access to additional socio-demographic and clinical variables. Text mining tools can therefore facilitate large scale patient screening and early symptom detection, and ultimately help inform clinical decisions.
Summary
Objectives:
We survey recent work in biomedical NLP on building more adaptable or generalizable models, with a focus on work dealing with electronic health record (EHR) texts, to better understand recent trends in this area and identify opportunities for future research.
Methods:
We searched PubMed, the Institute of Electrical and Electronics Engineers (IEEE), the Association for Computational Linguistics (ACL) anthology, the Association for the Advancement of Artificial Intelligence (AAAI) proceedings, and Google Scholar for the years 2018-2020. We reviewed abstracts to identify the most relevant and impactful work, and manually extracted data points from each of these papers to characterize the types of methods and tasks that were studied, in which clinical domains, and current state-of-the-art results.
Results:
The ubiquity of pre-trained transformers in clinical NLP research has contributed to an increase in domain adaptation and generalization-focused work that uses these models as the key component. Most recently, work has started to train biomedical transformers and to extend the fine-tuning process with additional domain adaptation techniques. We also highlight recent research in cross-lingual adaptation, as a special case of adaptation.
Conclusions:
While pre-trained transformer models have led to some large performance improvements, general domain pre-training does not always transfer adequately to the clinical domain due to its highly specialized language. There is also much work to be done in showing that the gains obtained by pre-trained transformers are beneficial in real world use cases. The amount of work in domain adaptation and transfer learning is limited by dataset availability and creating datasets for new domains is challenging. The growing body of research in languages other than English is encouraging, and more collaboration between researchers across the language divide would likely accelerate progress in non-English clinical NLP.
• High performance clinical information extraction supports pertinent clinical research.
• Multi-site hospital natural language processing models scale across settings.
• Flexible informatics empowers fast clinician-led research and analysis.
• Fast, scalable, flexible electronic health record information extraction.
Electronic health records (EHR) contain large volumes of unstructured text, requiring the application of information extraction (IE) technologies to enable clinical analysis. We present the open source Medical Concept Annotation Toolkit (MedCAT) that provides: (a) a novel self-supervised machine learning algorithm for extracting concepts using any concept vocabulary including UMLS/SNOMED-CT; (b) a feature-rich annotation interface for customizing and training IE models; and (c) integrations to the broader CogStack ecosystem for vendor-agnostic health system deployment. We show improved performance in extracting UMLS concepts from open datasets (F1:0.448–0.738 vs 0.429–0.650). Further real-world validation demonstrates SNOMED-CT extraction at 3 large London hospitals with self-supervised training over ∼8.8B words from ∼17M clinical records and further fine-tuning with ∼6K clinician annotated examples. We show strong transferability (F1 > 0.94) between hospitals, datasets and concept types indicating cross-domain EHR-agnostic utility for accelerated clinical and research use cases.
This thesis examines the longitudinal changes in cognition for patients with schizophrenia using free text from medical records. To this end, we introduce a unified framework to extract, annotate, classify and analyse cognitive impairments from unstructured text, evaluate symptom trajectories and measure their association with socio-demographic factors and clinical outcomes. The framework was further extended to any type of symptom that can be defined using a list of keywords, and allows easy implementation and deployment within the Clinical Record Interactive Search (CRIS) system, which provides researchers with regulated and secure access to anonymised information from clinical records. A standardized approach to extract and annotate portions of unstructured text relevant to cognitive impairments was developed in conjunction with clinicians and researchers. This annotated dataset was then used to train text classification algorithms, in order to separate affirmed versus irrelevant or negated mentions of cognitive symptoms. An extensive comparative study, looking at existing text classification methods within the biomedical as well as general domains was conducted on both public and internal datasets. The results showed that transformer-based approaches, which are the current state of the art for many natural language processing tasks, outperform other methods in terms of accuracy, ease of implementation and scalability, particularly when trained on a combination of general and medical data. This text classification model was subsequently used to derive cognitive score time series from the free text of medical records. This "digital signature" of cognitive changes was in turn validated against scores obtained from clinically administered tests, confirming the accuracy and reliability of the model. Symptom trajectories were then evaluated using mixed linear models, again comparing the results obtained with the transformer model against standardized instruments. 
Both approaches demonstrated similar rates of change, indicating a gradual cognitive decline with age, which is attenuated by certain socio-demographic factors such as education, employment or marital status. The transformer-based model highlighted a strong association between education and cognition, showing that certain cognitive impairments, specifically attention and social cognition, were more likely to be reported early for patients with a higher education level. The relationship between cognition and clinical outcomes was also analysed, indicating that cognitive problems are correlated with adverse outcomes. This supports the findings in the literature that these symptoms account for much of the disability associated with schizophrenia. Finally, the text classification framework was tested and generalized to cover other symptoms and patient groups, allowing the development of a standardized set of tools that were then deployed within health research settings. This formed the basis for other research, notably COVID-related projects, which involved extracting mentions of anxiety and violent behaviour from the free text of clinical records, paving the way to further clinical applications. The contribution of this research is both methodological and practical. The use of a novel symptom extraction, classification and analysis framework demonstrates that cognitive impairments can be reliably harvested from the free text of medical records using deep learning models. The framework shows that these impairments are common in patients with schizophrenia and are correlated with adverse clinical outcomes. It provides a scalable and adaptable means of conducting research using large, unstructured datasets, typical of the vast amount of data routinely collected in clinical records. 
Such automated tools can be utilized to detect early impairments, screen individuals and identify those who would benefit from more comprehensive assessments, and ultimately support real-time clinical decision making.
• We compared the cognitive trajectories in individuals with dementia with and without comorbid severe mental illness using data from one of the largest mental health care providers in Europe.
• Our results showed that individuals with comorbid SMI were more likely to have a faster decline in their MMSE scores compared with those that had dementia without this comorbidity. However, this association was attenuated when considering socio-demographics, smoking and cardiovascular risk factors and medication.
• Our findings highlight the potential risk for an accelerated cognitive decline in this group of individuals, especially for those with bipolar disorders, and the need to further investigate the role of potential shared mechanisms.
We aimed to compare trajectories of cognitive performance in individuals diagnosed with dementia with and without severe mental illness (SMI).
Retrospective cohort study.
We used data from a large longitudinal mental healthcare case register, the Clinical Record Interactive Search (CRIS), at the South London and Maudsley NHS Foundation Trust (SLaM) which provides mental health services to four south London boroughs.
Our sample (N = 4718) consisted of any individual who had a primary or secondary diagnosis of dementia from 2007 to 2018, was 50 years old or over at first diagnosis of dementia and had at least 3 recorded Mini-Mental State Examination (MMSE) scores.
Cognitive performance was measured using MMSE. Linear mixed models were fitted to explore whether MMSE trajectories differed between individuals with or without prior/current SMI diagnoses. Models were adjusted by socio-demographics, cardiovascular risk, smoking, and medication.
Our results showed differences in the rate of change, where individuals with comorbid SMI had a faster decline when compared with those that have dementia without comorbid SMI. However, this association was partially attenuated when adjusted by socio-demographics, smoking and cardiovascular risk factors; and more substantially attenuated when medication was included in models. Additional analyses showed that this accelerated decline might be more evident in individuals with bipolar disorders. Future research to disentangle the potential underlying biological mechanisms of these associations is needed.
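The abstract does not give the model equation. One common way to write a linear mixed model for this kind of trajectory comparison is shown below. It assumes a random intercept and a random slope per individual, which the source does not confirm, and all symbols are introduced here for illustration only.

```latex
% Sketch only: the random-effects structure and all symbols are assumptions.
% i indexes individuals, j indexes repeated MMSE assessments, t_{ij} is time
% since the first assessment, SMI_i is 1 for a prior/current SMI diagnosis,
% and x_i holds the adjustment covariates (socio-demographics, cardiovascular
% risk, smoking, medication).
\[
\mathrm{MMSE}_{ij}
  = \beta_0 + \beta_1 t_{ij} + \beta_2\,\mathrm{SMI}_i
  + \beta_3\,(t_{ij} \times \mathrm{SMI}_i)
  + \boldsymbol{\gamma}^{\top}\mathbf{x}_i
  + b_{0i} + b_{1i}\,t_{ij} + \varepsilon_{ij},
\qquad
(b_{0i}, b_{1i})^{\top} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})
\]
```

Under this sketch, the interaction coefficient \(\beta_3\) is the difference in the rate of MMSE change between individuals with and without SMI. The attenuation described in the results would correspond to \(\beta_3\) moving toward zero as covariates are added to \(\mathbf{x}_i\).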
Text classification tasks which aim at harvesting and/or organizing information from electronic health records are pivotal to support clinical and translational research. However, these present specific challenges compared to other classification tasks, notably due to the particular nature of the medical lexicon and language used in clinical records. Recent advances in embedding methods have shown promising results for several clinical tasks, yet there is no exhaustive comparison of such approaches with other commonly used word representations and classification models. In this work, we analyse the impact of various word representations, text pre-processing and classification algorithms on the performance of four different text classification tasks. The results show that traditional approaches, when tailored to the specific language and structure of the text inherent to the classification task, can achieve or exceed the performance of more recent ones based on contextual embeddings such as BERT.
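As an illustration of what a traditional word representation can look like, the sketch below computes a simplified, unsmoothed TF-IDF weighting using only the Python standard library. It is not the representation evaluated in the study: the whitespace tokenisation, the `tfidf` function and the example sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF sketch)."""
    # Naive pre-processing: lowercase and split on whitespace (an assumption).
    tokenised = [d.lower().split() for d in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenised for term in set(toks))
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vectors.append(
            {t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


# Invented example sentences, not real clinical text.
docs = [
    "patient denies low mood",
    "patient reports low mood",
    "patient reports chest pain",
]
vecs = tfidf(docs)
```

A term that occurs in every document (here "patient") gets zero weight, while a rarer term such as "denies" is weighted up. Choices of this kind in pre-processing and weighting are the sort of task-specific tailoring the abstract credits for the competitive performance of traditional approaches.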