Abstract
Objective
Social determinants of health (SDoH) are nonclinical dispositions that impact patient health risks and clinical outcomes. Leveraging SDoH in clinical decision-making can potentially improve diagnosis, treatment planning, and patient outcomes. Despite increased interest in capturing SDoH in electronic health records (EHRs), such information is typically locked in unstructured clinical notes. Natural language processing (NLP) is the key technology to extract SDoH information from clinical text and expand its utility in patient care and research. This article presents a systematic review of the state-of-the-art NLP approaches and tools that focus on identifying and extracting SDoH data from unstructured clinical text in EHRs.
Materials and Methods
A broad literature search was conducted in February 2021 using 3 scholarly databases (ACL Anthology, PubMed, and Scopus) following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. A total of 6402 publications were initially identified, and after applying the study inclusion criteria, 82 publications were selected for the final review.
Results
Smoking status (n = 27), substance use (n = 21), homelessness (n = 20), and alcohol use (n = 15) are the most frequently studied SDoH categories. Homelessness (n = 7) and other less-studied SDoH (eg, education, financial problems, social isolation and support, family problems) are mostly identified using rule-based approaches. In contrast, machine learning approaches are popular for identifying smoking status (n = 13), substance use (n = 9), and alcohol use (n = 9).
Conclusion
NLP offers significant potential to extract SDoH data from narrative clinical notes, which in turn can aid in the development of screening tools, risk prediction models, and clinical decision support systems.
We identified patients with non-tuberculous mycobacterial (NTM) disease in the US Veterans Health Administration (VHA), examined the distribution of diseases by NTM species, and explored the association between NTM disease and the frequency of clinic visits and mortality.
We combined mycobacterial isolate (from natural language processing) with ICD-9-CM diagnoses from VHA data between 2008 and 2012 and then applied modified ATS/IDSA guidelines for NTM diagnosis. We performed validation against a reference standard of chart review. Incidence rates were calculated. Two nested case-control studies (matched by age and location) were used to measure the association between NTM disease and each of 1) the frequency of outpatient clinic visits and 2) mortality, both adjusted by chronic obstructive pulmonary disease (COPD), other structural lung diseases, and immunomodulatory factors.
NTM cases were identified with a sensitivity of 94% and a specificity of >99%. The incidence of NTM was 12.6 per 100,000 patient-years. COPD was present in 68% of pulmonary NTM cases. NTM incidence was highest in the southeastern US. Extra-pulmonary NTM rates increased during the study period. The incidence rate ratio of clinic visits in the first year after diagnosis was 1.3 (95% CI 1.34-1.35). NTM patients had a hazard ratio of mortality of 1.4 (95% CI 1.1-1.9) in the 6 months after NTM identification compared to controls and 1.99 (95% CI 1.8-2.3) thereafter.
In VHA, pulmonary NTM disease is commonly associated with COPD, with the highest rates in the southeastern US. After adjustment, NTM patients had more clinic visits and greater mortality compared to matched patients.
• 265,566 post-9/11 veterans with PTSD had ≥1 coded psychotherapy visit at the VA in 15 years.
• While 22.8% initiated an evidence-based psychotherapy (EBP), only 9.1% completed treatment.
• Veterans who completed EBP did so about 3 years after their initial mental health visit.
• Factors associated with EBP completion included MST and combat/deployments.
Little is known about predictors of initiation and completion of evidence-based psychotherapy (EBP) for posttraumatic stress disorder (PTSD), with most data coming from small cohort studies and post-hoc analyses of clinical trials. We examined patient and treatment factors associated with initiation and completion of EBP for PTSD in a large longitudinal cohort. We conducted a national, retrospective cohort study of all Iraq and Afghanistan War veterans who had a post-deployment PTSD diagnosis from 10/01–9/15 at a Veterans Health Administration (VHA) facility and had at least one coded post-deployment psychotherapy visit. We examined utilization of prolonged exposure (PE) and cognitive processing therapy (CPT), individual or group, during any 24-week period. We used ordered logistic, logistic, and Cox proportional hazards regressions to examine variables associated with EBP initiation, early termination, completion, and time to completion. Over a 15-year period, of 265,566 veterans with PTSD, 22.8% initiated an EBP, and only 9.1% completed treatment. Completers did so about three years after their initial mental health visit. Factors positively associated with EBP completion included military sexual trauma, older age, race/ethnicity (i.e., African-American race for PE), combat, and multiple deployments. The VHA has become timelier in delivering EBP for PTSD, and several subgroups are more likely to complete EBP.
Postpolypectomy risk stratification for subsequent metachronous advanced neoplasia (MAN) is imprecise and does not account for colonoscopist adenoma detection rate (ADR). Our aim was to assess the association of ADR with MAN and create a prediction model for postpolypectomy risk stratification incorporating ADR and other factors.
We conducted a retrospective cohort study of individuals with baseline polypectomy and subsequent surveillance colonoscopy from 2004 to 2016 within the U.S. Department of Veterans Affairs (VA). Clinical factors, polyp findings, and baseline colonoscopist ADR were considered for the model. Model performance (sensitivity, specificity, and area under the curve) for identifying individuals with MAN was compared with 2020 U.S. Multi-Society Task Force on Colorectal Cancer (USMSTF) surveillance recommendations.
A total of 30,897 individuals were randomly assigned 2:1 into independent model training and validation sets. Increasing age, male sex, diabetes, current smoking, adenoma number, polyp location, adenoma ≥10 mm or with tubulovillous/villous features, and decreasing colonoscopist ADR were independently associated with MAN. A range of 1.48- to 1.66-fold increased risk for MAN was observed for ADR in the lowest 3 quintiles (ADR <19.7%–39.3%) vs the highest quintile (ADR >47.0%). When the final model selected based on the training set was applied to the validation set, improved sensitivity and specificity over 2020 USMSTF risk stratification were achieved (P = .001), with an area under the curve of 0.62 (95% confidence interval, 0.60–0.64).
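The 2:1 random assignment into training and validation sets can be illustrated with a minimal Python sketch. The abstract does not report the exact set sizes or the randomization procedure, so the function name `split_two_to_one`, the fixed seed, and the use of integer identifiers are assumptions made for illustration only.

```python
import random

def split_two_to_one(ids, seed=0):
    """Shuffle ids and return (training, validation) lists in a 2:1 ratio."""
    shuffled = list(ids)
    # A seeded generator keeps the assignment reproducible (seed is an assumption).
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]

# 30,897 is the cohort size from the abstract; the ids are placeholders.
train, valid = split_two_to_one(range(30_897))
print(len(train), len(valid))
```

Keeping the validation set disjoint from the training set is what allows the reported sensitivity, specificity, and area under the curve to be read as out-of-sample estimates.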
Colonoscopist ADR is associated with MAN. Combining clinical factors and ADR for risk stratification has potential to improve postpolypectomy risk stratification. Improving ADR is likely to improve postpolypectomy outcomes.
While evidence-based psychotherapy (EBP) for posttraumatic stress disorder (PTSD) is a first-line treatment, its real-world effectiveness is unknown. We compared cognitive processing therapy (CPT) and prolonged exposure (PE) each to an individual psychotherapy comparator group, and CPT to PE in a large national healthcare system.
We conducted effectiveness and comparative effectiveness emulated trials using retrospective cohort data from electronic medical records. Participants were veterans with PTSD initiating mental healthcare (N = 265 566). The primary outcome was PTSD symptoms measured by the PTSD Checklist (PCL) at baseline and 24-week follow-up. Emulated trials comprised 'person-trials,' representing 112 discrete 24-week periods of care (10/07-6/17) for each patient. Treatment group comparisons were made with generalized linear models, utilizing propensity score matching and inverse probability weights to account for confounding, selection, and non-adherence bias.
There were 636 CPT person-trials matched to 636 non-EBP person-trials. Completing ⩾8 CPT sessions was associated with a 6.4-point greater improvement on the PCL (95% CI 3.1-10.0). There were 272 PE person-trials matched to 272 non-EBP person-trials. Completing ⩾8 PE sessions was associated with a 9.7-point greater improvement on the PCL (95% CI 5.4-13.8). There were 232 PE person-trials matched to 232 CPT person-trials. Those completing ⩾8 PE sessions had slightly greater, but not statistically significant, improvement on the PCL (8.3-points; 95% CI 5.9-10.6) than those completing ⩾8 CPT sessions (7.0-points; 95% CI 5.5-8.5).
PTSD symptom improvement was similar and modest for both EBPs. Although EBPs are helpful, research to further improve PTSD care is critical.
Peripheral artery disease (PAD) is a leading cause of cardiovascular morbidity and mortality; however, the extent to which genetic factors increase risk for PAD is largely unknown. Using electronic health record data, we performed a genome-wide association study in the Million Veteran Program testing ~32 million DNA sequence variants with PAD (31,307 cases and 211,753 controls) across veterans of European, African and Hispanic ancestry. The results were replicated in an independent sample of 5,117 PAD cases and 389,291 controls from the UK Biobank. We identified 19 PAD loci, 18 of which have not been previously reported. Eleven of the 19 loci were associated with disease in three vascular beds (coronary, cerebral, peripheral), including LDLR, LPL and LPA, suggesting that therapeutic modulation of low-density lipoprotein cholesterol, the lipoprotein lipase pathway or circulating lipoprotein(a) may be efficacious for multiple atherosclerotic disease phenotypes. Conversely, four of the variants appeared to be specific for PAD, including F5 p.R506Q, highlighting the pathogenic role of thrombosis in the peripheral vascular bed and providing genetic support for Factor Xa inhibition as a therapeutic strategy for PAD. Our results highlight mechanistic similarities and differences among coronary, cerebral and peripheral atherosclerosis and provide therapeutic insights.
Traditional serrated adenomas (TSAs) may confer increased risk for colorectal cancer (CRC). Our objective with this study was to examine clinical characteristics and long-term outcomes associated with TSA diagnosis.
We conducted a retrospective cohort study of U.S. Veterans ≥18 years of age with ≥1 TSA between 1999 and 2018. Baseline characteristics, colonoscopy findings, and diagnosis of incident and fatal CRC were abstracted. Advanced neoplasia was defined by CRC or adenoma with high-grade dysplasia, villous histology, or size ≥1 cm. Follow-up was through CRC diagnosis, death, or end of study (December 31, 2018).
A total of 853 Veterans with a baseline TSA were identified; 74% were ≥60 years of age, 96% were men, 14% were Black, and 73% were non-Hispanic White. About 64% were current or former smokers. Over 2044 total person-years of follow-up, there were 11 incident CRC cases and 1 CRC death. Cumulative CRC incidence was 1.34% (95% confidence interval [CI], 0.67%–2.68%), and cumulative CRC death was 0.12% (95% CI, 0.00%–0.35%). Among the subset of 378 TSA patients with ≥1 surveillance colonoscopy, 65.1% had high-risk neoplasia on follow-up. CRC incidence among TSA patients was significantly higher than in a comparison cohort of patients with normal colonoscopy at baseline (hazard ratio, 3.70; 95% CI, 1.63–8.41) and similar to a comparison cohort with baseline conventional advanced adenoma (hazard ratio, 0.86; 95% CI, 0.45–1.64).
Individuals with TSA have substantial risk for CRC based on their cumulative CRC incidence, as well as significant risk of developing other high-risk neoplasia at follow-up surveillance colonoscopy. These data underscore the importance of current recommendations for close colonoscopy surveillance after TSA diagnosis.
Background
Although evidence‐based psychotherapies (EBPs) for posttraumatic stress disorder (PTSD) were implemented starting in 2005 in the Veterans Health Administration (VHA), the largest national healthcare system in the U.S., the rate of initiation (uptake) and prevalence of these treatments in each calendar year have not been determined. We aimed to elucidate two metrics of EBP utilization, uptake and prevalence, following implementation.
Methods
Cohort study of Iraq and Afghanistan veterans in VHA (N = 181,620) with a PTSD diagnosis and ≥1 psychotherapy‐coded outpatient visit from 2001 to 2014. Using natural language processing techniques, annual and cumulative uptake and prevalence rates from 2001 to 2014 were calculated for each of the two EBPs for PTSD, cognitive processing therapy (CPT) and prolonged exposure (PE) therapy.
Results
Annual uptake of CPT increased during most years, reaching a maximum of 11.1%. Annual uptake of PE showed little change until 2008 and then increased, reaching a maximum of 4.4%. The annual prevalence of CPT increased throughout the study, reaching a maximum of 14.6%. The annual prevalence of PE increased to a maximum of 5.0% in 2010, but then flattened and declined. Annual uptake of minimally adequate CPT increased to a maximum of 5% in 2014. Annual uptake of minimally adequate PE increased to a maximum of 1.2% in 2010. The cumulative prevalence of CPT was 19.9% and the cumulative prevalence of PE was 7.5%.
Conclusions
Access to EBPs for PTSD modestly increased for Iraq and Afghanistan veterans after nationwide implementation efforts. Further expanding the reach to veterans is critical, given low rates of minimally adequate EBPs for PTSD.
Introduction
Identifying occurrences of medication side effects and adverse drug events (ADEs) is an important and challenging task because they are frequently only mentioned in clinical narrative and are not formally reported.
Methods
We developed a natural language processing (NLP) system that aims to identify mentions of symptoms and drugs in clinical notes and label the relationship between the mentions as indications or ADEs. The system leverages an existing word embeddings model with induced word clusters for dimensionality reduction. It employs a conditional random field (CRF) model for named entity recognition (NER) and a random forest model for relation extraction (RE).
Results
Final performance of each model was evaluated separately and then combined on a manually annotated evaluation set. The micro-averaged F1 score was 80.9% for NER, 88.1% for RE, and 61.2% for the integrated systems. Outputs from our systems were submitted to the NLP Challenges for Detecting Medication and Adverse Drug Events from Electronic Health Records (MADE 1.0) competition (Yu et al. in http://bio-nlp.org/index.php/projects/39-nlp-challenges, 2018). System performance was evaluated in three tasks (NER, RE, and complete system) with multiple teams submitting output from their systems for each task. Our RE system placed first in Task 2 of the challenge and our integrated system achieved third place in Task 3.
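For readers unfamiliar with the metric, micro-averaged F1 pools true positives, false positives, and false negatives across all classes before computing precision and recall. The following Python sketch shows that calculation; the function name `micro_f1`, the class labels, and the counts are hypothetical and are not taken from the study or from the MADE 1.0 evaluation scripts.

```python
from typing import Dict, Tuple

def micro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Micro-averaged F1 from per-class (tp, fp, fn) counts."""
    # Pool the counts over all classes first, then compute one precision/recall.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for illustration only; not results from the study.
example = {"Drug": (80, 10, 10), "ADE": (20, 10, 10)}
score = micro_f1(example)
print(f"micro-averaged F1: {score:.3f}")
```

Because the counts are pooled, frequent classes dominate the score, which is one reason a micro-averaged figure can look different from per-class results.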
Conclusion
Adding to the growing number of publications that utilize NLP to detect occurrences of ADEs, our study illustrates the benefits of employing innovative feature engineering.
Despite improvements in electronic medical record capability to collect data on sexual orientation, not all healthcare systems have adopted this practice. This can limit the usability of systemwide electronic medical record data for sexual minority research. One viable resource might be the documentation of sexual orientation within clinical notes. The authors developed an approach to identify sexual orientation documentation and subsequently derived a cohort of sexual minority patients using clinical notes from the Veterans Health Administration electronic medical record.
A hybrid natural language processing approach was developed and used to identify and categorize instances of terms and phrases related to sexual orientation in Veterans Health Administration clinical notes from 2000 to 2019. System performance was assessed with positive predictive value and sensitivity. Data were analyzed in 2019.
A total of 2,413,584 sexual minority terms/phrases were found within clinical notes, of which 439,039 (18%) were found to be related to patient sexual orientation with a positive predictive value of 85.9%. Documentation of sexual orientation was found for 115,312 patients. When compared with 2,262 patients with a record of administrative coding for homosexuality, the system found mentions of sexual orientation for 1,808 patients (79.9% sensitivity).
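The percentages reported above follow directly from the stated counts. A minimal Python sketch of the arithmetic, in which the helper `proportion` and the variable names are illustrative assumptions rather than part of the authors' system:

```python
def proportion(numerator: int, denominator: int) -> float:
    """Return numerator / denominator expressed as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator

# Counts taken from the abstract.
coded_patients = 2262        # patients with administrative coding (reference set)
found_by_system = 1808       # of those, patients for whom the system found mentions
total_mentions = 2_413_584   # sexual minority terms/phrases found in notes
related_mentions = 439_039   # mentions related to patient sexual orientation

sensitivity = proportion(found_by_system, coded_patients)
related_share = proportion(related_mentions, total_mentions)

print(f"sensitivity: {sensitivity:.1f}%")
print(f"share of mentions related to the patient: {related_share:.0f}%")
```

The positive predictive value of 85.9% cannot be recomputed this way, because the abstract does not give the number of reviewed mentions or how many of them were confirmed.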
When systemwide structured data are unavailable or inconsistent, deriving a cohort of sexual minority patients in electronic medical records for research is possible and permits longitudinal analysis across multiple clinical domains. Although limitations and challenges to the approach were identified, this study is an important step forward for Veterans Health Administration sexual minority research, and the methodology can be applied in other healthcare organizations.