The safety of medication use has been a priority in the United States since the late 1930s. Recently, it has gained prominence due to increasing evidence that a large proportion of patient harm is preventable and can be mitigated with effective risk strategies that have not been sufficiently adopted. Adverse events from medications are part of clinical practice, but the ability to identify a patient's risk and to minimize that risk must be a priority. Identifying adverse events has been challenging due to limitations of available data sources, which are often free text. The use of natural language processing (NLP) may help to address these limitations. NLP is the artificial intelligence domain of computer science that uses computers to manipulate unstructured data (i.e., narrative text or speech data) in the context of a specific task. In this narrative review, we illustrate the fundamentals of NLP and discuss NLP's application to medication safety in four data sources: electronic health records, Internet-based data, published literature, and reporting systems. Given the magnitude of data available from these sources, a growing area is the use of computer algorithms to automatically detect associations between medications and adverse effects. The main benefit of NLP is the time saved by automating medication safety tasks such as computer-facilitated medication reconciliation, as well as the potential for near-real-time identification of adverse events for postmarketing surveillance, such as those posted on social media that would otherwise go unanalyzed. NLP is limited by a lack of data sharing between health care organizations, owing to insufficient interoperability, which inhibits large-scale adverse event monitoring across populations.
We anticipate that future work in this area will focus on the integration of data sources from different domains to improve the ability to identify potential adverse events more quickly and to improve clinical decision support with regard to a patient's estimated risk for specific adverse events at the time of medication prescription or review.
Background Food allergy prevalence is reported to be increasing, but epidemiological data using patients' electronic health records (EHRs) remain sparse. Objective We sought to determine the prevalence of food allergy and intolerance documented in the EHR allergy module. Methods Using allergy data from a large health care organization's EHR between 2000 and 2013, we determined the prevalence of food allergy and intolerance by sex, racial/ethnic group, and allergen group. We examined the prevalence of reactions that were potentially IgE-mediated and anaphylactic. Data were validated using radioallergosorbent test and ImmunoCAP results, when available, for patients with reported peanut allergy. Results Among 2.7 million patients, we identified 97,482 patients (3.6%) with 1 or more food allergies or intolerances (mean, 1.4 ± 0.1). The prevalence of food allergy and intolerance was higher in females (4.2% vs 2.9%; P < .001) and Asians (4.3% vs 3.6%; P < .001). The most common food allergen groups were shellfish (0.9%), fruit or vegetable (0.7%), dairy (0.5%), and peanut (0.5%). Of the 103,659 identified reactions to foods, 48.1% were potentially IgE-mediated (affecting 50.8% of food allergy or intolerance patients) and 15.9% were anaphylactic. About 20% of patients with reported peanut allergy had a radioallergosorbent test/ImmunoCAP performed, of which 57.3% had an IgE level of grade 3 or higher. Conclusions Our findings are consistent with previously validated methods for studying food allergy, suggesting that the EHR's allergy module has the potential to be used for clinical and epidemiological research. The spectrum of severity observed with food allergy highlights the critical need for more allergy evaluations.
Natural language processing (NLP) tools turn free-text notes (FTNs) from electronic health records (EHRs) into data features that can supplement confounding adjustment in pharmacoepidemiologic studies. However, current applications are difficult to scale. We used unsupervised NLP to generate high-dimensional feature spaces from FTNs to improve prediction of drug exposure and outcomes compared with claims-based analyses. We linked Medicare claims with EHR data to generate three cohort studies comparing different classes of medications on the risk of various clinical outcomes. We used "bag-of-words" to generate features for the top 20,000 most prevalent terms from FTNs. We compared machine learning (ML) prediction algorithms using different sets of candidate predictors: Set1 (39 researcher-specified variables), Set2 (Set1 + ML-selected claims codes), and Set3 (Set1 + ML-selected NLP-generated features), vs. Set4 (Set1 + 2 + 3). When modeling treatment choice, we observed a consistent pattern across the examples: ML models utilizing Set4 performed best, followed by Set2, Set3, then Set1. When modeling outcome risk, there was little to no improvement beyond models based on Set1. Supplementing claims data with NLP-generated features from free-text notes improved prediction of prescribing choices but yielded little or no improvement in clinical risk prediction. These findings have implications for strategies to improve confounding adjustment using EHR data in pharmacoepidemiologic studies.
Abstract
Objective
Understanding public discourse on emergency use of unproven therapeutics is essential to monitor safe use and combat misinformation. We developed a natural language processing-based pipeline to understand public perceptions of and stances on coronavirus disease 2019 (COVID-19)-related drugs on Twitter across time.
Methods
This retrospective study included 609,189 US-based tweets between January 29, 2020 and November 30, 2021 on 4 drugs that gained wide public attention during the COVID-19 pandemic: (1) Hydroxychloroquine and Ivermectin, drug therapies with anecdotal evidence; and (2) Molnupiravir and Remdesivir, FDA-approved treatment options for eligible patients. Time-trend analysis was used to understand the popularity and related events. Content and demographic analyses were conducted to explore potential rationales of people's stances on each drug.
Results
Time-trend analysis revealed that Hydroxychloroquine and Ivermectin received much more discussion than Molnupiravir and Remdesivir, particularly during COVID-19 surges. Hydroxychloroquine and Ivermectin were highly politicized, related to conspiracy theories, hearsay, celebrity effects, etc. The distribution of stance between the 2 major US political parties was significantly different (P < .001); Republicans were much more likely to support Hydroxychloroquine (+55%) and Ivermectin (+30%) than Democrats. People with healthcare backgrounds tended to oppose Hydroxychloroquine (+7%) more than the general population; in contrast, the general population was more likely to support Ivermectin (+14%).
Conclusion
Our study found that social media users have different perceptions of and stances on off-label versus FDA-authorized drug use across different stages of COVID-19, indicating that health systems, regulatory agencies, and policymakers should design tailored strategies to monitor and reduce misinformation for promoting safe drug use. Our analysis pipeline and stance detection models are made public at https://github.com/ningkko/COVID-drug.
Seneviratne et al focus on case management to demonstrate how one might implement their proposed user-centred design toolkit consisting of process maps, storyboards and four questions.1 This toolkit was developed to address the tendency to develop machine learning models in an opportunistic manner, based on the availability of data rather than through fundamental design principles focused on resolving the actual pain points of stakeholders.
1. User-centred design for machine learning in health care: a case study from care management.
Abstract
Objectives
To develop an unbiased objective for learning automatic coding algorithms from clinical records annotated with only a partial set of the relevant International Classification of Diseases codes, as annotation noise in undercoded clinical records used as training data can mislead the learning process of deep neural networks.
Materials and Methods
We use Medical Information Mart for Intensive Care III (MIMIC-III) as our dataset. We employ positive-unlabeled learning to achieve unbiased loss estimation, which is free of misleading training signals. We then use a reweighting mechanism to compensate for the imbalance between positive and negative samples. To further close the performance gap caused by poor-quality annotation, we integrate the supervision provided by the automatic annotation tool Medical Concept Annotation Toolkit, which can ease the heavy burden of manual validation.
Results
Our benchmarking results show that positive-unlabeled learning with reweighting outperforms competitive baseline methods over a range of missing-label ratios. Integrating the supervision provided by the annotation tool further boosted performance.
Discussion
Given the annotation noise and severe class imbalance, unbiased loss estimation and the reweighting mechanism are both important for learning from undercoded clinical records. The unbiased loss requires estimating false-negative ratios, and estimation through trained models is practical and competitive.
Conclusions
The combination of positive-unlabeled learning with reweighting and supervision provided by the annotation tool is a promising solution to learn from undercoded clinical records.
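A minimal sketch of the unbiased positive-unlabeled risk estimate that underlies this kind of approach, in its common non-negative form. The logistic loss, class prior, and simulated classifier scores below are illustrative assumptions, not the paper's neural-network setup.

```python
# Sketch: non-negative positive-unlabeled (PU) risk estimation.
# Labeled positives contribute pi * R_p^+; unlabeled data stand in for
# negatives, with the positive contamination subtracted out and the
# difference clipped at zero to keep the estimate non-negative.
import numpy as np

def logistic_loss(scores, y):
    # y in {+1, -1}; standard logistic surrogate loss
    return np.log1p(np.exp(-y * scores))

def nn_pu_risk(scores_pos, scores_unl, pi):
    """Non-negative PU risk: pi*R_p^+ + max(0, R_u^- - pi*R_p^-)."""
    r_p_pos = logistic_loss(scores_pos, +1).mean()  # positives labeled positive
    r_p_neg = logistic_loss(scores_pos, -1).mean()  # positives labeled negative
    r_u_neg = logistic_loss(scores_unl, -1).mean()  # unlabeled labeled negative
    return pi * r_p_pos + max(0.0, r_u_neg - pi * r_p_neg)

rng = np.random.default_rng(0)
scores_pos = rng.normal(1.0, 1.0, 100)   # scores on records with the ICD code
scores_unl = rng.normal(-0.5, 1.0, 400)  # scores on undercoded/unlabeled records
print(nn_pu_risk(scores_pos, scores_unl, pi=0.3))
```

In the paper's setting this risk would be minimized per ICD code by a deep network, with reweighting layered on top to handle class imbalance.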
A health data economy has begun to form, but its rise has been tempered by the profound lack of sharing of both data and data products such as models, intermediate results, and annotated training corpora, and this severely limits the potential for triggering economic cluster effects. Economic cluster effects represent a means to elicit benefit from economies of scale from internal data innovations and are beneficial because they may mitigate challenges from external sources. Within institutions, data product sharing is needed to spark data entrepreneurship and data innovation, and cross-institutional sharing is also critical, especially for rare conditions.
Abstract
Data change the game in terms of how we respond to pandemics. Global data on disease trajectories and the effectiveness and economic impact of different social distancing measures are essential to facilitate effective local responses to pandemics. COVID-19 data flowing across geographic borders are extremely useful to public health professionals for many purposes, such as accelerating the pharmaceutical development pipeline and making vital decisions about intensive care unit rooms, where to build temporary hospitals, or where to boost supplies of personal protection equipment, ventilators, or diagnostic tests. Sharing data enables quicker dissemination and validation of pharmaceutical innovations, as well as improved knowledge of what prevention and mitigation measures work. Even if physical borders around the globe are closed, it is crucial that data continue to flow transparently across borders so that a data economy can thrive, promoting global public health through global cooperation and solidarity.
Generative large language models (LLMs) are a subset of transformers-based neural network architecture models. LLMs have successfully leveraged a combination of an increased number of parameters, improvements in computational efficiency, and large pre-training datasets to perform a wide spectrum of natural language processing (NLP) tasks. Using a few examples (few-shot) or no examples (zero-shot) for prompt-tuning has enabled LLMs to achieve state-of-the-art performance in a broad range of NLP applications. This article by the American Medical Informatics Association (AMIA) NLP Working Group characterizes the opportunities, challenges, and best practices for our community to leverage and advance the integration of LLMs in downstream NLP applications effectively. This can be accomplished through a variety of approaches, including augmented prompting, instruction prompt tuning, and reinforcement learning from human feedback (RLHF).
Our focus is on making LLMs accessible to the broader biomedical informatics community, including clinicians and researchers who may be unfamiliar with NLP. Additionally, NLP practitioners may gain insight from the described best practices.
We focus on 3 broad categories of NLP tasks, namely natural language understanding, natural language inferencing, and natural language generation. We review the emerging trends in prompt tuning, instruction fine-tuning, and evaluation metrics used for LLMs while drawing attention to several issues that impact biomedical NLP applications, including falsehoods in generated text (confabulation/hallucinations), toxicity, and dataset contamination leading to overfitting. We also review potential approaches to address some of these current challenges in LLMs, such as chain-of-thought prompting, and the phenomenon of emergent capabilities observed in LLMs that can be leveraged to address complex NLP challenges in biomedical applications.
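Few-shot prompting as discussed above amounts to assembling labeled examples into the prompt itself. The sketch below shows one way to build such a prompt; the task, labels, and example notes are hypothetical, and the resulting string would be sent to whichever LLM endpoint is in use.

```python
# Sketch: few-shot prompt assembly for an illustrative clinical NLP task
# (flagging possible adverse drug events in note snippets). The examples
# and label set are made up for demonstration.
FEW_SHOT_EXAMPLES = [
    ("Patient developed a rash after amoxicillin.", "ADVERSE_EVENT"),
    ("No known drug allergies.", "NO_EVENT"),
]

def build_prompt(examples, query):
    # Instruction, then labeled demonstrations, then the unlabeled query
    parts = ["Label each note as ADVERSE_EVENT or NO_EVENT.\n"]
    for text, label in examples:
        parts.append(f"Note: {text}\nLabel: {label}\n")
    parts.append(f"Note: {query}\nLabel:")
    return "\n".join(parts)

prompt = build_prompt(
    FEW_SHOT_EXAMPLES,
    "Nausea began two days after starting metformin.",
)
print(prompt)
```

Chain-of-thought prompting follows the same pattern, except each demonstration also includes intermediate reasoning before its label.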
Abstract
Background
People live a long time in pre-diabetes/early diabetes without a formal diagnosis or management. Heterogeneity of progression, coupled with deficiencies in electronic health records related to incomplete data, discrete events, and irregular event intervals, makes identification of pre-diabetes and critical points of diabetes progression challenging.
Methods
We utilized longitudinal electronic health records of 9298 patients with type 2 diabetes or prediabetes from 2005 to 2016 from a large regional healthcare delivery network in China. We optimized a generative Markov-Bayesian-based model to generate 5000 synthetic illness trajectories. The synthetic data were manually reviewed by endocrinologists.
Results
We built an optimized generative progression model for type 2 diabetes, using anchor information to reduce the number of parameters learned in the third layer of the model from $$O(N \times W)$$ to $$O((N-C) \times W)$$, where $$N$$ is the number of clinical findings, $$W$$ is the number of complications, and $$C$$ is the number of anchors. Based on this model, we infer the relationships between progression stages, the onset of complication categories, and the associated diagnoses during the whole progression of type 2 diabetes using electronic health records.
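As a worked illustration of the anchor-based parameter reduction, with made-up values for the counts of findings, complications, and anchors:

```python
# Illustrative parameter-count comparison for the anchor-based reduction
# described above; N, W, and C are hypothetical values, not the study's.
N, W, C = 120, 15, 40    # clinical findings, complications, anchors
full = N * W             # third-layer parameters without anchors: O(N*W)
reduced = (N - C) * W    # with C anchored findings fixed: O((N-C)*W)
print(full, reduced)     # 1800 1200
```

Each anchored finding removes one row of third-layer parameters per complication, so the savings scale with both C and W.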
Discussion
Our findings indicate that 55.3% of single complications and 31.8% of complication patterns could be predicted early and managed appropriately, potentially delaying progression (as type 2 diabetes is a progressive disease) or preventing it altogether (through lifestyle modifications that keep patients from developing diabetes in the first place).
Conclusions
The full type 2 diabetes patient trajectories generated by the chronic disease progression model can counter the lack of real-world evidence over the desired longitudinal timeframe while facilitating population health management.