Current models for correlating electronic medical records with -omics data largely ignore clinical text, which is an important source of phenotype information for patients with cancer. This data convergence has the potential to reveal new insights about cancer initiation, progression, metastasis, and response to treatment. Insights from this real-world data will catalyze clinical care, research, and regulatory activities. Natural language processing (NLP) methods are needed to extract these rich cancer phenotypes from clinical text. Here, we review the advances of NLP and information extraction methods relevant to oncology based on publications from PubMed as well as NLP and machine learning conference proceedings in the last 3 years. Given the interdisciplinary nature of the fields of oncology and information extraction, this analysis serves as a critical trail marker on the path to higher fidelity oncology phenotypes from real-world data.
Artificial intelligence (AI) has the potential to fundamentally alter the way medicine is practised. AI platforms excel in recognizing complex patterns in medical data and provide a quantitative, rather than purely qualitative, assessment of clinical conditions. Accordingly, AI could have particularly transformative applications in radiation oncology given the multifaceted and highly technical nature of this field of medicine with a heavy reliance on digital data processing and computer software. Indeed, AI has the potential to improve the accuracy, precision, efficiency and overall quality of radiation therapy for patients with cancer. In this Perspective, we first provide a general description of AI methods, followed by a high-level overview of the radiation therapy workflow with discussion of the implications that AI is likely to have on each step of this process. Finally, we describe the challenges associated with the clinical development and implementation of AI platforms in radiation oncology and provide our perspective on how these platforms might change the roles of radiotherapy medical professionals.
Radiotherapy-associated cardiac toxicity studies in patients with locally advanced non-small cell lung cancer (NSCLC) have been limited by small sample size and nonvalidated cardiac endpoints.
The purpose of this analysis was to ascertain whether cardiac radiation dose is a predictor of major adverse cardiac events (MACE) and all-cause mortality (ACM).
This retrospective analysis included 748 consecutive locally advanced NSCLC patients treated with thoracic radiotherapy. Fine and Gray and Cox regressions were used to identify predictors for MACE and ACM, adjusting for lung cancer and cardiovascular prognostic factors, including pre-existing coronary heart disease (CHD).
After a median follow-up of 20.4 months, 77 patients developed ≥1 MACE (2-year cumulative incidence, 5.8%; 95% confidence interval [CI]: 4.3% to 7.7%), and 533 died. Mean radiation dose delivered to the heart (mean heart dose) was associated with a significantly increased risk of MACE (adjusted hazard ratio [HR]: 1.05/Gy; 95% CI: 1.02 to 1.08/Gy; p < 0.001) and ACM (adjusted HR: 1.02/Gy; 95% CI: 1.00 to 1.03/Gy; p = 0.007). Mean heart dose (≥10 Gy vs. <10 Gy) was associated with a significantly increased risk of ACM in CHD-negative patients (178 vs. 118 deaths; HR: 1.34; 95% CI: 1.06 to 1.69; p = 0.014) with 2-year estimates of 52.2% (95% CI: 46.1% to 58.5%) versus 40.0% (95% CI: 33.5% to 47.4%); but not among CHD-positive patients (112 vs. 82 deaths; HR: 0.94; 95% CI: 0.70 to 1.25; p = 0.66) with 2-year estimates of 54.6% (95% CI: 46.8% to 62.7%) versus 50.8% (95% CI: 41.5% to 60.9%), respectively (p for interaction = 0.028).
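The per-Gy hazard ratios above can be hard to read at clinical dose scales. As a rough illustration, and assuming the usual log-linear form of a regression with dose as a continuous covariate, the ratio for a larger dose difference is the per-Gy ratio raised to that difference. The function name and the 10 Gy difference below are hypothetical choices for illustration; the scaled values are not results reported by the study.

```python
def scaled_hazard_ratio(hr_per_gy: float, dose_difference_gy: float) -> float:
    """Scale a per-Gy hazard ratio to a larger dose difference.

    Assumes a log-linear dose effect, i.e. HR(d) = HR_per_Gy ** d.
    This is an assumption of the sketch, not a claim about the
    study's exact model specification.
    """
    return hr_per_gy ** dose_difference_gy


# Per-Gy adjusted hazard ratios quoted in the abstract.
HR_MACE_PER_GY = 1.05
HR_ACM_PER_GY = 1.02

# Hypothetical 10 Gy difference in mean heart dose, for illustration only.
hr_mace_10gy = scaled_hazard_ratio(HR_MACE_PER_GY, 10)
hr_acm_10gy = scaled_hazard_ratio(HR_ACM_PER_GY, 10)

print(f"MACE, 10 Gy difference: {hr_mace_10gy:.2f}")
print(f"ACM, 10 Gy difference: {hr_acm_10gy:.2f}")
```

Under this assumption a small per-Gy ratio compounds: 1.05 per Gy corresponds to roughly 1.63 over 10 Gy. This is separate from the dichotomized (≥10 Gy vs. <10 Gy) comparison, which was estimated directly in the study.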
Despite the competing risk of cancer-specific death in locally advanced NSCLC patients, cardiac radiation dose exposure is a modifiable cardiac risk factor for MACE and ACM, supporting the need for early recognition and treatment of cardiovascular events and more stringent avoidance of high cardiac radiotherapy dose.
Mean heart dose (MHD) over 10 Gy and left anterior descending (LAD) coronary artery volume (V) receiving 15 Gy (V15Gy) greater than 10% can significantly increase the risk of major adverse cardiac events (MACE) in patients with non-small cell lung cancer (NSCLC). We sought to characterize the discordance between MHD and LAD dose and the association of this classification with the risk of MACE after radiation therapy.
The coefficient of determination for MHD and LAD V15Gy was calculated in this retrospective analysis of 701 patients with locally advanced NSCLC treated with radiation therapy. Four groups were defined on the basis of high or low MHD (≥10 Gy vs <10 Gy) and LAD V15Gy (≥10% vs <10%). MACE (unstable angina, heart failure, myocardial infarction, coronary revascularization, and cardiac death) cumulative incidence was estimated, and Fine and Gray regressions were performed.
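The grouping and the coefficient of determination described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the cut points are taken from the text (MHD ≥10 Gy, LAD V15Gy ≥10%), and R² is computed as the squared Pearson correlation, which matches the coefficient of determination only for a simple linear fit.

```python
from statistics import mean


def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def dose_group(mhd_gy: float, lad_v15_pct: float,
               mhd_cut: float = 10.0, lad_cut: float = 10.0) -> str:
    """Assign one of the four MHD/LAD groups using the stated cut points."""
    mhd = "MHDhigh" if mhd_gy >= mhd_cut else "MHDlow"
    lad = "LADhigh" if lad_v15_pct >= lad_cut else "LADlow"
    return f"{mhd}/{lad}"


def is_discordant(mhd_gy: float, lad_v15_pct: float) -> bool:
    """True when exactly one of the two metrics is above its cut point."""
    return (mhd_gy >= 10.0) != (lad_v15_pct >= 10.0)


# Made-up example patients: (mean heart dose in Gy, LAD V15Gy in %).
patients = [(12.0, 5.0), (14.0, 30.0), (4.0, 0.0), (8.0, 22.0)]
for mhd, lad in patients:
    print(dose_group(mhd, lad), "discordant" if is_discordant(mhd, lad) else "concordant")
```

The example patients are invented solely to exercise each of the four groups.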
The proportion of variance in LAD V15Gy predictable from MHD was only 54.5% (R² = 0.545). There was discordance (where MHD was high [≥10 Gy] and LAD low [V15Gy < 10%], or vice versa) in 23.1% of patients (n = 162). Two-year MACE estimates were 4.2% (MHDhigh/LADlow), 7.6% (MHDhigh/LADhigh), 1.8% (MHDlow/LADlow), and 13.0% (MHDlow/LADhigh). Adjusting for pre-existing coronary heart disease and other prognostic factors, MHDhigh/LADlow (subdistribution hazard ratio [SHR], 0.34; 95% CI, 0.13-0.93; P = .036) and MHDlow/LADlow (SHR, 0.24; 95% CI, 0.10-0.53; P < .001) were associated with a significantly reduced risk of MACE.
MHD is insufficient to predict LAD V15Gy with confidence. When MHD and LAD V15Gy dose exposure is discordant, isolated low LAD V15Gy significantly reduces the risk of MACE in patients with locally advanced NSCLC after radiation therapy, suggesting that the validity of whole heart metrics for optimally predicting cardiac events should be reassessed.
Real-world evidence for radiation therapy (RT) is limited because it is often documented only in the clinical narrative. We developed a natural language processing system for automated extraction of detailed RT events from text to support clinical phenotyping.
A multi-institutional data set of 96 clinician notes, 129 North American Association of Central Cancer Registries cancer abstracts, and 270 RT prescriptions from HemOnc.org was used and divided into train, development, and test sets. Documents were annotated for RT events and associated properties: dose, fraction frequency, fraction number, date, treatment site, and boost. Named entity recognition models for properties were developed by fine-tuning BioClinicalBERT and RoBERTa transformer models. A multiclass RoBERTa-based relation extraction model was developed to link each dose mention with each property in the same event. Models were combined with symbolic rules to create a hybrid end-to-end pipeline for comprehensive RT event extraction.
Named entity recognition models were evaluated on the held-out test set with F1 results of 0.96, 0.88, 0.94, 0.88, 0.67, and 0.94 for dose, fraction frequency, fraction number, date, treatment site, and boost, respectively. The relation model achieved an average F1 of 0.86 when the input was gold-labeled entities. The end-to-end system F1 result was 0.81. The end-to-end system performed best on North American Association of Central Cancer Registries abstracts (average F1 0.90), which are mostly copy-paste content from clinician notes.
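For readers less familiar with the metric, the F1 scores above combine precision and recall into one number. The sketch below shows the standard count-based formula and, as simple arithmetic on the six per-property scores quoted above, their unweighted mean. The function name and the example counts are hypothetical, and the unweighted mean is not a figure reported in the abstract.

```python
from statistics import mean


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when there is nothing to score."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Per-property NER F1 scores quoted in the abstract.
ner_f1 = {
    "dose": 0.96,
    "fraction frequency": 0.88,
    "fraction number": 0.94,
    "date": 0.88,
    "treatment site": 0.67,
    "boost": 0.94,
}

# Unweighted (macro) mean across properties; derived here, not reported.
macro_f1 = mean(ner_f1.values())
print(f"Unweighted mean NER F1: {macro_f1:.3f}")

# Hypothetical counts: 8 correct spans, 2 spurious, 2 missed.
print(f"Example F1 from counts: {f1_from_counts(8, 2, 2):.2f}")
```

The unweighted mean treats every property equally, so the weaker treatment-site score pulls it down more than a mention-weighted average would if treatment sites were rare.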
We developed methods and a hybrid end-to-end system for RT event extraction, which is the first natural language processing system for this task. This system provides proof-of-concept for real-world RT data collection for research and is promising for the potential of natural language processing methods to support clinical care.
Large language models (LLMs) have shown impressive ability in biomedical question-answering, but have not been adequately investigated for more specific biomedical applications. This study investigates the ChatGPT family of models (GPT-3.5, GPT-4) in biomedical tasks beyond question-answering.
We evaluated model performance with 11 122 samples for two fundamental tasks in the biomedical domain: classification (n = 8676) and reasoning (n = 2446). The first task involves classifying health advice in scientific literature, while the second task is detecting causal relations in biomedical literature. We used 20% of the dataset for prompt development, including zero- and few-shot settings with and without chain-of-thought (CoT). We then evaluated the best prompts from each setting on the remaining dataset, comparing them to models using simple features (BoW with logistic regression) and fine-tuned BioBERT models.
Fine-tuning BioBERT produced the best classification (F1: 0.800-0.902) and reasoning (F1: 0.851) results. Among LLM approaches, few-shot CoT achieved the best classification (F1: 0.671-0.770) and reasoning (F1: 0.682) results, comparable to the BoW model (F1: 0.602-0.753 and 0.675 for classification and reasoning, respectively). It took 78 h to obtain the best LLM results, compared to 0.078 and 0.008 h for the top-performing BioBERT and BoW models, respectively.
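The time figures above are easier to compare as ratios. The short sketch below is plain arithmetic on the hours quoted in the abstract; the function and constant names are hypothetical.

```python
def time_ratio(slow_hours: float, fast_hours: float) -> float:
    """How many times longer the slower approach took."""
    return slow_hours / fast_hours


# Hours quoted in the abstract.
LLM_HOURS = 78.0        # best LLM prompting results
BIOBERT_HOURS = 0.078   # top-performing fine-tuned BioBERT
BOW_HOURS = 0.008       # top-performing bag-of-words model

print(f"LLM vs BioBERT: about {time_ratio(LLM_HOURS, BIOBERT_HOURS):.0f}x")
print(f"LLM vs BoW: about {time_ratio(LLM_HOURS, BOW_HOURS):.0f}x")
```

On these numbers, the LLM approach took about 1000 times as long as BioBERT and about 9750 times as long as the BoW model.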
The simple BoW model performed similarly to the most complex LLM prompting. Prompt engineering required significant investment.
Despite the excitement around viral ChatGPT, fine-tuning for two fundamental biomedical natural language processing tasks remained the best strategy.
Radiotherapy accelerates coronary heart disease (CHD), but the dose to critical cardiac substructures has not been systematically studied in lung cancer.
To examine independent cardiac substructure radiotherapy factors for major adverse cardiac events (MACE) and all-cause mortality in patients with locally advanced non-small cell lung cancer (NSCLC).
A retrospective cohort analysis of 701 patients with locally advanced NSCLC treated with thoracic radiotherapy at Harvard University-affiliated hospitals between December 1, 2003, and January 27, 2014, was performed. Data analysis was conducted between January 12, 2019, and July 22, 2020. Cardiac substructures were manually delineated. Radiotherapy dose parameters (mean, maximum, and the volume [V, percentage] receiving a specific Gray [Gy] dose in 5-Gy increments) were calculated. Receiver operating curve and cut-point analyses estimating MACE (unstable angina, heart failure hospitalization or urgent visit, myocardial infarction, coronary revascularization, and cardiac death) were performed. Fine and Gray and Cox regressions were adjusted for preexisting CHD and other prognostic factors.
MACE and all-cause mortality.
Of the 701 patients included in the analysis, 356 were men (50.8%). The median age was 65 years (interquartile range, 57-73 years). The optimal cut points for substructure and radiotherapy doses (highest C-index value) were left anterior descending (LAD) coronary artery V15 Gy greater than or equal to 10% (0.64), left circumflex coronary artery V15 Gy greater than or equal to 14% (0.64), left ventricle V15 Gy greater than or equal to 1% (0.64), and mean total coronary artery dose greater than or equal to 7 Gy (0.62). Adjusting for baseline CHD status and other prognostic factors, an LAD coronary artery V15 Gy greater than or equal to 10% was associated with increased risk of MACE (adjusted hazard ratio, 13.90; 95% CI, 1.23-157.21; P = .03) and all-cause mortality (adjusted hazard ratio, 1.58; 95% CI, 1.09-2.29; P = .02). Among patients without CHD, associations with increased 1-year MACE were noted for LAD coronary artery V15 Gy greater than or equal to 10% (4.9% vs 0%), left circumflex coronary artery V15 Gy greater than or equal to 14% (5.2% vs 0.7%), left ventricle V15 Gy greater than or equal to 1% (5.0% vs 0.4%), and mean total coronary artery dose greater than or equal to 7 Gy (4.8% vs 0%) (all P ≤ .001), but only a left ventricle V15 Gy greater than or equal to 1% increased the risk among patients with CHD (8.4% vs 4.1%; P = .046). Among patients without CHD, 2-year all-cause mortality was increased with an LAD coronary artery V15 Gy greater than or equal to 10% (51.2% vs 42.2%; P = .009) and mean total coronary artery dose greater than or equal to 7 Gy (53.2% vs 40.0%; P = .01).
The findings of this cohort study suggest that optimal cardiac dose constraints may differ based on preexisting CHD. Although the LAD coronary artery V15 Gy greater than or equal to 10% appeared to be an independent estimator of the probability of MACE and all-cause mortality, particularly in patients without CHD, left ventricle V15 Gy greater than or equal to 1% appeared to confer an increased risk of MACE among patients with CHD. These constraints are worthy of further study because there is a need for improved cardiac risk stratification and aggressive risk mitigation strategies.