Clinical oncology is experiencing rapid growth in data that are collected to enhance cancer care. With recent advances in the field of artificial intelligence (AI), there is now a computational basis to integrate and synthesize this growing body of multi-dimensional data, deduce patterns, and predict outcomes to improve shared patient and clinician decision making. While there is high potential, significant challenges remain. In this perspective, we propose a pathway of clinical cancer care touchpoints for narrow-task AI applications and review a selection of applications. We describe the challenges faced in the clinical translation of AI and propose solutions. We also suggest paths forward in weaving AI into individualized patient care, with an emphasis on clinical validity, utility, and usability. By illuminating these issues in the context of current AI applications for clinical oncology, we hope to help advance meaningful investigations that will ultimately translate to real-world clinical use.
Extranodal extension (ENE) is a well-established poor prognosticator and an indication for adjuvant treatment escalation in patients with head and neck squamous cell carcinoma (HNSCC). Identification of ENE on pretreatment imaging represents a diagnostic challenge that limits its clinical utility. We previously developed a deep learning algorithm that identifies ENE on pretreatment computed tomography (CT) imaging in patients with HNSCC. We sought to validate our algorithm performance for patients from a diverse set of institutions and compare its diagnostic ability to that of expert diagnosticians.
We obtained preoperative, contrast-enhanced CT scans and corresponding pathology results from two external data sets of patients with HNSCC: an external institution and The Cancer Genome Atlas (TCGA) HNSCC imaging data. Lymph nodes were segmented and annotated as ENE-positive or ENE-negative on the basis of pathologic confirmation. Deep learning algorithm performance was evaluated and compared directly to two board-certified neuroradiologists.
A total of 200 lymph nodes were examined in the external validation data sets. For lymph nodes from the external institution, the algorithm achieved an area under the receiver operating characteristic curve (AUC) of 0.84 (83.1% accuracy), outperforming radiologists' AUCs of 0.70 and 0.71 (P = .02 and P = .01). Similarly, for lymph nodes from the TCGA, the algorithm achieved an AUC of 0.90 (88.6% accuracy), outperforming radiologist AUCs of 0.60 and 0.82 (P < .0001 and P = .16). Radiologist diagnostic accuracy improved when receiving deep learning assistance.
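The AUC and accuracy figures reported above can be illustrated with a minimal Python sketch. It computes AUC as the probability that a randomly chosen ENE-positive node receives a higher score than a randomly chosen ENE-negative node (the Mann-Whitney formulation), and accuracy at a fixed cutoff. The function names, the 0.5 threshold, and the toy data are assumptions for illustration only; the abstract does not state how the study computed these values or which test produced the P values, and this sketch does not attempt a significance test.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly at a fixed cutoff (0.5 is an assumed default)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(1 for p, y in zip(preds, labels) if p == y) / len(labels)


# Hypothetical toy data: 1 = ENE-positive on pathology, 0 = ENE-negative.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_acc = accuracy(toy_labels, toy_scores)
```

Because AUC is threshold-free while accuracy depends on the chosen cutoff, the two can diverge, which is one reason to report both.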
Deep learning successfully identified ENE on pretreatment imaging across multiple institutions, exceeding the diagnostic ability of radiologists with specialized head and neck experience. Our findings suggest that deep learning has utility in the identification of ENE in patients with HNSCC and has the potential to be integrated into clinical decision making.
Identification of nodal metastasis and tumor extranodal extension (ENE) is crucial for head and neck cancer management, but can currently be diagnosed only via postoperative pathology. Pretreatment radiographic identification of ENE, in particular, has proven extremely difficult for clinicians, but would be greatly influential in guiding patient management. Here, we show that a deep learning convolutional neural network can be trained to identify nodal metastasis and ENE with excellent performance that surpasses what human clinicians have historically achieved. We trained a 3-dimensional convolutional neural network using a dataset of 2,875 CT-segmented lymph node samples with correlating pathology labels, cross-validated and fine-tuned on 124 samples, and conducted testing on a blinded test set of 131 samples. On the blinded test set, the model predicted ENE and nodal metastasis each with area under the receiver operating characteristic curve (AUC) of 0.91 (95% CI: 0.85-0.97). The model has the potential for use as a clinical decision-making tool to help guide head and neck cancer patient management.
Purpose
To devise, validate, and externally test PET/CT radiomics signatures for human papillomavirus (HPV) association in primary tumors and metastatic cervical lymph nodes of oropharyngeal squamous cell carcinoma (OPSCC).
Methods
We analyzed 435 primary tumors (326 for training, 109 for validation) and 741 metastatic cervical lymph nodes (518 for training, 223 for validation) using FDG-PET and non-contrast CT from a multi-institutional and multi-national cohort. Utilizing 1037 radiomics features per imaging modality and per lesion, we trained, optimized, and independently validated machine-learning classifiers for prediction of HPV association in primary tumors, lymph nodes, and combined “virtual” volumes of interest (VOI). PET-based models were additionally validated in an external cohort.
Results
Single-modality PET and CT final models yielded similar classification performance without significant difference in independent validation; however, models combining PET and CT features outperformed single-modality PET- or CT-based models, with receiver operating characteristic area under the curve (AUC) of 0.78 and 0.77 for prediction of HPV association using primary tumor lesion features in cross-validation and independent validation, respectively. In the external PET-only validation dataset, final models achieved an AUC of 0.83 for a virtual VOI combining primary tumor and lymph nodes, and an AUC of 0.73 for a virtual VOI combining all lymph nodes.
Conclusion
We found that PET-based radiomics signatures yielded similar classification performance to CT-based models, with potential added value from combining PET- and CT-based radiomics for prediction of HPV status. While our results are promising, radiomics signatures may not yet substitute tissue sampling for clinical decision-making.
Reply to A.B. Simon et al
Kann, Benjamin H; Payabvash, Sam; Aneja, Sanjay
Journal of clinical oncology, 06/2020, Volume 38, Issue 16
Journal Article
Abstract
Data about the quality of cancer information that chatbots and other artificial intelligence systems provide are limited. Here, we evaluate the accuracy of cancer information on ChatGPT compared with the National Cancer Institute’s (NCI’s) answers by using the questions on the “Common Cancer Myths and Misconceptions” web page. The NCI’s answers and ChatGPT answers to each question were blinded, and then evaluated for accuracy (accurate: yes vs no). Ratings were evaluated independently for each question, and then compared between the blinded NCI and ChatGPT answers. Additionally, word count and Flesch-Kincaid readability grade level for each individual response were evaluated. Following expert review, the percentage of overall agreement for accuracy was 100% for NCI answers and 96.9% for ChatGPT outputs for questions 1 through 13 (ĸ = ‒0.03, standard error = 0.08). There were few noticeable differences in the number of words or the readability of the answers from NCI or ChatGPT. Overall, the results suggest that ChatGPT provides accurate information about common cancer myths and misconceptions.
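For readers unfamiliar with the readability measure named above, the following Python sketch shows the standard Flesch-Kincaid grade-level formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The syllable counter is a rough vowel-group heuristic and the tokenization rules are assumptions made for this illustration; the abstract does not say which tool the authors used, and dedicated readability software will give somewhat different values.

```python
import re


def count_syllables(word):
    """Rough heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level using simple regex-based sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


def word_count(text):
    """Number of word tokens under the same tokenization as above."""
    return len(re.findall(r"[A-Za-z']+", text))
```

Applying `word_count` and `flesch_kincaid_grade` to each blinded answer would yield the two per-response quantities the abstract describes comparing.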
Accurate risk-stratification can facilitate precision therapy in oropharyngeal squamous cell carcinoma (OPSCC). We explored the potential added value of baseline positron emission tomography (PET)/computed tomography (CT) radiomic features for prognostication and risk stratification of OPSCC beyond the American Joint Committee on Cancer (AJCC) 8th edition staging scheme. Using institutional and publicly available datasets, we included OPSCC patients with known human papillomavirus (HPV) status, without baseline distant metastasis and treated with curative intent. We extracted 1037 PET and 1037 CT radiomic features quantifying lesion shape, imaging intensity, and texture patterns from primary tumors and metastatic cervical lymph nodes. Utilizing random forest algorithms, we devised novel machine-learning models for OPSCC progression-free survival (PFS) and overall survival (OS) using “radiomics” features, “AJCC” variables, and the “combined” set as input. We designed both single- (PET or CT) and combined-modality (PET/CT) models. Harrell’s C-index quantified survival model performance; risk stratification was evaluated in Kaplan–Meier analysis. A total of 311 patients were included. In HPV-associated OPSCC, the best “radiomics” model achieved an average C-index ± standard deviation of 0.62 ± 0.05 (p = 0.02) for PFS prediction, compared to 0.54 ± 0.06 (p = 0.32) utilizing “AJCC” variables. Radiomics-based risk-stratification of HPV-associated OPSCC was significant for PFS and OS. Similar trends were observed in HPV-negative OPSCC. In conclusion, radiomics imaging features extracted from pre-treatment PET/CT may provide complementary information to the current AJCC staging scheme for survival prognostication and risk-stratification of HPV-associated OPSCC.
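Harrell’s C-index, the performance measure named above, can be sketched in a few lines of Python. It is the fraction of comparable patient pairs in which the patient with the earlier observed event was assigned the higher predicted risk; 0.5 corresponds to chance and 1.0 to perfect ranking. The function name and toy data are hypothetical, and this simplified version ignores pairs with tied follow-up times, so it is an illustration of the definition rather than the study's implementation.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C for right-censored data.

    times: follow-up time per patient
    events: 1 if the event (e.g., progression) was observed, 0 if censored
    risk_scores: model output where a higher score means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: times in months, third patient censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_c = harrell_c_index(toy_times, toy_events, [0.9, 0.1, 0.5, 0.7])
```

On this scale, the reported values of 0.62 and 0.54 both sit fairly close to chance, which is consistent with the abstract's cautious wording.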
Manual segmentation of tumors and organs-at-risk (OAR) in 3D imaging for radiation-therapy planning is time-consuming and subject to variation between different observers. Artificial intelligence (AI) can assist with segmentation, but challenges exist in ensuring high-quality segmentation, especially for small, variable structures, such as the esophagus. We investigated the effect of variation in segmentation quality and style of physicians for training deep-learning models for esophagus segmentation and proposed a new metric, edge roughness, for evaluating/quantifying slice-to-slice inconsistency. This study includes a real-world cohort of 394 patients who each received radiation therapy (mainly for lung cancer). Segmentation of the esophagus was performed by 8 physicians as part of routine clinical care. We evaluated manual segmentation by comparing the length and edge roughness of segmentations among physicians to analyze inconsistencies. We trained eight multiple- and individual-physician segmentation models in total, based on U-Net architectures and residual backbones. We used the volumetric Dice coefficient to measure the performance for each model. We proposed a metric, edge roughness, to quantify the shift of segmentation among adjacent slices by calculating the curvature of edges of the 2D sagittal- and coronal-view projections. The auto-segmentation model trained on multiple physicians (MD1-7) achieved the highest mean Dice of 73.7 ± 14.8%. The individual-physician model (MD7) with the highest edge roughness (mean ± SD: 0.106 ± 0.016) demonstrated significantly lower volumetric Dice for test cases compared with other individual models (MD7: 58.5 ± 15.8%, MD6: 67.1 ± 16.8%, p < 0.001). A multiple-physician model trained after removing the MD7 data resulted in fewer outliers (e.g., Dice ≤ 40%: 4 cases for MD1-6, 7 cases for MD1-7, N = 394). While we initially detected this pattern in a single clinician, we validated the edge roughness metric across the entire dataset. The model trained with the lowest-quantile edge roughness (MD-Q1, N = 62) achieved significantly higher Dice (N = 270) than the model trained with the highest-quantile ones (MD-Q4, N = 62) (MD-Q1: 67.8 ± 14.8%, MD-Q4: 62.8 ± 15.7%, p < 0.001). This study demonstrates that there is significant variation in style and quality in manual segmentations in clinical care, and that training AI auto-segmentation algorithms from real-world, clinical datasets may result in unexpectedly under-performing algorithms with the inclusion of outliers. Importantly, this study provides a novel evaluation metric, edge roughness, to quantify physician variation in segmentation which will allow developers to filter clinical training data to optimize model performance.
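The volumetric Dice coefficient used above to score each model is defined as 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. A minimal Python sketch follows, together with a helper that counts low-Dice outlier cases at the 40% cutoff mentioned in the abstract. The function names, the flat-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions for illustration; the edge roughness metric is not sketched because the abstract does not give its exact formula.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap of two equally sized binary masks given as flat sequences of 0/1 voxels."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


def count_outliers(dice_values, threshold=0.40):
    """Number of cases with Dice at or below the cutoff (40% as in the abstract's example)."""
    return sum(1 for d in dice_values if d <= threshold)


# Hypothetical flattened 3D masks of four voxels each.
predicted = [1, 1, 0, 0]
reference = [1, 0, 1, 0]
toy_dice = dice_coefficient(predicted, reference)
```

A real 3D volume would first be flattened (or handled with an array library), but the overlap arithmetic is the same.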
Management of brain metastases typically includes radiotherapy (RT) with conventional fractionation and/or stereotactic radiosurgery (SRS). However, optimal indications and practice patterns for SRS remain unclear. We sought to evaluate national practice patterns for patients with metastatic disease receiving brain RT.
We queried the National Cancer Data Base (NCDB) for patients diagnosed with metastatic non-small cell lung cancer, breast cancer, colorectal cancer, or melanoma from 2004 to 2014 who received upfront brain RT. Patients were divided into SRS and non-SRS cohorts. Patient and facility-level SRS predictors were analyzed with chi-square tests and logistic regression, and uptake trends were approximated with linear regression. Survival by diagnosis year was analyzed with the Kaplan-Meier method.
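The Kaplan-Meier method mentioned above estimates survival as a product over event times of (1 − deaths / number at risk), with censored patients leaving the risk set without counting as deaths. The Python sketch below illustrates that product-limit calculation and how a 1-year survival estimate could be read from the resulting curve. The function names and toy follow-up data are hypothetical and are not drawn from the NCDB analysis.

```python
def kaplan_meier(times, events):
    """Product-limit estimate; returns [(event_time, survival_probability), ...].

    times: follow-up time per patient
    events: 1 if death was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (step function, 1.0 before the first event)."""
    s = 1.0
    for time, surv in curve:
        if time <= t:
            s = surv
        else:
            break
    return s


# Hypothetical follow-up in months; 0 marks a censored patient.
toy_curve = kaplan_meier([3, 6, 6, 9, 12, 15], [1, 1, 0, 1, 0, 1])
one_year = survival_at(toy_curve, 12)
```

Running the estimator separately for each diagnosis year and treatment cohort would give per-year 1-year survival figures of the kind reported in the results.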
Of 75,953 patients, 12,250 (16.1%) received SRS and 63,703 (83.9%) received non-SRS. From 2004 to 2014, the proportion of patients receiving SRS annually increased (from 9.8% to 25.6%; P < .001), and the proportion of facilities using SRS annually increased (from 31.2% to 50.4%; P < .001). On multivariable analysis, nonwhite race, nonprivate insurance, and residence in lower-income or less-educated regions predicted lower SRS use (P < .05 for each). During the study period, SRS use increased disproportionally among patients with private insurance or who resided in higher-income or higher-educated regions. From 2004 to 2013, 1-year actuarial survival improved from 24.1% to 49.6% for patients selected for SRS and from 21.0% to 26.3% for non-SRS patients (P < .001).
This NCDB analysis demonstrates steadily increasing, although modest overall, brain SRS use for patients with metastatic disease in the United States and identifies several progressively widening sociodemographic disparities in the adoption of SRS. Further research is needed to determine the reasons for these worsening disparities and their clinical implications for intracranial control, neurocognitive toxicities, quality of life, and survival for patients with brain metastases.