Machine learning (ML) is emerging as a feasible approach to optimize patients' care path in Radiation Oncology. Applications include autosegmentation, treatment planning optimization, and prediction of oncological and toxicity outcomes. The purpose of this clinically oriented systematic review is to illustrate the potential and limitations of the most commonly used ML models in solving everyday clinical issues in head and neck cancer (HNC) radiotherapy (RT).
Electronic databases were screened up to May 2021. Studies dealing with ML and radiomics were considered eligible. The quality of the included studies was rated by an adapted version of the qualitative checklist originally developed by Luo et al. All statistical analyses were performed using R version 3.6.1.
Forty-eight studies (21 on autosegmentation, four on treatment planning, 12 on oncological outcome prediction, 10 on toxicity prediction, and one on determinants of postoperative RT) were included in the analysis. The most common imaging modality was computed tomography (CT) (40%) followed by magnetic resonance (MR) (10%). Quantitative image features were considered in nine studies (19%). No significant differences were identified in global and methodological scores when works were stratified per their task (i.e., autosegmentation).
The range of possible applications of ML in the field of HN Radiation Oncology is wide, albeit this area of research is relatively young. Overall, if not safe yet, ML is most probably a bet worth making.
This review provides a formal overview of current automatic segmentation studies that use deep learning in radiotherapy. It covers 807 published papers and includes multiple cancer sites, image types (CT/MRI/PET), and segmentation methods. We collect key statistics about the papers to uncover commonalities, trends, and methods, and identify areas where more research might be needed. Moreover, we analyzed the corpus by posing explicit questions aimed at providing high-quality and actionable insights, including: “What should researchers think about when starting a segmentation study?”, “How can research practices in medical image segmentation be improved?”, “What is missing from the current corpus?”, and more. This allowed us to provide practical guidelines on how to conduct a good segmentation study in today’s competitive environment that will be useful for future research within the field, regardless of the specific radiotherapeutic subfield. To aid in our analysis, we used the large language model ChatGPT to condense information.
Contouring of anatomical regions is a crucial step in the medical workflow and is both time-consuming and prone to intra- and inter-observer variability. This study compares different strategies for automatic segmentation of the prostate in T2-weighted MRIs.
This study included 100 patients diagnosed with prostate adenocarcinoma who had undergone multi-parametric MRI and prostatectomy. From the T2-weighted MR images, ground truth segmentation masks were established by consensus from two expert radiologists. The prostate was then automatically contoured with six different methods: (1) a multi-atlas algorithm, (2) a proprietary algorithm in the Syngo.Via medical imaging software, and four deep learning models: (3) a V-net trained from scratch, (4) a pre-trained 2D U-net, (5) a GAN extension of the 2D U-net, and (6) a segmentation-adapted EfficientDet architecture. The resulting segmentations were compared and scored against the ground truth masks with one 70/30 and one 50/50 train/test data split. We also analyzed the association between segmentation performance and clinical variables.
The best performing method was the adapted EfficientDet (model 6), achieving a mean Dice coefficient of 0.914, a mean absolute volume difference of 5.9%, a mean surface distance (MSD) of 1.93 pixels, and a mean 95th percentile Hausdorff distance of 3.77 pixels. The deep learning models were less prone to serious errors (0.854 minimum Dice and 4.02 maximum MSD), and no significant relationship was found between segmentation performance and clinical variables.
Deep learning-based segmentation techniques can consistently achieve Dice coefficients of 0.9 or above with as few as 50 training patients, regardless of architectural archetype. The atlas-based and Syngo.via methods found in commercial clinical software performed significantly worse (0.855 to 0.887 Dice).
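As an illustration of two of the overlap metrics reported above, the following is a minimal sketch of the Dice coefficient and the absolute volume difference for binary masks. It is not the study's evaluation code; the function names, the toy 8x8 masks, and the convention that two empty masks score a Dice of 1.0 are assumptions made for this example.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / total


def absolute_volume_difference(pred, truth):
    """Absolute difference in voxel count, in percent of the reference volume.

    Assumes the reference mask is not empty.
    """
    pred_volume = int(np.asarray(pred, dtype=bool).sum())
    truth_volume = int(np.asarray(truth, dtype=bool).sum())
    return 100.0 * abs(pred_volume - truth_volume) / truth_volume


# Toy example (hypothetical data): a 4x4 square and the same square
# shifted down by one row, so 12 of the 16 pixels overlap.
truth = np.zeros((8, 8), dtype=bool)
truth[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool)
pred[3:7, 2:6] = True

print(dice_coefficient(pred, truth))            # 0.75
print(absolute_volume_difference(pred, truth))  # 0.0
```

The toy case also shows why the two metrics are reported together: a shifted contour can have the correct volume and still overlap poorly with the reference.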
To assess whether CT-based radiomics and blood-derived biomarkers could improve the prediction of overall survival (OS) and locoregional progression-free survival (LRPFS) in patients with oropharyngeal cancer (OPC) treated with curative-intent RT.
Consecutive OPC patients with primary tumors treated between 2005 and 2021 were included. Analyzed clinical variables included gender, age, smoking history, staging, subsite, HPV status, and blood parameters (baseline hemoglobin levels, neutrophils, monocytes, and platelets, and derived measurements). Radiomic features were extracted from the gross tumor volumes (GTVs) of the primary tumor using pyradiomics. Outcomes of interest were LRPFS and OS. Following feature selection, a radiomic score (RS) was calculated for each patient. Significant variables, along with age and gender, were included in multivariable analysis, and models were retained if statistically significant. The models' performance was compared by the C-index.
One hundred and five patients, predominantly male (71%), were included in the analysis. The median age was 59 (IQR: 52-66) years, and stage IVA was the most represented (70%). HPV status was positive in 63 patients, negative in 7, and missing in 35 patients. The median OS follow-up was 6.3 (IQR: 5.5-7.9) years. A statistically significant association between low Hb levels and poorer LRPFS in the HPV-positive subgroup (p = 0.038) was identified. The calculation of the RS successfully stratified patients according to both OS (log-rank p < 0.0001) and LRPFS (log-rank p = 0.0002). The C-index of the combined clinical and radiomic model was 0.82 (CI: 0.80-0.84) for OS and 0.77 (CI: 0.75-0.79) for LRPFS.
Our results show that radiomics could provide clinically significant informative content in this scenario. The best performances were obtained by combining clinical and quantitative imaging variables, thus suggesting the potential of integrative modeling for outcome predictions in this setting of patients.
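For readers unfamiliar with the C-index used to compare the models, the sketch below shows one common definition, Harrell's concordance index for right-censored data. The abstract does not state which estimator or software was used, so the function name, the tie handling, and the four-patient toy data are assumptions for illustration only.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times:       follow-up time per patient
    events:      1 if the event was observed, 0 if the patient was censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example (hypothetical data): four patients, one of them censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.4, 0.5, 0.1]

print(concordance_index(times, events, risk_scores))  # 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking of patients by risk.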
Objective
Deploying an automatic segmentation model in practice should require rigorous quality assurance (QA) and continuous monitoring of the model’s use and performance, particularly in high-stakes scenarios such as healthcare. Currently, however, tools to assist with QA for such models are not available to AI researchers. In this work, we build a deep learning model that estimates the quality of automatically generated contours.
Methods
The model was trained to predict the segmentation quality by outputting an estimate of the Dice similarity coefficient given an image contour pair as input. Our dataset contained 60 axial T2-weighted MRI images of prostates with ground truth segmentations along with 80 automatically generated segmentation masks. The model we used was a 3D version of the EfficientDet architecture with a custom regression head. For validation, we used a fivefold cross-validation. To counteract the limitation of the small dataset, we used an extensive data augmentation scheme capable of producing virtually infinite training samples from a single ground truth label mask. In addition, we compared the results against a baseline model that only uses clinical variables for its predictions.
Results
Our model achieved a mean absolute error of 0.020 ± 0.026 (2.2% mean percentage error) in estimating the Dice score, with a rank correlation of 0.42. Furthermore, the model managed to correctly identify incorrect segmentations (defined in terms of acceptable/unacceptable) 99.6% of the time.
Conclusion
We believe that the trained model can be used alongside automatic segmentation tools to ensure quality and thus allow intervention to prevent undesired segmentation behavior.
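The two agreement measures reported in the Results, mean absolute error and rank correlation, can be sketched as follows. The abstract does not say which rank correlation was used; Spearman's coefficient is assumed here, and the function names and the four toy Dice values are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and estimated values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_correlation(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Assumes neither input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy example (hypothetical data): true Dice scores and model estimates.
true_dice = [0.90, 0.85, 0.70, 0.95]
estimated_dice = [0.88, 0.86, 0.75, 0.93]

print(mean_absolute_error(true_dice, estimated_dice))   # approximately 0.025
print(spearman_correlation(true_dice, estimated_dice))  # 1.0, same ordering
```

The two numbers answer different questions: a low error means the estimates are close in value, while the rank correlation only measures whether better segmentations receive higher estimates.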
When researchers are faced with building machine learning (ML) radiomic models, the first choice they have to make is what model to use. Naturally, the goal is to use the model with the best performance. But what is the best model? It is well known in ML that modern techniques such as gradient boosting and deep learning have better capacity than traditional models to solve complex problems in high dimensions. Despite this, most radiomics researchers still do not focus on these models in their research. As access to high-quality and large data sets increases, these high-capacity ML models may become even more relevant. In this article, we use a large dataset of 949 prostate cancer patients to compare the performance of a few of the most promising ML models for tabular data: gradient-boosted decision trees (GBDTs), multilayer perceptrons, convolutional neural networks, and transformers. To this end, we predict nine different prostate cancer pathology outcomes of clinical interest. Our goal is to give a rough overview of how these models compare against one another in a typical radiomics setting. We also investigate if multitask learning improves the performance of these models when multiple targets are available. Our results suggest that GBDTs perform well across all targets, and that multitask learning does not provide a consistent improvement.
• Machine learning models are trained to predict pathological prostate cancer variables.
• Four model types are compared: gradient boosting, MLPs, transformers, and CNNs.
• Multitask learning is compared against regular training for all models.
• Gradient boosting with CatBoost outperforms the deep learning models.
• Multitask training only improved the MLP and CNN models.
To test the ability of high-performance machine learning (ML) models employing clinical, radiological, and radiomic variables to improve non-invasive prediction of the pathological status of prostate cancer (PCa) in a large, single-institution cohort.
Patients who underwent multiparametric MRI and prostatectomy in our institution in 2015-2018 were considered; a total of 949 patients were included. Gradient-boosted decision tree models were separately trained using clinical features alone and in combination with radiological reporting and/or prostate radiomic features to predict pathological T, pathological N, ISUP score, and their change from preclinical assessment. Model behavior was analyzed in terms of performance, feature importance, Shapley additive explanation (SHAP) values, and mean absolute error (MAE). The best model was compared against a naïve model mimicking clinical workflow.
The model including all variables was the best performing (AUC values ranging from 0.73 to 0.96 for the six endpoints). Radiomic features brought a small yet measurable boost in performance, with the SHAP values indicating that their contribution can be critical to successful prediction of endpoints for individual patients. MAEs were lower for low-risk patients, suggesting that the models find them easier to classify. The best model outperformed the clinical baseline (p ≤ 0.0001), produced significantly fewer false-negative predictions, and was overall less prone to under-staging.
Our results highlight the potential benefit of integrative ML models for pathological status prediction in PCa. Additional studies regarding clinical integration of such models can provide valuable information for personalizing therapy, offering a tool to improve non-invasive prediction of pathological status.
The best machine learning model was less prone to under-staging of the disease. The improved accuracy of our pathological prediction models could constitute an asset to the clinical workflow by providing clinicians with accurate pathological predictions prior to treatment.
• Currently, the most common strategies for pre-surgical stratification of prostate cancer (PCa) patients have been shown to perform suboptimally.
• The addition of radiological features to the clinical features gave a considerable boost in model performance. Our best model outperforms the naïve model, avoiding under-staging and resulting in a critical advantage in the clinic.
• Machine learning models incorporating clinical, radiological, and radiomics features significantly improved accuracy of pathological prediction in prostate cancer, possibly constituting an asset to the clinical workflow.
Background
Radiomics represents an emerging field of precision medicine. Its application in the head and neck is still at an early stage.
Methods
Retrospective study of magnetic resonance imaging (MRI)-based radiomics in surgically treated oral tongue squamous cell carcinoma (OTSCC) (2010–2019; 79 patients). All preoperative MRIs included different sequences (T1, T2, DWI, ADC). Tumor volume was manually segmented and exported to radiomic software for feature extraction. Statistically significant variables were included in multivariable analysis and related to survival endpoints. Predictive models were developed (clinical, radiomic, and clinical-radiomic models) and compared using the C-index.
Results
In almost all clinical-radiomic models, the radiomic score maintained statistical significance. In all cases, the C-index was higher in clinical-radiomic models than in clinical ones. ADC provided the best fit to the models (C-index 0.98, 0.86, and 0.84 for loco-regional recurrence, cause-specific mortality, and overall survival, respectively).
Conclusion
MRI‐based radiomics in OTSCC represents a promising noninvasive method of precision medicine, improving prognosis prediction before surgery.
Objectives
Radiomics involves testing the associations of a large number of quantitative imaging features with clinical characteristics. Our aim was to extract a radiomic signature from axial T2-weighted (T2-W) magnetic resonance imaging (MRI) of the whole prostate able to predict oncological and radiological scores in prostate cancer (PCa).
Methods
This study included 65 patients with localized PCa treated with radiotherapy (RT) between 2014 and 2018. For each patient, the T2-W MRI images were normalized with the histogram intensity scale standardization method. Features were extracted with the IBEX software. The association of each radiomic feature with risk class, T-stage, Gleason score (GS), extracapsular extension (ECE) score, and Prostate Imaging Reporting and Data System (PI-RADS v2) score was assessed by univariate and multivariate analysis.
Results
Forty-nine out of 65 patients were eligible. Among the 1702 features extracted, 3 to 6 features with the highest predictive power were selected for each outcome. This analysis showed that texture features were the most predictive for GS, PI-RADS v2 score, and risk class; intensity features were highly associated with T-stage, ECE score, and risk class, with areas under the receiver operating characteristic curve (ROC AUC) ranging from 0.74 to 0.94.
Conclusions
MRI-based radiomics is a promising tool for prediction of PCa characteristics. Although a significant association was found between the selected features and all the mentioned clinical/radiological scores, further validations on larger cohorts are needed before these findings can be applied in the clinical practice.
Key Points
• A radiomic model was used to classify PCa aggressiveness.
• Radiomic analysis was performed on T2-W magnetic resonance images of the whole prostate gland.
• The most predictive features belong to the texture (57%) and intensity (43%) domains.
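The ROC AUC values quoted in the Results can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes the AUC directly from that pairwise definition; it is not the study's analysis code, and the function name and the six toy cases are assumptions for illustration.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of positive/negative pairs ranked correctly.

    labels: 1 for positive cases, 0 for negative cases
    scores: model output, where a higher score means more likely positive
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # tied scores count as half
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical data): three positive and three negative cases.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.3]

print(roc_auc(labels, scores))  # 8 of 9 pairs ranked correctly, about 0.889
```

This pairwise form is quadratic in the number of cases, which is acceptable for cohorts of this size; library implementations typically use a sort-based equivalent.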
• Using different platforms for radiomic extraction affects models’ performance.
• Variables’ relevance is inconsistent among platforms.
• MRI features are correlated to radiosurgery response in brain metastases from NSCLC.
• Higher number of radiomic features does not necessarily imply better performance.
Radiomics enables the mining of quantitative features from medical images. The influence of the radiomic feature extraction software on the final performance of models is still a poorly understood topic. This study aimed to investigate the ability of radiomic features extracted by two different radiomic platforms to predict clinical outcomes in patients treated with radiosurgery for brain metastases from non-small cell lung cancer. We developed models integrating pre-treatment magnetic resonance imaging (MRI)-derived radiomic features and clinical data.
Pre-radiotherapy gadolinium enhanced axial T1-weighted MRI scans were used. MRI images were re-sampled, intensity-shifted, and histogram-matched before radiomic extraction by means of two different platforms (PyRadiomics and SOPHiA Radiomics). We adopted LASSO Cox regression models for multivariable analyses by creating radiomic, clinical, and combined models using three survival clinical endpoints (local control, distant progression, and overall survival). The statistical analysis was repeated 50 times with different random seeds and the median concordance index was used as performance metric of the models.
We analysed 276 metastases from 148 patients. The use of the two platforms resulted in differences in both the quality and the number of extractable features. That led to mismatches in terms of end-to-end performance, statistical significance of radiomic scores, and clinical covariates found significant in combined models.
This study shed new light on how extracting radiomic features from the same images using two different platforms could yield several discrepancies. That may lead to acute consequences on drawing conclusions, comparing results across the literature, and translating radiomics into clinical practice.