Machine learning (ML) is a branch of artificial intelligence centered on algorithms that do not need explicit prior programming to function but automatically learn from available data, creating decision models to complete tasks. ML-based tools have numerous promising applications in several fields of medicine. Their use has grown following the increased availability of patient data due to technological advances such as digital health records and high-volume information extraction from medical images. Multiple ML algorithms have been proposed for applications in oncology. For instance, they have been employed for oncological risk assessment, automated segmentation, lesion detection, characterization, grading and staging, and prediction of prognosis and therapy response.
In the near future, ML could become an essential part of every step of oncological screening strategies and patient management, thus leading to precision medicine.
• ML is a branch of AI that has numerous future applications in oncology.
• ML helps oncological risk assessment and screening.
• ML empowers lesion detection and characterization, grading and staging.
• ML may predict prognosis and therapy response.
Key Points
• Interest in radiomics and machine learning is steadily increasing and is reflected both in research output and number of commercially available solutions.
• Currently available commercial products using machine learning are often supported by limited evidence of clinical usefulness and studies are often of low methodological quality.
• Ethical and regulatory issues remain open and hinder implementation of machine learning software packages in daily clinical practice.
Purpose
To systematically review and evaluate the methodological quality of studies using radiomics for diagnostic and predictive purposes in patients with intracranial meningioma. To perform a meta-analysis of machine learning studies for the prediction of intracranial meningioma grading from pre-operative brain MRI.
Methods
Articles published from the year 2000 on radiomics and machine learning applications in brain imaging of meningioma patients were included. Their methodological quality was assessed by three readers with the radiomics quality score, using the intra-class correlation coefficient (ICC) to evaluate inter-reader reproducibility. A meta-analysis of machine learning studies for the preoperative evaluation of meningioma grading was performed and their risk of bias was assessed with the Quality Assessment of Diagnostic Accuracy Studies tool.
Results
In all, 23 studies were included in the systematic review, 8 of which were suitable for the meta-analysis. Total (possible range, −8 to 36) and percentage radiomics quality scores were respectively 6.96 ± 4.86 and 19 ± 13% with a moderate to good inter-reader reproducibility (ICC = 0.75, 95% confidence intervals, 95%CI = 0.54–0.88). The meta-analysis showed an overall AUC of 0.88 (95%CI = 0.84–0.93) with a standard error of 0.02.
Conclusions
Machine learning and radiomics have been proposed for multiple applications in the imaging of meningiomas, with promising results for preoperative lesion grading. However, future studies with adequate standardization and higher methodological quality are required prior to their introduction in clinical practice.
Objective
To systematically review and evaluate the methodological quality of studies using magnetic resonance imaging (MRI) and computed tomography (CT) radiomics for cardiac applications.
Methods
Multiple medical literature archives (PubMed, Web of Science, and EMBASE) were systematically searched to retrieve original studies focused on cardiac MRI and CT radiomics applications. Two researchers in consensus assessed each investigation using the radiomics quality score (RQS). Subgroup analyses were performed to assess whether the total RQS varied according to study aim, journal quartile, imaging modality, and first author category.
Results
From a total of 1961 items, 53 articles were finally included in the analysis. Overall, the studies reached a median total RQS of 7 (IQR, 4–12), corresponding to a percentage score of 19.4% (IQR, 11.1–33.3%). Item scores were particularly low due to lack of prospective design, cost-effectiveness analysis, and open science. Median RQS percentage score was significantly higher in papers where the first author was a medical doctor and in those published in first-quartile journals.
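The percentage scores reported here are consistent with dividing the total RQS by the maximum of 36 points (for example, 7/36 ≈ 19.4%). A minimal Python sketch of that conversion follows. The function name and the clipping of negative totals to 0% are assumptions made for illustration, not details stated in the abstract.

```python
def rqs_percentage(total_score, max_score=36):
    """Convert a total RQS into a percentage of the maximum score.

    Negative totals are clipped to 0% (an assumption made for this sketch).
    """
    return max(0.0, total_score / max_score * 100.0)


# Median and interquartile range bounds reported for the cardiac studies.
for total in (7, 4, 12):
    print(f"RQS {total:>2} -> {rqs_percentage(total):.1f}%")
```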
Conclusions
The overall methodological quality of radiomics studies in cardiac MRI and CT is still lacking. A higher degree of standardization of the radiomics workflow and higher publication standards for studies are required to allow its translation into clinical practice.
Key Points
• RQS has been recently proposed for the overall assessment of the methodological quality of radiomics-based studies.
• The 53 included studies on cardiac MRI and CT radiomics applications reached a median total RQS of 7 (IQR, 4–12), corresponding to a percentage of 19.4% (IQR, 11.1–33.3%).
• A more standardized methodology in the radiomics workflow is needed, especially in terms of study design, validation, and open science, in order to translate the results to clinical applications.
Objectives
The aim of this study was to systematically review the literature and perform a meta-analysis of machine learning (ML) diagnostic accuracy studies focused on clinically significant prostate cancer (csPCa) identification on MRI.
Methods
Multiple medical databases were systematically searched for studies on ML applications in csPCa identification up to July 31, 2019. Two reviewers screened all papers independently for eligibility. The area under the receiver operating characteristic curves (AUC) was pooled to quantify predictive accuracy. A random-effects model estimated overall effect size while statistical heterogeneity was assessed with the I² value. A funnel plot was used to investigate publication bias. Subgroup analyses were performed based on reference standard (biopsy or radical prostatectomy) and ML type (deep and non-deep).
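As an illustration of the pooling step described above, the following Python sketch combines per-study AUCs with a random-effects model and reports the I² heterogeneity statistic. It assumes the DerSimonian-Laird estimator of between-study variance, which the abstract does not specify. The function name, the input AUCs, and their standard errors are hypothetical and are not taken from the included studies.

```python
import math


def pool_random_effects(estimates, std_errors):
    """Pool study estimates with a DerSimonian-Laird random-effects model."""
    # Inverse-variance (fixed-effect) weights and pooled estimate.
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w

    # Cochran's Q and its degrees of freedom.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1

    # Between-study variance (tau^2) and I^2 as a percentage.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    se_pooled = math.sqrt(1.0 / sum(re_weights))
    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se_pooled,
        "ci_high": pooled + 1.96 * se_pooled,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical per-study AUCs and standard errors, for illustration only.
result = pool_random_effects([0.78, 0.85, 0.90, 0.92], [0.04, 0.03, 0.02, 0.03])
print(result)
```

AUCs are sometimes transformed (for example, to the logit scale) before pooling; the sketch pools them on the raw scale for simplicity.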
Results
After the final revision, 12 studies were included in the analysis. Statistical heterogeneity was high both in overall and in subgroup analyses. The overall pooled AUC for ML in csPCa identification was 0.86 (95% confidence interval, 95%CI = 0.81–0.91). The biopsy subgroup (n = 9) had a pooled AUC of 0.85 (95%CI = 0.79–0.91), while the radical prostatectomy subgroup (n = 3) had a pooled AUC of 0.88 (95%CI = 0.76–0.99). Deep learning ML (n = 4) had an AUC of 0.78 (95%CI = 0.69–0.86), while the remaining 8 studies had an AUC of 0.90 (95%CI = 0.85–0.94).
Conclusions
ML pipelines using prostate MRI to identify csPCa showed good accuracy and should be further investigated, possibly with better standardisation in design and reporting of results.
Key Points
• Overall pooled AUC was 0.86 with 0.81–0.91 95% confidence intervals.
• In the reference standard subgroup analysis, algorithm accuracy was similar with pooled AUCs of 0.85 (0.79–0.91 95% confidence intervals) and 0.88 (0.76–0.99 95% confidence intervals) for studies employing biopsies and radical prostatectomy, respectively.
• Deep learning pipelines performed worse (AUC = 0.78, 0.69–0.86 95% confidence intervals) than other approaches (AUC = 0.90, 0.85–0.94 95% confidence intervals).
Purpose
Pituitary macroadenoma consistency can influence the ease of lesion removal during surgery, especially when using a transsphenoidal approach. Unfortunately, it is not assessable on standard qualitative MRI. Radiomic texture analysis could help in extracting mineable quantitative tissue characteristics. We aimed to assess the accuracy of texture analysis combined with machine learning in the preoperative evaluation of pituitary macroadenoma consistency in patients undergoing endoscopic endonasal surgery.
Methods
Data of 89 patients (68 soft and 21 fibrous macroadenomas) who underwent MRI and transsphenoidal surgery at our institution were retrospectively reviewed. After manual segmentation, radiomic texture features were extracted from original and filtered MR images. Feature stability analysis and a multistep feature selection were performed. After oversampling to balance the classes, 80% of the data was used to tune the hyperparameters of an Extra Trees ensemble meta-algorithm via stratified 5-fold cross-validation, while the remaining 20% was held out for its final testing. The reference standard was based on surgical findings.
Results
A total of 1118 texture features were extracted, of which 741 were stable. After removal of low variance (n = 4) and highly intercorrelated (n = 625) parameters, recursive feature elimination identified a subset of 14 features. After hyperparameter tuning, the Extra Trees classifier obtained an accuracy of 93%, sensitivity of 100%, and specificity of 87%. The area under the receiver operating characteristic and precision-recall curves was 0.99.
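The two filtering steps that precede recursive feature elimination (removing low-variance and highly intercorrelated features) can be sketched in plain Python as follows. This is only an illustration: the function names, the thresholds of 0.01 and 0.8, the greedy keep-first strategy, and the toy feature values are assumptions, since the abstract does not report how the filters were configured.

```python
import math


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def filter_features(features, var_threshold=0.01, corr_threshold=0.8):
    """Drop low-variance features, then features highly correlated with a kept one.

    `features` maps a feature name to its values across patients.
    Both thresholds are assumed values, chosen only for this sketch.
    """
    kept = []
    for name, values in features.items():
        if variance(values) <= var_threshold:
            continue  # low-variance feature
        if any(abs(pearson(values, features[k])) >= corr_threshold for k in kept):
            continue  # redundant with a feature that was already kept
        kept.append(name)
    return kept


# Toy feature values, assumed to be scaled to a comparable range.
toy = {
    "f_constant": [1.0, 1.0, 1.0, 1.0, 1.0],
    "f_a": [0.1, 0.9, 0.4, 0.7, 0.2],
    "f_a_scaled": [0.2, 1.8, 0.8, 1.4, 0.4],
    "f_b": [0.5, 0.1, 0.9, 0.3, 0.8],
}
selected = filter_features(toy)
print(selected)
```

In this toy run the constant feature is removed by the variance filter and the rescaled copy of `f_a` by the correlation filter, leaving two features for any subsequent selection step.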
Conclusion
Preoperative T2-weighted MRI texture analysis and machine learning could predict pituitary macroadenoma consistency.
Background
Prostate volume, as determined by magnetic resonance imaging (MRI), is a useful biomarker for distinguishing between benign and malignant pathology and can be used either alone or combined with other parameters such as prostate‐specific antigen.
Purpose
This study compared different deep learning methods for whole‐gland and zonal prostate segmentation.
Study Type
Retrospective.
Population
A total of 204 patients (train/test = 99/105) from the PROSTATEx public dataset.
Field strength/Sequence
A 3 T, TSE T2‐weighted.
Assessment
Four operators performed manual segmentation of the whole‐gland, central zone + anterior stroma + transition zone (TZ), and peripheral zone (PZ). U‐net, efficient neural network (ENet), and efficient residual factorized ConvNet (ERFNet) were trained and tuned on the training data through 5‐fold cross‐validation to segment the whole gland and TZ separately, while PZ automated masks were obtained by the subtraction of the first two.
Statistical Tests
Networks were evaluated on the test set using various accuracy metrics, including the Dice similarity coefficient (DSC). Model DSC was compared in both the training and test sets using the analysis of variance test (ANOVA) and post hoc tests. Parameter number, disk size, training, and inference times determined network computational complexity and were also used to assess the model performance differences. A P < 0.05 was selected to indicate the statistical significance.
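For reference, the Dice similarity coefficient between two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The Python sketch below computes it on flattened 0/1 masks and also illustrates deriving a peripheral zone mask by subtracting the TZ mask from the whole-gland mask. The function names and the toy masks are hypothetical and serve only as an example.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two flattened binary masks."""
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2.0 * overlap / (size_a + size_b)


def subtract_mask(whole_gland, transition_zone):
    """Voxels inside the whole-gland mask but outside the TZ mask."""
    return [int(bool(w) and not t) for w, t in zip(whole_gland, transition_zone)]


# Toy flattened masks (1 = voxel inside the structure).
manual_whole = [0, 1, 1, 1, 1, 0]
predicted_whole = [0, 0, 1, 1, 1, 1]
predicted_tz = [0, 0, 1, 1, 0, 0]

whole_dsc = dice(manual_whole, predicted_whole)
predicted_pz = subtract_mask(predicted_whole, predicted_tz)
print(whole_dsc, predicted_pz)
```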
Results
The best DSC (P < 0.05) in the test set was achieved by ENet: 91% ± 4% for the whole gland, 87% ± 5% for the TZ, and 71% ± 8% for the PZ. U‐net and ERFNet obtained, respectively, 88% ± 6% and 87% ± 6% for the whole gland, 86% ± 7% and 84% ± 7% for the TZ, and 70% ± 8% and 65% ± 8% for the PZ. Training and inference time were lowest for ENet.
Data Conclusion
Deep learning networks can accurately segment the prostate using T2‐weighted images.
Evidence Level
4
Technical Efficacy
Stage 2
We performed a meta-analysis to compare the diagnostic performance of conventional SPECT (C-SPECT) and cadmium-zinc-telluride (CZT)-SPECT systems in detecting angiographically proven coronary artery disease (CAD).
Studies published between January 2000 and February 2018 were identified by database search. We included studies assessing C-SPECT or CZT-SPECT as a diagnostic test to evaluate patients for the presence of CAD, defined as at least 50% diameter stenosis on invasive coronary angiography. A study was eligible regardless of whether patients were referred for suspected or known CAD.
We identified 40 eligible articles (25 C-SPECT and 15 CZT-SPECT studies) including 7334 patients (4997 in C-SPECT and 2337 in CZT-SPECT studies). The pooled sensitivity and specificity were 85% and 66% for C-SPECT and 89% and 69% for CZT-SPECT imaging studies. The area under the curve was slightly higher for CZT-SPECT (0.89) compared to C-SPECT (0.83); accordingly, the summary diagnostic OR was 17 for CZT-SPECT and 11 for C-SPECT. The accuracy of the two techniques differed slightly (chi-square 11.28, P < .05). At meta-regression analysis, neither sensitivity nor specificity was significantly associated with the demographic and clinical variables considered, for either C-SPECT or CZT-SPECT studies.
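The diagnostic odds ratio (DOR) relates to sensitivity and specificity through DOR = [sens / (1 − sens)] / [(1 − spec) / spec]. The short Python sketch below applies this formula to the pooled values quoted above. The function name is hypothetical, and a summary DOR estimated by a meta-analytic model need not equal the value obtained by plugging in pooled sensitivity and specificity, so small differences from the reported figures are to be expected.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """Diagnostic odds ratio from sensitivity and specificity (as fractions)."""
    positive_odds = sensitivity / (1.0 - sensitivity)
    false_positive_odds = (1.0 - specificity) / specificity
    return positive_odds / false_positive_odds


# Pooled sensitivity and specificity quoted in the text.
dor_c_spect = diagnostic_odds_ratio(0.85, 0.66)
dor_czt_spect = diagnostic_odds_ratio(0.89, 0.69)
print(f"C-SPECT: {dor_c_spect:.1f}, CZT-SPECT: {dor_czt_spect:.1f}")
```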
C-SPECT and CZT-SPECT have good diagnostic performance in detecting angiographically proven CAD, with a slightly higher accuracy for CZT-SPECT. This result supports the use of the novel gamma cameras in routine clinical practice, also considering the improvements in acquisition time and radiation exposure reduction.
Even though radiomics can hold great potential for supporting clinical decision-making, its current use is mostly limited to academic research, without applications in routine clinical practice. The workflow of radiomics is complex due to several methodological steps and nuances, which often leads to inadequate reporting and evaluation, and poor reproducibility. Available reporting guidelines and checklists for artificial intelligence and predictive modeling include relevant good practices, but they are not tailored to radiomic research. There is a clear need for a complete radiomics checklist for study planning, manuscript writing, and evaluation during the review process to facilitate the repeatability and reproducibility of studies. We here present a documentation standard for radiomic research that can guide authors and reviewers. Our motivation is to improve the quality and reliability and, in turn, the reproducibility of radiomic research. We name the checklist CLEAR (CheckList for EvaluAtion of Radiomics research), to convey the idea of being more transparent. With its 58 items, the CLEAR checklist should be considered a standardization tool providing the minimum requirements for presenting clinical radiomics research. In addition to a dynamic online version of the checklist, a public repository has also been set up to allow the radiomics community to comment on the checklist items and adapt the checklist for future versions. The CLEAR checklist was prepared and revised by an international group of experts using a modified Delphi method, and we hope it will serve well as a single and complete scientific documentation tool for authors and reviewers to improve the radiomics literature.
Key points
The workflow of radiomics is complex with several methodological steps and nuances, which often leads to inadequate reproducibility, reporting, and evaluation.
The CLEAR checklist proposes a single documentation standard for radiomics research that can guide authors, providing the minimum requirements for presenting clinical radiomics research.
The CLEAR checklist aims to include all necessary items to support reviewer evaluation of radiomics-related manuscripts.
Objectives
We aimed to assess the performance of radiomics and machine learning (ML) for classification of non-cystic benign and malignant breast lesions on ultrasound images, compare ML’s accuracy with that of a breast radiologist, and verify whether the radiologist’s performance is improved by using ML.
Methods
Our retrospective study included patients from two institutions. A total of 135 lesions from Institution 1 were used to train and test the ML model with cross-validation. Radiomic features were extracted from manually annotated images and underwent a multistep feature selection process. Non-reproducible, low-variance, and highly intercorrelated features were removed from the dataset. Then, 66 lesions from Institution 2 were used as an external test set for ML and to assess the performance of a radiologist without and with the aid of ML, using McNemar’s test.
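McNemar’s test compares two paired classifications using only the discordant cases. The sketch below implements the exact (binomial) two-sided version in Python as an illustration. The abstract does not state whether the exact or the chi-square form was used, and the function name and the discordant counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value from the two discordant counts.

    b: cases correct only under the first condition (e.g., without ML)
    c: cases correct only under the second condition (e.g., with ML)
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical discordant counts, for illustration only.
p_value = mcnemar_exact(1, 9)
print(f"p = {p_value:.4f}")
```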
Results
After feature selection, 10 of the 520 features extracted were employed to train a random forest algorithm. Its accuracy in the training set was 82% (standard deviation, SD, ± 6%), with an AUC of 0.90 (SD ± 0.06), while its accuracy on the test set was 82% (95% confidence interval (CI) = 70–90%) with an AUC of 0.82 (95% CI = 0.70–0.93). It was significantly better than the baseline reference (p = 0.0098), but not different from the radiologist (79.4%, p = 0.815). The radiologist’s performance improved when using ML (80.2%), but not significantly (p = 0.508).
Conclusions
A radiomic analysis combined with ML showed promising results to differentiate benign from malignant breast lesions on ultrasound images.
Key Points
• Machine learning showed good accuracy in discriminating benign from malignant breast lesions
• The machine learning classifier’s performance was comparable to that of a breast radiologist
• The radiologist’s accuracy improved with machine learning, but not significantly