Objectives
To assess Prostate Imaging Reporting and Data System (PI-RADS)–trained deep learning (DL) algorithm performance and to investigate the effect of data size and prior knowledge on the detection of clinically significant prostate cancer (csPCa) in biopsy-naïve men with a suspicion of PCa.
Methods
Multi-institution data included 2734 consecutive biopsy-naïve men with elevated PSA levels (≥ 3 ng/mL) who underwent multi-parametric MRI (mpMRI). mpMRI exams were prospectively reported using PI-RADS v2 by expert radiologists. A DL framework was designed and trained on center 1 data (n = 1952) to predict PI-RADS ≥ 4 (n = 1092) lesions from bi-parametric MRI (bpMRI). Experiments included varying the number of cases and the use of automatic zonal segmentation as a DL prior. Independent center 2 cases (n = 296) that included pathology outcome (systematic and MRI targeted biopsy) were used to compute performance for radiologists and DL. The performance of detecting PI-RADS 4–5 and Gleason > 6 lesions was assessed on 782 unseen cases (486 center 1, 296 center 2) using free-response ROC (FROC) and ROC analysis.
Results
The DL sensitivity for detecting PI-RADS ≥ 4 lesions was 87% (193/223, 95% CI: 82–91) at an average of 1 false positive (FP) per patient, and an AUC of 0.88 (95% CI: 0.84–0.91). The DL sensitivity for the detection of Gleason > 6 lesions was 85% (79/93, 95% CI: 77–83) @ 1 FP compared to 91% (85/93, 95% CI: 84–96) @ 0.3 FP for a consensus panel of expert radiologists. Data size and prior zonal knowledge significantly affected performance (4%, p < 0.05).
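As a rough check on proportions such as 193/223, the following minimal sketch computes a sensitivity and a 95% Wilson score interval from hit counts. The abstract does not state which interval method was used, so the Wilson method is an assumption, the function name is illustrative, and the output may differ slightly from the published intervals.

```python
import math

def wilson_interval(hits, total, z=1.96):
    """Return (proportion, lower, upper) using the Wilson score interval.

    Assumption: the study's own CI method is not stated; Wilson is a common choice.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = hits / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return p, center - half, center + half

# Counts taken from the Results: 193 of 223 PI-RADS >= 4 lesions detected.
sens, lower, upper = wilson_interval(193, 223)
print(f"sensitivity = {sens:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```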
Conclusion
PI-RADS-trained DL can accurately detect and localize Gleason > 6 lesions. DL could reach expert performance using substantially more than 2000 training cases, and DL zonal segmentation.
Key Points
• AI for prostate MRI analysis depends strongly on data size and prior zonal knowledge.
• AI needs substantially more than 2000 training cases to achieve expert performance.
Objectives
Multiparametric MRI has high diagnostic accuracy for detecting prostate cancer, but non-invasive prediction of tumor grade remains challenging. Characterizing tumor perfusion by exploiting the fractal nature of vascular anatomy might elucidate the aggressive potential of a tumor. This study introduces the concept of fractal analysis for characterizing prostate cancer perfusion and reports on its usefulness for non-invasive prediction of tumor grade.
Methods
We retrospectively analyzed the openly available PROSTATEx dataset with 112 cancer foci in 99 patients. In all patients, histological grading groups specified by the International Society of Urological Pathology (ISUP) were obtained from in-bore MRI-guided biopsy. Fractal analysis of dynamic contrast-enhanced perfusion MRI sequences was performed, yielding fractal dimension (FD) as quantitative descriptor. Two-class and multiclass diagnostic accuracy was analyzed using area under the curve (AUC) receiver operating characteristic analysis, and optimal FD cutoffs were established. Additionally, we compared fractal analysis to conventional apparent diffusion coefficient (ADC) measurements.
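To illustrate what a fractal dimension measures, here is a minimal box-counting sketch on a binary 2D pixel set. The abstract does not describe the FD estimator applied to the perfusion images, so box counting on a binary mask is an assumption chosen for clarity, and the function name and test shapes are hypothetical.

```python
import math

def box_counting_dimension(pixels, box_sizes):
    """Estimate the fractal dimension of a set of (row, col) pixels by box counting.

    Assumption: a generic estimator, not necessarily the one used in the study.
    """
    if not pixels or len(box_sizes) < 2:
        raise ValueError("need a non-empty pixel set and at least two box sizes")
    xs, ys = [], []
    for size in box_sizes:
        # Count the boxes of side `size` that contain at least one pixel.
        occupied = {(r // size, c // size) for r, c in pixels}
        xs.append(math.log(1.0 / size))
        ys.append(math.log(len(occupied)))
    # Least-squares slope of log(count) against log(1 / size).
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

sizes = [1, 2, 4, 8, 16, 32]
filled_square = {(r, c) for r in range(64) for c in range(64)}
straight_line = {(0, c) for c in range(64)}
print(box_counting_dimension(filled_square, sizes))  # close to 2 for a filled plane
print(box_counting_dimension(straight_line, sizes))  # close to 1 for a line
```

Structures that are more irregular than a line but do not fill the plane would fall between these two reference values.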
Results
Fractal analysis of perfusion allowed accurate differentiation of non-significant (group 1) and clinically significant (groups 2–5) cancer with a sensitivity of 91% (confidence interval [CI]: 83–96%) and a specificity of 86% (CI: 73–94%). FD correlated linearly with ISUP groups (r² = 0.874, p < 0.001). Significant groupwise differences were obtained between low, intermediate, and high ISUP group 1–4 (p ≤ 0.001) but not group 5 tumors. Fractal analysis of perfusion was significantly more reliable than ADC in predicting non-significant and clinically significant cancer (AUC_FD = 0.97 versus AUC_ADC = 0.77, p < 0.001).
Conclusion
Fractal analysis of perfusion MRI accurately predicts prostate cancer grading in low-, intermediate-, and high-, but not highest-grade, tumors.
Key Points
• In 112 prostate carcinomas, fractal analysis of MR perfusion imaging accurately differentiated low-, intermediate-, and high-grade cancer (ISUP grade groups 1–4).
• Fractal analysis detected clinically significant prostate cancer with a sensitivity of 91% (83–96%) and a specificity of 86% (73–94%).
• Fractal dimension of perfusion at the tumor margin may provide an imaging biomarker to predict prostate cancer grading.
• Multi-center, multi-vendor, multi-protocol prostate MRI dataset was made available for evaluation of segmentation algorithms.
• Evaluated 11 substantially different segmentation algorithms with respect to algorithm performance on multi-center data.
• Algorithms were evaluated relative to human observers.
• Challenge results show that segmentation of prostate MRI images is not a solved issue.
Prostate MRI image segmentation has been an area of intense research due to the increased use of MRI as a modality for the clinical workup of prostate cancer. Segmentation is useful for various tasks, e.g. to accurately localize prostate boundaries for radiotherapy or to initialize multi-modal registration algorithms. In the past, it has been difficult for research groups to evaluate prostate segmentation algorithms on multi-center, multi-vendor and multi-protocol data. Especially because we are dealing with MR images, image appearance, resolution and the presence of artifacts are affected by differences in scanners and/or protocols, which in turn can have a large influence on algorithm accuracy. The Prostate MR Image Segmentation (PROMISE12) challenge was set up to allow a fair and meaningful comparison of segmentation methods on the basis of performance and robustness. In this work we discuss the initial results of the online PROMISE12 challenge, and the results obtained in the live challenge workshop hosted by the MICCAI2012 conference. In the challenge, 100 prostate MR cases from 4 different centers were included, with differences in scanner manufacturer, field strength and protocol. A total of 11 teams from academic research groups and industry participated. Algorithms showed a wide variety in methods and implementation, including active appearance models, atlas registration and level sets. Evaluation was performed using boundary and volume based metrics which were combined into a single score relating the metrics to human expert performance. The winners of the challenge were the algorithms by teams Imorphics and ScrAutoProstate, with scores of 85.72 and 84.29 overall. Both algorithms were significantly better than all other algorithms in the challenge (p < 0.05) and had an efficient implementation with a run time of 8 min and 3 s per case, respectively.
Overall, active appearance model based approaches seemed to outperform other approaches like multi-atlas registration, both on accuracy and computation time. Although average algorithm performance was good to excellent and the Imorphics algorithm outperformed the second observer on average, we showed that algorithm combination might lead to further improvement, indicating that optimal performance for prostate segmentation is not yet obtained. All results are available online at http://promise12.grand-challenge.org/.
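The abstract refers to volume based metrics without naming them. As an assumed example of such a metric, the minimal sketch below computes the Dice overlap between an algorithm mask and a reference mask, each represented as a set of voxel coordinates. The function name and the toy cubes are hypothetical, and the challenge's combined score is not reproduced here.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two sets of (x, y, z) voxel coordinates.

    Returns 1.0 when both masks are empty (assumed convention).
    """
    if not mask_a and not mask_b:
        return 1.0
    overlap = len(mask_a & mask_b)
    return 2.0 * overlap / (len(mask_a) + len(mask_b))

# Toy example: a 4x4x4 reference cube and the same cube shifted by one voxel in x.
reference = {(x, y, z) for x in range(4) for y in range(4) for z in range(4)}
algorithm = {(x + 1, y, z) for x, y, z in reference}
print(dice_coefficient(reference, algorithm))  # 48 shared voxels of 64 + 64 -> 0.75
```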
Artificial intelligence developments are essential to the successful deployment of community-wide, MRI-driven prostate cancer diagnosis. AI systems should ensure that the main benefits of biopsy avoidance are delivered while maintaining consistent high specificities, at a range of disease prevalences. Since all current artificial intelligence / computer-aided detection systems for prostate cancer detection are experimental, multiple developmental efforts are still needed to bring the vision to fruition. Initial work needs to focus on developing systems as diagnostic supporting aids so their results can be integrated into the radiologists’ workflow including gland and target outlining tasks for fusion biopsies. Developing AI systems as clinical decision-making tools will require greater efforts. The latter encompass larger multicentric, multivendor datasets where the different needs of patients stratified by diagnostic settings, disease prevalence, patient preference, and clinical setting are considered. AI-based, robust, standard operating procedures will increase the confidence of patients and payers, thus enabling the wider adoption of the MRI-directed approach for prostate cancer diagnosis.
Key Points
• AI systems need to ensure that the benefits of biopsy avoidance are delivered with consistent high specificities, at a range of disease prevalence.
• Initial work has focused on developing systems as diagnostic supporting aids for outlining tasks, so they can be integrated into the radiologists’ workflow to support MRI-directed biopsies.
• Decision support tools require a larger body of work including multicentric, multivendor studies where the clinical needs, disease prevalence, patient preferences, and clinical setting are additionally defined.
• Image quality is important in the MRI-pathway for the diagnosis of prostate cancer.
• Automated image quality assessment can aid in safeguarding the acquisition quality.
• Radiomics can be the basis for an automated MRI image quality method.
The guidelines for prostate cancer recommend the use of MRI in the prostate cancer pathway. Due to the variability in prostate MR image quality, the reliability of this technique in the detection of prostate cancer is highly variable in clinical practice. This leads to the need for an objective and automated assessment of image quality to ensure an adequate acquisition and thereby to improve the reliability of MRI. The aim of this study is to investigate the feasibility of Blind/referenceless image spatial quality evaluator (Brisque) and radiomics in automated image quality assessment of T2-weighted (T2W) images.
Anonymized axial T2W images from 140 patients were scored for quality using a five-point Likert scale (low, suboptimal, acceptable, good, very good quality) in consensus by two readers. Images were dichotomized into clinically acceptable (very good, good and acceptable quality images) and clinically unacceptable (low and suboptimal quality images) in order to train and verify the model. Radiomics and Brisque features were extracted from a central cuboid volume including the prostate. A reduced feature set was used to fit a Linear Discriminant Analysis (LDA) model to predict image quality. Five-fold cross-validation, repeated 200 times, was used to train the model and test performance by assessing the classification accuracy, the discrimination accuracy as the area under the receiver operating characteristic curve (ROC-AUC), and by generating confusion matrices.
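The repeated cross-validation scheme described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the data are synthetic with a single hypothetical feature, the nearest-class-mean classifier is a stand-in and not the LDA model, and the example runs 20 repeats instead of 200 to stay fast.

```python
import random
from statistics import mean

def repeated_kfold(n_samples, n_splits=5, n_repeats=200, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # new random partition for every repeat
        for fold in range(n_splits):
            test = order[fold::n_splits]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test

def fit_class_means(xs, ys):
    """Stand-in model (not LDA): store the mean feature value of each class."""
    return {label: mean(x for x, y in zip(xs, ys) if y == label) for label in set(ys)}

def predict(class_means, x):
    """Assign x to the class whose mean is closest."""
    return min(class_means, key=lambda label: abs(x - class_means[label]))

# Synthetic data mirroring the class sizes only: 34 unacceptable (0), 106 acceptable (1).
rng = random.Random(1)
features = [rng.gauss(0.0, 1.0) for _ in range(34)] + [rng.gauss(2.0, 1.0) for _ in range(106)]
labels = [0] * 34 + [1] * 106

accuracies = []
for train, test in repeated_kfold(len(labels), n_splits=5, n_repeats=20):
    model = fit_class_means([features[i] for i in train], [labels[i] for i in train])
    correct = sum(predict(model, features[i]) == labels[i] for i in test)
    accuracies.append(correct / len(test))
print(f"mean accuracy over {len(accuracies)} folds: {mean(accuracies):.3f}")
```

Averaging over many repeats reduces the dependence of the accuracy estimate on any single random partition, which matters for a data set of this size.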
Thirty-four images were classified as clinically unacceptable and 106 were classified as clinically acceptable. The accuracy of the independent test set (mean ± standard deviation) was 85.4 ± 5.5%. The ROC-AUC was 0.856 (0.851 – 0.861) (mean; 95% confidence interval).
Radiomics AI can automatically detect a significant portion of T2W images of suboptimal image quality. This can help improve image quality at the time of acquisition, thus reducing repeat scans and improving diagnostic accuracy.
This review presents the current state of the art regarding multiparametric magnetic resonance (MR) imaging of prostate cancer. Technical requirements and clinical indications for the use of multiparametric MR imaging in detection, localization, characterization, staging, biopsy guidance, and active surveillance of prostate cancer are discussed. Although reported accuracies of the separate and combined multiparametric MR imaging techniques vary for diverse clinical prostate cancer indications, multiparametric MR imaging of the prostate has shown promising results and may be of additional value in prostate cancer localization and local staging. Consensus on which technical approaches (field strengths, sequences, use of an endorectal coil) and combination of multiparametric MR imaging techniques should be used for specific clinical indications remains a challenge. Because guidelines are currently lacking, suggestions for a general minimal protocol for multiparametric MR imaging of the prostate based on the literature and the authors' experience are presented. Computer programs that allow evaluation of the various components of a multiparametric MR imaging examination in one view should be developed. In this way, an integrated interpretation of anatomic and functional MR imaging techniques in a multiparametric MR imaging examination is possible. Education and experience of specialist radiologists are essential for correct interpretation of multiparametric prostate MR imaging findings. Supportive techniques, such as computer-aided diagnosis, are needed to obtain a fast, cost-effective, easy, and more reproducible prostate cancer diagnosis out of more and more complex multiparametric MR imaging data.
Objectives
To create a radiomics approach based on multiparametric magnetic resonance imaging (mpMRI) features extracted from an auto-fixed volume of interest (VOI) that quantifies the phenotype of clinically significant (CS) peripheral zone (PZ) prostate cancer (PCa).
Methods
This study included 206 patients with 262 prospectively called mpMRI prostate imaging reporting and data system 3–5 PZ lesions. Gleason scores > 6 were defined as CS PCa. Features were extracted with an auto-fixed 12-mm spherical VOI placed around a pin point in each lesion. The value of dynamic contrast-enhanced imaging (DCE), multivariate feature selection and extreme gradient boosting (XGB) vs. univariate feature selection and random forest (RF), expert-based feature pre-selection, and the addition of image filters was investigated using the training (171 lesions) and test (91 lesions) datasets.
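A fixed spherical VOI around a pin point can be built by selecting every voxel whose center lies within a given distance of that point. The minimal sketch below assumes that 12 mm is the sphere's diameter (the abstract does not say whether it is a diameter or a radius); the voxel spacing, grid shape, pin-point position, and function name are all hypothetical.

```python
def spherical_voi(center_mm, diameter_mm, spacing_mm, shape):
    """Return voxel indices (i, j, k) whose centers lie inside a sphere.

    Assumption: voxel (i, j, k) is centered at (i, j, k) * spacing_mm.
    """
    radius_sq = (diameter_mm / 2.0) ** 2
    voxels = []
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                pos = (i * spacing_mm[0], j * spacing_mm[1], k * spacing_mm[2])
                dist_sq = sum((p - c) ** 2 for p, c in zip(pos, center_mm))
                if dist_sq <= radius_sq:
                    voxels.append((i, j, k))
    return voxels

# Hypothetical geometry: 0.5 x 0.5 mm in-plane resolution, 3 mm slices.
voi = spherical_voi(center_mm=(16.0, 16.0, 15.0), diameter_mm=12.0,
                    spacing_mm=(0.5, 0.5, 3.0), shape=(64, 64, 10))
print(f"{len(voi)} voxels inside the VOI")
```

Working in millimetres rather than voxel indices keeps the VOI the same physical size when slice thickness differs from in-plane resolution.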
Results
The best model with features from T2-weighted (T2-w) + diffusion-weighted imaging (DWI) + DCE had an area under the curve (AUC) of 0.870 (95% CI 0.754–0.980). Removal of DCE features decreased AUC to 0.816 (95% CI 0.710–0.920), although not significantly (p = 0.119). Multivariate and XGB outperformed univariate and RF (p = 0.028). Expert-based feature pre-selection and image filters had no significant contribution.
Conclusions
The phenotype of CS PZ PCa lesions can be quantified using a radiomics approach based on features extracted from T2-w + DWI using an auto-fixed VOI. Although DCE features improve diagnostic performance, this is not statistically significant. Multivariate feature selection and XGB should be preferred over univariate feature selection and RF. The developed model may be a valuable addition to traditional visual assessment in diagnosing CS PZ PCa.
Key Points
• T2-weighted and diffusion-weighted imaging features are essential components of a radiomics model for clinically significant prostate cancer; addition of dynamic contrast-enhanced imaging does not significantly improve diagnostic performance.
• Multivariate feature selection and extreme gradient boosting outperform univariate feature selection and random forest.
• The developed radiomics model that extracts multiparametric MRI features with an auto-fixed volume of interest may be a valuable addition to visual assessment in diagnosing clinically significant prostate cancer.
Early detection improves prognosis in pancreatic ductal adenocarcinoma (PDAC), but is challenging as lesions are often small and poorly defined on contrast-enhanced computed tomography scans (CE-CT). Deep learning can facilitate PDAC diagnosis; however, current models still fail to identify small (<2 cm) lesions. In this study, state-of-the-art deep learning models were used to develop an automatic framework for PDAC detection, focusing on small lesions. Additionally, the impact of integrating the surrounding anatomy was investigated. CE-CT scans from a cohort of 119 pathology-proven PDAC patients and a cohort of 123 patients without PDAC were used to train a network for automatic lesion detection and segmentation. Two additional networks were trained to investigate the impact of anatomy integration: (1) segmenting the pancreas and tumor, and (2) segmenting the pancreas, tumor, and multiple surrounding anatomical structures. An external, publicly available test set was used to compare the performance of the three networks. The best-performing network achieved an area under the receiver operating characteristic curve of 0.91 for the whole test set and 0.88 for tumors <2 cm, showing that state-of-the-art deep learning can detect small PDAC and benefits from anatomy information.
To determine the effect of computer-aided diagnosis (CAD) on less-experienced and experienced observer performance in differentiation of benign from malignant prostate lesions at 3-T multiparametric magnetic resonance (MR) imaging.
The institutional review board waived the need for informed consent. Retrospectively, 34 patients were included who had prostate cancer and had undergone multiparametric MR imaging, including T2-weighted, diffusion-weighted, and dynamic contrast material-enhanced MR imaging prior to radical prostatectomy. Six radiologists less experienced in prostate imaging and four radiologists experienced in prostate imaging were asked to characterize different regions suspicious for cancer as benign or malignant on multiparametric MR images first without and subsequently with CAD software. The effect of CAD was analyzed by using a multiple-reader, multicase, receiver operating characteristic analysis and a linear mixed-model analysis.
In 34 patients, 206 preannotated regions, including 67 malignant and 64 benign regions in the peripheral zone (PZ) and 19 malignant and 56 benign regions in the transition zone (TZ), were evaluated. Stand-alone CAD had an overall area under the receiver operating characteristic curve (AUC) of 0.90. For PZ and TZ lesions, the AUCs were 0.92 and 0.87, respectively. Without CAD, less-experienced observers had an overall AUC of 0.81, which significantly increased to 0.91 (P = .001) with CAD. For experienced observers, the AUC without CAD was 0.88, which increased to 0.91 (P = .17) with CAD. For PZ lesions, less-experienced observers increased their AUC from 0.86 to 0.95 (P < .001) with CAD. Experienced observers showed an increase from 0.91 to 0.93 (P = .13). For TZ lesions, less-experienced observers significantly increased their performance from 0.72 to 0.79 (P = .01) with CAD and experienced observers increased their performance from 0.81 to 0.82 (P = .42).
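For readers unfamiliar with the AUC values above, the minimal sketch below computes an empirical AUC as the probability that a malignant region receives a higher suspicion score than a benign one. The scores are hypothetical, and this single-reader calculation does not reproduce the multiple-reader, multicase analysis used in the study.

```python
def roc_auc(scores_malignant, scores_benign):
    """Empirical AUC: fraction of malignant/benign pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    if not scores_malignant or not scores_benign:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for m in scores_malignant:
        for b in scores_benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(scores_malignant) * len(scores_benign))

# Hypothetical suspicion scores for four malignant and four benign regions.
malignant = [0.9, 0.8, 0.7, 0.4]
benign = [0.6, 0.3, 0.2, 0.4]
print(roc_auc(malignant, benign))  # 14.5 of 16 pairs ranked correctly -> 0.90625
```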
Addition of CAD significantly improved the performance of less-experienced observers in distinguishing benign from malignant lesions; when less-experienced observers used CAD, their performance was similar to that of experienced observers. The stand-alone performance of CAD was similar to that of experienced observers.
• A sample size calculation for segmentation accuracy studies is derived.
• Parameters include accuracy difference, algorithm disagreement and a design factor.
• A formula is derived to account for errors in the study reference standard.
• A case study illustrates the application of the theory to a segmentation study design.
Segmentation algorithms are typically evaluated by comparison to an accepted reference standard. The cost of generating accurate reference standards for medical image segmentation can be substantial. Since the study cost and the likelihood of detecting a clinically meaningful difference in accuracy both depend on the size and on the quality of the study reference standard, balancing these trade-offs supports the efficient use of research resources.
In this work, we derive a statistical power calculation that enables researchers to estimate the appropriate sample size to detect clinically meaningful differences in segmentation accuracy (i.e. the proportion of voxels matching the reference standard) between two algorithms. Furthermore, we derive a formula to relate reference standard errors to their effect on the sample sizes of studies using lower-quality (but potentially more affordable and practically available) reference standards.
The accuracy of the derived sample size formula was estimated through Monte Carlo simulation, demonstrating, with 95% confidence, a predicted statistical power within 4% of simulated values across a range of model parameters. This corresponds to sample size errors of less than 4 subjects and errors in the detectable accuracy difference less than 0.6%. The applicability of the formula to real-world data was assessed using bootstrap resampling simulations for pairs of algorithms from the PROMISE12 prostate MR segmentation challenge data set. The model predicted the simulated power for the majority of algorithm pairs within 4% for simulated experiments using a high-quality reference standard and within 6% for simulated experiments using a low-quality reference standard. A case study, also based on the PROMISE12 data, illustrates using the formulae to evaluate whether to use a lower-quality reference standard in a prostate segmentation study.
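The idea of checking a power prediction by Monte Carlo simulation can be sketched generically. The example below is not the formula derived in the paper: it assumes normally distributed per-subject accuracy differences between two algorithms and a two-sided z-test on their mean, and all parameter values and names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def simulated_power(n_subjects, true_diff, sd_diff, alpha=0.05, n_trials=2000, seed=0):
    """Estimate power as the fraction of simulated studies that reject H0.

    Assumption: per-subject accuracy differences are i.i.d. normal, and the
    large-sample z-test on their mean is an adequate approximation.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_trials):
        diffs = [rng.gauss(true_diff, sd_diff) for _ in range(n_subjects)]
        z = mean(diffs) / (stdev(diffs) / math.sqrt(n_subjects))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials

# Hypothetical design: 50 subjects, 1% true accuracy difference, 2% standard deviation.
print(simulated_power(50, true_diff=0.01, sd_diff=0.02))
# With no true difference the rejection rate should stay near alpha.
print(simulated_power(50, true_diff=0.0, sd_diff=0.02))
```

Comparing such simulated values against an analytical prediction across a grid of parameters is one way to quantify how far a closed-form power formula deviates from simulation.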