Background
Accurate prostate zonal segmentation on magnetic resonance images (MRI) is a critical prerequisite for automated prostate cancer detection. We aimed to assess the variability of manual prostate zonal segmentation by radiologists on T2-weighted (T2W) images, and to study factors that may influence it.
Methods
Seven radiologists of varying levels of experience segmented the whole prostate gland (WG) and the transition zone (TZ) on 40 axial T2W prostate MRI images (3D T2W images for all patients, and both 3D and 2D images for a subgroup of 12 patients). Segmentation variabilities were evaluated based on: anatomical and morphological variation of the prostate (volume, retro-urethral lobe, intensity contrast between zones, presence of a PI-RADS ≥ 3 lesion), variation in image acquisition (3D vs 2D T2W images), and reader’s experience. Several metrics including Dice Score (DSC) and Hausdorff Distance were used to evaluate differences, with both a pairwise and a consensus (STAPLE reference) comparison.
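The two overlap and distance metrics named above can be sketched as follows. This is only an illustrative sketch, not the study's implementation: masks are represented as sets of voxel coordinates, the toy readers are invented, and the empty-mask convention is an assumption.

```python
from itertools import combinations
from math import dist


def dice(a, b):
    """Dice similarity coefficient between two sets of voxel coordinates."""
    if not a and not b:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(p, q):
        # Largest distance from a point of p to its nearest point in q.
        return max(min(dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))


def mean_pairwise_dice(masks):
    """Average Dice over all unordered pairs of readers (pairwise comparison)."""
    pairs = list(combinations(masks, 2))
    return sum(dice(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy segmentations from three readers on a 2D grid.
reader_1 = {(0, 0), (0, 1), (1, 0), (1, 1)}
reader_2 = {(0, 1), (1, 1), (0, 2), (1, 2)}
reader_3 = {(0, 0), (0, 1), (1, 0), (1, 1), (0, 2)}

print(dice(reader_1, reader_2))
print(hausdorff(reader_1, reader_2))
print(mean_pairwise_dice([reader_1, reader_2, reader_3]))
```

The consensus comparison would replace the second argument of `dice` with a reference mask (for example one produced by STAPLE, which is not implemented here).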
Results
DSC was 0.92 (± 0.02) and 0.94 (± 0.03) for WG, and 0.88 (± 0.05) and 0.91 (± 0.05) for TZ, with pairwise comparison and consensus reference, respectively. Variability was significantly (p < 0.05) lower for the mid-gland (DSC 0.95 (± 0.02)), higher for the apex (0.90 (± 0.06)) and the base (0.87 (± 0.06)), and higher for smaller prostates (p < 0.001) and when contrast between zones was low (p < 0.05). Impact of the other studied factors was non-significant.
Conclusions
Variability is higher in the extreme parts of the gland, is influenced by changes in prostate morphology (volume, zone intensity ratio), and is relatively unaffected by the radiologist’s level of expertise.
Objectives
Accurate zonal segmentation of prostate boundaries on MRI is a critical prerequisite for automated prostate cancer detection based on PI-RADS. Many articles have been published describing deep learning methods offering great promise for fast and accurate segmentation of prostate zonal anatomy. The objective of this review was to provide a detailed analysis and comparison of applicability and efficiency of the published methods for automatic segmentation of prostate zonal anatomy by systematically reviewing the current literature.
Methods
A systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines was conducted until June 30, 2021, using the PubMed, ScienceDirect, Web of Science and Embase databases. Risk of bias and applicability were assessed based on Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) criteria adjusted with the Checklist for Artificial Intelligence in Medical Imaging (CLAIM).
Results
A total of 458 articles were identified, and 33 were included and reviewed. Only 2 articles had a low risk of bias for all four QUADAS-2 domains. In the remaining articles, insufficient details about database constitution and segmentation protocol were sources of bias (inclusion criteria, MRI acquisition, ground truth). Eighteen different types of terminology for prostate zone segmentation were found, while 4 anatomic zones are described on MRI. Only 2 authors used a blinded reading, and 4 assessed inter-observer variability.
Conclusions
Our review identified numerous methodological flaws and underlined biases that precluded us from performing a quantitative analysis. This implies low robustness and low applicability in clinical practice of the evaluated methods. Currently, there is no consensus on quality criteria for database constitution and zonal segmentation methodology.
Key points
Several limitations exist with current methods of automatic prostate segmentation.
There is wide variability of databases, imaging tasks and assessment criteria.
No preferred methodology or terminology for prostate zonal segmentation was used.
The vast majority of papers share common methodological flaws discussed in this review.
• Association of 3D and axial 2D T2w sequences could combine the advantage of both sequences.
• 2D T2w sequence was associated with a better delineation of some anatomical structures.
• PI-QUAL score allows an analysis that is easily carried out in clinical routine and reproducible.
To compare 2D and 3D T2-weighted sequences in terms of image quality at 3.0 T MRI with readers of varied experience, using PI-QUAL-inspired criteria.
Ninety-one male patients with suspected prostate cancer (PCa) underwent diagnostic prostate MRI on a 3.0 T MR system using a 32-channel phased-array torso coil before prostate biopsy. The MRI protocol included 3D T2w images, axial 2D T2w images, axial diffusion-weighted images (DWI) with the corresponding apparent diffusion coefficient (ADC) maps, and axial dynamic contrast-enhanced images. 3D T2w and 2D T2w imaging were compared by 4 radiologists using a Likert scale for image quality (overall anatomy, delineation of capsule, seminal vesicles, ejaculatory ducts, sphincter muscle, artifacts), tumor delimitation and conspicuity.
No significant differences in overall quality were found between 3D and 2D T2w images. However, 2D T2w images were rated higher than 3D T2w images for delineation of the external capsule, sphincter muscle and ejaculatory ducts (p < 0.05).
The 3D T2w sequence cannot replace the 2D T2w sequence: despite good image quality, it remains more prone to artifacts. The quality of 2D T2w sequences was superior to that of 3D sequences for delineation of key structures such as the external capsule and sphincter muscle. The use of PI-QUAL criteria allows reproducible analysis of the quality of T2-weighted images.
• Robust probabilistic framework to estimate a consensus from several continuous segmentation maps.
• Replacement of the classical Gaussian model by heavy-tailed distributions allowing the raters' performances to be locally estimated.
• Definition of bias and spatial priors allowing the raters' bias to be properly assessed and the smoothness of the consensus map to be controlled.
• Introduction of the concept of mixture of consensuses allowing not only one but potentially several consensuses to be obtained.
• Estimation of the model performances on several human expert and neural network segmentations of prostate and lung nodule images and comparison with state-of-the-art algorithms.
The fusion of probability maps is required when trying to analyse a collection of image labels or probability maps produced by several segmentation algorithms or human raters. The challenge is to weight the combination of maps correctly, in order to reflect the agreement among raters, the presence of outliers and the spatial uncertainty in the consensus. In this paper, we address several shortcomings of prior work in continuous label fusion. We introduce a novel approach to jointly estimate a reliable consensus map and to assess the presence of outliers and the confidence in each rater. Our robust approach is based on heavy-tailed distributions allowing local estimates of raters' performances. In particular, we investigate the Laplace, the Student's t and the generalized double Pareto distributions, and compare them with the classical Gaussian likelihood used in prior works. We unify these distributions into a common tractable inference scheme based on variational calculus and scale mixture representations. Moreover, the introduction of bias and spatial priors leads to proper rater bias estimates and control over the smoothness of the consensus map. Finally, we propose an approach that clusters raters based on variational boosting, and thus may produce several alternative consensus maps. Our approach was successfully tested on MR prostate delineations and on lung nodule segmentations from the LIDC-IDRI dataset.
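As an illustration of what a scale mixture representation means, the standard identity for the Student's t case is given below. The notation (mean \mu, scale \sigma, degrees of freedom \nu, latent scale \tau, Gamma in shape-rate form) is chosen here for illustration and is not taken from the paper.

```latex
\mathrm{St}(x \mid \mu, \sigma^2, \nu)
  = \int_0^{\infty} \mathcal{N}\!\left(x \,\middle|\, \mu, \frac{\sigma^2}{\tau}\right)
    \mathrm{Gamma}\!\left(\tau \,\middle|\, \frac{\nu}{2}, \frac{\nu}{2}\right) d\tau
```

Conditioned on the latent scale \tau the likelihood is Gaussian, which is what makes a shared variational inference scheme tractable across heavy-tailed families.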
Objective
A reliable estimation of prostate volume (PV) is essential to prostate cancer management. The objective of our multi-rater study was to compare intra- and inter-rater variability of PV from manual planimetry and ellipsoid formulas.
Methods
Forty treatment-naive patients who underwent prostate MRI were selected from a local database. PV and corresponding PSA density (PSAd) were estimated on 3D T2-weighted MRI (3 T) by 7 independent radiologists using the traditional ellipsoid formula (TEF), the newer biproximate ellipsoid formula (BPEF), and the manual planimetry method (MPM) used as ground truth. Intra- and inter-rater variability was calculated using the mixed model–based intraclass correlation coefficient (ICC).
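The two families of volume estimates compared above can be sketched as follows. This is a minimal sketch under stated assumptions: the ellipsoid estimate uses the usual π/6 × length × width × height approximation, the planimetry estimate is taken as voxel count times voxel volume, and all function and parameter names are hypothetical. The landmark definitions that distinguish BPEF from TEF are not given in the abstract and are not reproduced.

```python
from math import pi


def ellipsoid_volume_cm3(length_cm, width_cm, height_cm):
    """Ellipsoid approximation of prostate volume from three diameters."""
    return pi / 6 * length_cm * width_cm * height_cm


def planimetry_volume_cm3(voxel_count, spacing_mm):
    """Volume of a manual contour: segmented voxel count times voxel volume."""
    sx, sy, sz = spacing_mm
    return voxel_count * sx * sy * sz / 1000.0  # 1 cm^3 = 1000 mm^3


# Hypothetical measurements, not data from the study.
print(ellipsoid_volume_cm3(5.0, 4.0, 4.0))
print(planimetry_volume_cm3(50_000, (1.0, 1.0, 1.0)))
```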
Results
Mean volumes were 67.00 (± 36.61), 66.07 (± 35.03), and 64.77 (± 38.27) cm³ with the TEF, BPEF, and MPM methods, respectively. Both TEF and BPEF overestimated PV relative to MPM, with the former presenting significant differences (+ 1.91 cm³, IQ = [− 0.33 cm³, 5.07 cm³], p = 0.03). Both intra- (ICC > 0.90) and inter-rater (ICC > 0.90) reproducibility were excellent. MPM had the highest inter-rater reproducibility (ICC = 0.999). Inter-rater PV variation led to discrepancies in classification according to the clinical criterion of PSAd > 0.15 ng/mL/cm³ for 2 patients (5%), 7 patients (17.5%), and 9 patients (22.5%) when using MPM, TEF, and BPEF, respectively.
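The classification discrepancy counted above can be illustrated with a short sketch: PSA density is PSA divided by prostate volume, and a patient is counted as discrepant when the volumes from different raters fall on both sides of the 0.15 threshold. The function names and the example values are hypothetical.

```python
def psa_density(psa_ng_ml, volume_cm3):
    """PSA density: serum PSA divided by prostate volume."""
    return psa_ng_ml / volume_cm3


def has_classification_discrepancy(psa_ng_ml, rater_volumes_cm3, threshold=0.15):
    """True when raters' volumes put the same patient on both sides of the threshold."""
    flags = {psa_density(psa_ng_ml, v) > threshold for v in rater_volumes_cm3}
    return len(flags) > 1


# Hypothetical patient with PSA 6.0 ng/mL and volumes from two raters.
print(has_classification_discrepancy(6.0, [38.0, 45.0]))
print(has_classification_discrepancy(6.0, [50.0, 55.0]))
```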
Conclusion
PV measurements using ellipsoid formulas and MPM are highly reproducible. MPM is a robust method for PV assessment and PSAd calculation, with the lowest variability. TEF showed a high degree of concordance with MPM but a slight overestimation of PV. Precise anatomic landmarks as defined with the BPEF led to a more accurate PV estimation, but also to a higher variability.
Key Points
• Manual planimetry used for prostate volume estimation is robust and reproducible, with the lowest variability between readers.
• Ellipsoid formulas are accurate and reproducible but with higher variability between readers.
• The traditional ellipsoid formula tends to overestimate prostate volume.
Purpose: An accurate zonal segmentation of the prostate is required for prostate cancer (PCa) management with MRI.
Approach: The aim of this work is to present UFNet, a deep learning-based method for automatic zonal segmentation of the prostate from T2-weighted (T2w) MRI. It takes into account the image anisotropy, includes both spatial and channelwise attention mechanisms and uses loss functions to enforce prostate partition. The method was applied to a private multicentric three-dimensional T2w MRI dataset and to the public two-dimensional T2w MRI dataset ProstateX. To assess the model performance, the structures segmented by the algorithm on the private dataset were compared with those obtained by seven radiologists of various experience levels.
Results: On the private dataset, we obtained a Dice score (DSC) of 93.90 ± 2.85 for the whole gland (WG), 91.00 ± 4.34 for the transition zone (TZ), and 79.08 ± 7.08 for the peripheral zone (PZ). Results were significantly better than those of the other compared networks (p-value < 0.05). On ProstateX, we obtained a DSC of 90.90 ± 2.94 for WG, 86.84 ± 4.33 for TZ, and 78.40 ± 7.31 for PZ. These results are similar to state-of-the-art results and, on the private dataset, are consistent with those obtained by radiologists. Zonal locations and sectorial positions of lesions annotated by radiologists were also preserved.
Conclusions: Deep learning-based methods can provide an accurate zonal segmentation of the prostate leading to a consistent zonal location and sectorial position of lesions, and therefore can be used as a support tool for PCa diagnosis.
The purpose of this study was to investigate inter-reader variability in manual prostate contour segmentation on magnetic resonance imaging (MRI) examinations and to determine the optimal number of readers required to establish a reliable reference standard.
Seven radiologists with various levels of experience independently performed manual segmentation of the prostate contour (whole gland [WG] and transition zone [TZ]) on 40 prostate MRI examinations obtained in 40 patients. Inter-reader variability in prostate contour delineations was estimated using standard metrics (Dice similarity coefficient [DSC], Hausdorff distance and volume-based metrics). The impact of the number of readers (from two to seven) on segmentation variability was assessed using pairwise metrics (consistency) and metrics with respect to a reference segmentation (conformity), obtained either with majority voting or with the simultaneous truth and performance level estimation (STAPLE) algorithm.
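The conformity analysis described above can be sketched for the majority-voting reference. This sketch covers only majority voting, not STAPLE; masks are sets of voxel identifiers, the four toy readers are invented, and the helper names are hypothetical.

```python
from itertools import combinations


def dice(a, b):
    """Dice similarity coefficient between two sets of voxel identifiers."""
    if not a and not b:
        return 1.0  # assumed convention for two empty masks
    return 2 * len(a & b) / (len(a) + len(b))


def majority_vote(masks):
    """Reference mask: voxels selected by strictly more than half of the readers."""
    counts = {}
    for mask in masks:
        for voxel in mask:
            counts[voxel] = counts.get(voxel, 0) + 1
    return {voxel for voxel, n in counts.items() if n > len(masks) / 2}


def conformity(masks):
    """Mean Dice between each reader and the majority-vote reference."""
    reference = majority_vote(masks)
    return sum(dice(mask, reference) for mask in masks) / len(masks)


def conformity_per_configuration(masks, k):
    """Conformity for every configuration (subset) of k readers."""
    return [conformity(list(subset)) for subset in combinations(masks, k)]


# Hypothetical toy segmentations from four readers.
readers = [{1, 2, 3, 4}, {2, 3, 4, 5}, {1, 2, 3, 4, 5}, {2, 3, 4}]

for k in (2, 3, 4):
    scores = conformity_per_configuration(readers, k)
    print(k, min(scores), max(scores))
```

The spread of the scores across configurations of the same size is one way to look at how variability changes with the number of readers.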
The average segmentation DSC for two readers in pairwise comparison was 0.919 for WG and 0.876 for TZ. Variability decreased with the number of readers: the interquartile ranges of the DSC were 0.076 (WG) / 0.021 (TZ) for configurations with two readers, 0.005 (WG) / 0.012 (TZ) for configurations with three readers, and 0.002 (WG) / 0.0037 (TZ) for configurations with six readers. The interquartile range decreased slightly faster between two and three readers than between three and six readers. When using consensus methods, variability often reached its minimum with three readers: with STAPLE, DSC = 0.96 (range: 0.945-0.971) for WG and DSC = 0.94 (range: 0.912-0.957) for TZ, and the interquartile range was minimal for configurations with three readers.
The number of readers affects the inter-reader variability, in terms of both inter-reader consistency and conformity to a reference. Either variability is minimal for three readers, or three readers represent a tipping point in how variability evolves, with both pairwise metrics and metrics with respect to a reference. Accordingly, three readers may represent an optimal number to determine references for artificial intelligence applications.