Purpose
Despite its increasing application, radiomics has not yet demonstrated solid reliability, owing to the difficulty of replicating analyses. The extraction of radiomic features from clinical MRI (T1w/T2w) presents even more challenges because of the absence of well‐defined units (e.g. HU). Some preprocessing steps are required before radiomic features are estimated; one of these is intensity normalization, which can be performed using different methods. The aim of this work was to evaluate the effect of three different normalization techniques, applied to T2w‐MRI images of the pelvic region, on the reproducibility of radiomic features.
Methods
T2w‐MRI acquired before (MRI1) and 12 months after radiotherapy (MRI2) from 14 patients treated for prostate cancer were considered. Four different conditions were analyzed: (a) the original MRI (No_Norm); (b) MRI normalized by the mean image value (Norm_Mean); (c) MRI normalized by the mean value of the urine in the bladder (Norm_ROI); (d) MRI normalized by the histogram‐matching method (Norm_HM). Ninety‐one radiomic features were extracted from three organs of interest (prostate, internal obturator muscles and penile bulb) at both time‐points, on each image discretized using a fixed bin‐width approach, and the difference between the two time‐points was calculated (Δfeature). To estimate the effect of the normalization methods on the reproducibility of radiomic features, the intraclass correlation coefficient (ICC) was calculated in three analyses: (a) considering the features extracted on MRI2 in the four conditions together and considering the influence of each method separately, with respect to No_Norm; (b) considering the features extracted on MRI2 in the four conditions with respect to the inter‐observer variability in region of interest (ROI) contouring, considering also the effect of the discretization approach; (c) considering Δfeature to evaluate whether some indices can recover some consistency when differences are calculated.
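The two scaling-based conditions (Norm_Mean, Norm_ROI) and the fixed bin-width discretization can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the image is a flat list of voxel intensities, that "normalized by the mean" means division by the mean (the abstract does not say whether division or subtraction was used), and that bins are numbered from 1 starting at the minimum intensity. The function names are hypothetical. Histogram matching (Norm_HM) is not shown.

```python
from statistics import fmean


def normalize_by_mean(image, reference=None):
    """Divide every voxel by the mean of `reference`.

    With reference=None the whole image is the reference (Norm_Mean-like).
    Passing the voxels of a reference region, e.g. urine in the bladder,
    gives a Norm_ROI-like scaling. Division is an assumption.
    """
    ref = image if reference is None else reference
    scale = fmean(ref)
    if scale == 0:
        raise ValueError("reference mean is zero; cannot normalize")
    return [v / scale for v in image]


def discretize_fixed_bin_width(values, bin_width, minimum=None):
    """Map intensities to integer bins of constant width, starting at bin 1."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    lo = min(values) if minimum is None else minimum
    return [int((v - lo) // bin_width) + 1 for v in values]


image = [100.0, 200.0, 300.0]
bladder = [50.0, 150.0]                                  # hypothetical ROI voxels
norm_mean = normalize_by_mean(image)                     # scaled by the image mean
norm_roi = normalize_by_mean(image, reference=bladder)   # scaled by the ROI mean
bins = discretize_fixed_bin_width([0.0, 4.9, 5.0, 12.0], bin_width=5)
```

Because a fixed bin width is expressed in intensity units, rescaling the image changes how many bins the same tissue spans, which is one way normalization can alter texture features.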
Results
Nearly 60% of the features showed poor reproducibility (ICC < 0.5) on MRI2, and the method that most affected feature reliability was Norm_ROI (average ICC of 0.45). The other two methods were similar, except for first‐order features, where Norm_HM outperformed Norm_Mean (average ICC = 0.33 and 0.76 for Norm_Mean and Norm_HM, respectively). In the inter‐observer setting, the number of reproducible features varied across the three structures, being higher in the prostate than in the penile bulb and in the obturators. The analysis of Δfeature highlighted that more than 60% of the features were not consistent with respect to the normalization method and confirmed the high reproducibility of the features between Norm_Mean and Norm_HM, whereas Norm_ROI was the least reproducible method.
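To make the ICC < 0.5 criterion concrete, the sketch below computes an intraclass correlation coefficient for one feature measured on several patients (rows) under several conditions (columns). The abstract does not state which ICC form was used; the two-way, single-measurement consistency form, often written ICC(3,1), is assumed here purely for illustration, and the function names are hypothetical.

```python
def icc_consistency(table):
    """ICC(3,1): two-way model, single measurement, consistency.

    `table` is a list of rows (subjects), each a list of k measurements
    (conditions). Returns (MSR - MSE) / (MSR + (k - 1) * MSE).
    """
    n = len(table)
    k = len(table[0])
    if n < 2 or k < 2 or any(len(row) != k for row in table):
        raise ValueError("need a rectangular table with n >= 2 and k >= 2")

    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(col) / n for col in zip(*table)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error
    if denominator == 0:
        raise ValueError("no variance in the table; ICC is undefined")
    return (ms_rows - ms_error) / denominator


def is_poorly_reproducible(icc, threshold=0.5):
    """Apply the ICC < 0.5 cut-off quoted in the Results."""
    return icc < threshold


# A constant offset between conditions leaves the ranking of patients intact.
consistent = icc_consistency([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
# Reversed ranking between conditions gives a strongly negative value.
reversed_order = icc_consistency([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
```

A consistency-type ICC ignores a constant offset between conditions; an absolute-agreement form would penalize it, so the choice of form matters when comparing normalized and original images.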
Conclusions
The normalization process impacts the reproducibility of radiomic features, both in terms of changes in the image information content and in the inter‐observer setting. Among the considered methods, Norm_Mean and Norm_HM seem to provide the most reproducible features with respect to the original image and also between themselves, whereas Norm_ROI generates less reproducible features. Only a very small subset of features remained reproducible and independent in every tested condition, regardless of the ROI and the adopted algorithm: skewness or kurtosis, correlation and one among Imc2, Idmn and Idn from the GLCM group.
• Mean ADC values on 1.5 and 3T in various brain tumors do not differ significantly.
• Mean ADC values on different field strengths are reliable for clinical follow-up.
• Fixed imaging parameters are important for comparison on different field strengths.
Gradient and coil systems, pulse sequence design, and imaging parameters, as well as different scanners, can influence apparent diffusion coefficient (ADC) values. The aim of this study was to evaluate the effect of two different field strengths on the reproducibility of mean absolute ADC measurements in various primary and secondary brain tumors.
Fifty patients with histologically proven brain tumors were prospectively examined on two MR scanners from the same vendor, with different field strengths—1.5T and 3T—on the same day. Absolute ADC values were compared using the Wilcoxon matched-pairs signed-rank test. Inter-scanner agreement between two different fields in the same tumor was examined using correlation coefficients, and the discrepancy between the highest and the lowest mean absolute ADC values between scanners was tested using a one-way analysis of variance. Statistical significance was set at p < 0.05.
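The paired comparison described above can be illustrated with a small, self-contained version of the Wilcoxon matched-pairs signed-rank test. This is a sketch under stated assumptions, not the study's analysis code: zero differences are dropped, tied absolute differences receive average ranks, and the two-sided p-value uses the normal approximation without tie or continuity correction, which is reasonable for samples of about 50 pairs but not for very small ones. The function name is hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W, p) for paired samples x and y.

    W is the smaller of the positive and negative rank sums; p is a
    two-sided p-value from the normal approximation.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean_w) / sd_w                  # z <= 0 because w is the minimum
    p = 1 + math.erf(z / math.sqrt(2))       # equals 2 * Phi(z)
    return w, min(p, 1.0)


# Hypothetical paired readings, for illustration only (not study data).
scanner_a = [1, 2, 3, 4, 5, 6]
scanner_b = [2, 4, 6, 8, 10, 12]
w, p = wilcoxon_signed_rank(scanner_a, scanner_b)
significant = p < 0.05   # the significance level stated in the abstract
```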
There was no statistically significant difference between mean absolute ADC values obtained on 1.5T and 3T scanners for all patients and all brain tumor types. The intratumoral difference in ADC values, averaged from two scanners in the same tumor type, ranged from 1.58 to 4.5% for 1.5T, and from 1.18 to 4.37% for 3T. Inter-scanner agreement was high, and the kappa coefficient ranged from 0.88 to 0.99, with no significant difference between obtained values on different field strengths.
Based on the results obtained in our study, there is no significant difference between mean absolute ADC values measured in various primary and secondary brain tumors at different field strengths (1.5 and 3.0T MR systems), in the same patient, and in the same tumor, measured on the same day.
Abstract
Purpose
To evaluate reproducibility and variations in apparent diffusion coefficient (ADC) measurement in normal pancreatic parenchyma at 1.5- and 3.0-Tesla and determine if differences may exist between the four pancreatic segments.
Materials and methods
Diffusion-weighted MR imaging of the pancreas was performed at 1.5-Tesla in 20 patients and at 3.0-Tesla in 20 other patients strictly matched for gender and age, using the same b values (0, 400 and 800 s/mm²). Two independent observers placed regions of interest within the four pancreatic segments to measure ADC at both fields. Intra- and inter-observer agreement in ADC measurement was assessed using Bland-Altman analysis, and ADC values obtained at both fields were compared using non-parametric tests.
Results
There were no significant differences in ADC between repeated measurements or between ADC obtained at 1.5-Tesla and at 3.0-Tesla. The 95% limits of intra-observer agreement between ADC were 2.3%–22.7% at 1.5-Tesla and 1%–24.2% at 3.0-Tesla, and those for inter-observer agreement were 1.9%–14% at 1.5-Tesla and 8%–25% at 3.0-Tesla. ADC values were similar in all pancreatic segments at 3.0-Tesla, whereas the tail had lower ADC at 1.5-Tesla.
Conclusion
ADC measurement conveys high degrees of intra- and inter-observer reproducibility. ADC values have a homogeneous distribution among the four pancreatic segments at 3.0-Tesla.
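The Bland-Altman analysis named in the methods can be sketched as below. This is a generic illustration rather than the study's computation: it returns the bias (mean difference) and the 95% limits of agreement as bias ± 1.96 standard deviations of the paired differences, in the units of the measurements. The abstract reports its limits as percentages, and how those were derived is not stated. The function name is hypothetical.

```python
from statistics import mean, stdev


def bland_altman_limits(first, second):
    """Return (bias, lower_limit, upper_limit) for paired measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)   # sample standard deviation (n - 1)
    return bias, bias - half_width, bias + half_width


# Hypothetical repeated readings by one observer, for illustration only.
reading_1 = [10.0, 12.0, 14.0, 16.0]
reading_2 = [9.0, 13.0, 13.0, 17.0]
bias, lower, upper = bland_altman_limits(reading_1, reading_2)
```

Narrow limits centred on zero indicate that two readings can be used interchangeably; a non-zero bias indicates a systematic offset between them.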
Abstract
Background
The use of mass spectrometry to investigate disease-associated proteins among thousands of candidates simultaneously creates challenges with the evaluation of operational and biological variation. Traditional statistical methods, which evaluate reproducibility of a single feature, are likely to provide an inadequate assessment of reproducibility. This paper proposes a systematic approach for the evaluation of the global reproducibility of multidimensional mass spectral data at the post-identification stage.
Methods
The proposed systematic approach combines dimensional reduction and permutation to test and summarize the reproducibility. First, principal component analysis is applied to the mean quantities from identified features of paired replicated samples. An eigenvalue test is used to identify the number of significant principal components which reflect the underlying correlation pattern of the multiple features. Second, a simulation-based permutation test is applied to the derived paired principal components. Third, a modified form of Bland Altman or MA plot is produced to visualize agreement between the replicates. Last, a discordance index is used to summarize the agreement.
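The permutation step can be illustrated with a simplified sketch. It assumes the paired principal-component scores have already been computed and reduced to one row of replicate differences per sample (one column per retained component), and it uses random sign flipping of each sample's differences as the permutation scheme with the largest absolute one-sample t statistic as the test statistic. These choices, and the function names, are assumptions made for illustration; the paper's exact procedure may differ, and the PCA and eigenvalue test are not shown.

```python
import math
import random
from statistics import mean, stdev


def max_abs_t(diff_rows):
    """Largest |one-sample t| across the columns of paired differences."""
    n = len(diff_rows)
    stats = []
    for column in zip(*diff_rows):
        m = mean(column)
        sd = stdev(column)
        if sd == 0:
            stats.append(math.inf if m != 0 else 0.0)
        else:
            stats.append(abs(m) / (sd / math.sqrt(n)))
    return max(stats)


def sign_flip_permutation_test(diff_rows, n_perm=2000, seed=0):
    """Return (observed statistic, permutation p-value).

    Under the null hypothesis of no systematic bias, the sign of each
    sample's replicate difference is arbitrary, so signs are flipped at
    random (one flip per sample, applied to all of its components).
    """
    rng = random.Random(seed)
    observed = max_abs_t(diff_rows)
    exceed = 0
    for _ in range(n_perm):
        flipped = []
        for row in diff_rows:
            sign = rng.choice((-1, 1))
            flipped.append([sign * value for value in row])
        if max_abs_t(flipped) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical differences for 16 samples and 2 components: the first
# component carries a systematic bias of about 1, the second does not.
differences = [[1.0 + 0.01 * i, 0.2 * (-1) ** i] for i in range(16)]
statistic, p_value = sign_flip_permutation_test(differences, n_perm=500)
```

Taking the maximum over components means a single permutation distribution accounts for testing several components at once.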
Results
Application of this method to data from both a cardiac liquid chromatography tandem mass spectrometry experiment with iTRAQ labeling and simulation experiments derived from an ovarian cancer SELDI-MS experiment demonstrates that the proposed global reproducibility test is sensitive to the simulated systematic bias when the sample size is above 15. The two proposed test statistics (max t statistics and a sign score statistic) for the permutation tests are shown to be reliable.
Conclusion
The methodology presented in this paper provides a systematic approach for the global measurement of reproducibility in clinical proteomic studies.
Measurement of reproducibility
White, Benjamin; Saltz, Eli
Psychological Bulletin, 03/1957, Volume 54, Issue 2
Journal Article, Peer reviewed
"Reproducibility" refers to "the degree to which one can reproduce a subject's entire response pattern from a knowledge of his total score and the order of difficulty of the items." The purpose of this article is to examine some of the available techniques for assessing "reproducibility," to evaluate these techniques, and to indicate the relationships between these techniques and the concept of reliability. 20 references.