The aim of this study was to investigate the robustness and reproducibility of radiomic features in different magnetic resonance imaging sequences.
A phantom was scanned on a clinical 3 T system using fluid-attenuated inversion recovery (FLAIR), T1-weighted (T1w), and T2-weighted (T2w) sequences with low and high matrix size. For retest data, scans were repeated after repositioning of the phantom. Test and retest datasets were segmented using a semiautomated approach. Intraobserver and interobserver comparison was performed. Radiomic features were extracted after standardized preprocessing of images. Test-retest robustness was assessed using concordance correlation coefficients, dynamic range, and Bland-Altman analyses. Reproducibility was assessed by intraclass correlation coefficients.
The number of robust features (concordance correlation coefficient and dynamic range ≥ 0.90) was higher for features calculated from FLAIR than from T1w and T2w images. High-resolution FLAIR images provided the highest percentage of robust features (n = 37/45, 81%). No considerable difference in the number of robust features was observed between low- and high-resolution T1w and T2w images (T1w low: n = 26/45, 56%; T1w high: n = 25/45, 54%; T2w low: n = 21/45, 46%; T2w high: n = 24/45, 52%). A total of 15 (33%) of 45 features showed excellent robustness across all sequences and demonstrated excellent intraobserver and interobserver reproducibility (intraclass correlation coefficient ≥ 0.75).
FLAIR delivers the most robust substrate for radiomic analyses. Only 15 of 45 features showed excellent robustness and reproducibility across all sequences. Care must be taken in the interpretation of clinical studies using nonrobust features.
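The robustness criterion above (concordance correlation coefficient and dynamic range ≥ 0.90) can be sketched in a few lines. This is only an illustration, not the study's implementation: the abstract does not define the dynamic range, so the sketch assumes a common test-retest form (one minus the mean absolute test-retest difference divided by the observed value range), and all function names are hypothetical.

```python
from statistics import fmean


def concordance_correlation(test, retest):
    """Lin's concordance correlation coefficient using population (1/n) moments."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    mean_t, mean_r = fmean(test), fmean(retest)
    var_t = fmean([(t - mean_t) ** 2 for t in test])
    var_r = fmean([(r - mean_r) ** 2 for r in retest])
    cov = fmean([(t - mean_t) * (r - mean_r) for t, r in zip(test, retest)])
    denom = var_t + var_r + (mean_t - mean_r) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both series are constant and equal")
    return 2 * cov / denom


def dynamic_range(test, retest):
    """ASSUMED definition: 1 - mean|test - retest| / (max - min over all values)."""
    if len(test) != len(retest) or len(test) < 1:
        raise ValueError("need two equally long, non-empty series")
    values = list(test) + list(retest)
    span = max(values) - min(values)
    if span == 0:
        raise ValueError("dynamic range is undefined when all values are identical")
    mean_abs_diff = fmean([abs(t - r) for t, r in zip(test, retest)])
    return 1 - mean_abs_diff / span


def is_robust(test, retest, threshold=0.90):
    """Apply the >= 0.90 cut-off from the abstract to both metrics."""
    return (concordance_correlation(test, retest) >= threshold
            and dynamic_range(test, retest) >= threshold)
```

Here `test` and `retest` would hold one feature's values across the segmented objects for the two scans. A constant offset between scans lowers the CCC even when the values are perfectly correlated, which is why it is preferred over a plain correlation coefficient for agreement.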
As the enthusiasm surrounding Deep Learning grows, both medical practitioners and regulatory bodies are exploring ways to safely introduce image segmentation in clinical practice. One frontier to overcome when translating promising research into the clinical open world is the shift from static to continual learning. Continual learning, the practice of training models throughout their lifecycle, is seeing growing interest but is still in its infancy in healthcare. We present Lifelong nnU-Net, a standardized framework that places continual segmentation in the hands of researchers and clinicians. By building on top of nnU-Net (widely regarded as the best-performing segmenter for multiple medical applications) and providing all necessary modules for training and testing models sequentially, we ensure broad applicability and lower the barrier to evaluating new methods in a continual fashion. Our benchmark results across three medical segmentation use cases and five continual learning methods give a comprehensive outlook on the current state of the field and constitute a first reproducible benchmark.
Abstract
Our purpose was to analyze the robustness and reproducibility of magnetic resonance imaging (MRI) radiomic features. We constructed a multi-object fruit phantom to perform MRI acquisition as scan-rescan using a 3 Tesla MRI scanner. We applied T2-weighted (T2w) half-Fourier acquisition single-shot turbo spin-echo (HASTE), T2w turbo spin-echo (TSE), T2w fluid-attenuated inversion recovery (FLAIR), T2 map and T1-weighted (T1w) TSE. Images were resampled to isotropic voxels. Fruits were segmented. The workflow was repeated by a second reader and by the first reader after a pause of one month. We applied PyRadiomics to extract 107 radiomic features per fruit and sequence from seven feature classes. We calculated concordance correlation coefficients (CCC) and dynamic range (DR) to obtain measurements of feature robustness. Intraclass correlation coefficient (ICC) was calculated to assess intra- and inter-observer reproducibility. We calculated Gini scores to test the pairwise discriminative power specific for the features and MRI sequences. We depict Bland-Altman plots of features with top discriminative power (Mann–Whitney U test). Shape features were the most robust feature class. T2 map was the most robust imaging technique (robust features (rf), n = 84). The HASTE sequence yielded the fewest rf (n = 20). Intra-observer ICC was excellent (≥ 0.75) for nearly all features (max–min; 99.1–97.2%). Deterioration of ICC values was seen in the inter-observer analyses (max–min; 88.7–81.1%). Complete robustness across all sequences was found for 8 features. Shape features and T2 map yielded the highest pairwise discriminative performance. Radiomics validity depends on the MRI sequence and feature class. T2 map seems to be the most promising imaging technique with the highest feature robustness, high intra-/inter-observer reproducibility and most promising discriminative power.
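The quantities behind a Bland-Altman plot, as mentioned above, are simple to compute. The sketch below assumes the conventional definition (mean scan-rescan difference as bias, with 95% limits of agreement at bias ± 1.96 sample standard deviations); the function name is hypothetical and the abstract does not state which variant the authors used.

```python
from statistics import fmean, stdev


def bland_altman(scan, rescan):
    """Return (bias, lower limit, upper limit) for paired scan/rescan values."""
    if len(scan) != len(rescan) or len(scan) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    diffs = [a - b for a, b in zip(scan, rescan)]
    bias = fmean(diffs)                 # systematic offset between scan and rescan
    spread = 1.96 * stdev(diffs)        # half-width of the 95% limits of agreement
    return bias, bias - spread, bias + spread
```

In a plot, each pair would be drawn at its mean on the x-axis and its difference on the y-axis, with horizontal lines at the three returned values.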
Various studies have shown that medical professionals are prone to follow the incorrect suggestions offered by algorithms, especially when they have limited inputs to interrogate and interpret such suggestions and when they have an attitude of relying on them. We examine the effect of correct and incorrect algorithmic suggestions on the diagnosis performance of radiologists when (1) they have no, partial, and extensive informational inputs for explaining the suggestions (study 1) and (2) they are primed to hold a positive, negative, ambivalent, or neutral attitude towards AI (study 2). Our analysis of 2760 decisions made by 92 radiologists conducting 15 mammography examinations shows that radiologists' diagnoses follow both incorrect and correct suggestions, despite variations in the explainability inputs and attitudinal priming interventions. We identify and explain various pathways through which radiologists navigate through the decision process and arrive at correct or incorrect decisions. Overall, the findings of both studies show the limited effect of using explainability inputs and attitudinal priming for overcoming the influence of (incorrect) algorithmic suggestions.
Objectives
To predict the main component of pure and mixed kidney stones using dual-energy computed tomography and machine learning.
Methods
A total of 200 kidney stones with a known composition as determined by infrared spectroscopy were examined using a non-anthropomorphic phantom on a spectral detector computed tomography scanner. Stones were of either pure (monocrystalline, n = 116) or compound (dicrystalline, n = 84) composition. Image acquisition was performed twice with each of the normal-dose and low-dose protocols (ND/LD). Conventional images and low and high keV virtual monoenergetic images were reconstructed. Stones were semi-automatically segmented. A shallow neural network was trained using data from the ND1 acquisition split into training (70%), testing (15%), and validation datasets (15%). Performance for ND2 and both LD acquisitions was tested. Accuracy on a per-voxel and a per-stone basis was calculated.
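A 70/15/15 partition like the one described can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say whether the split was random, stratified, or made per voxel or per stone, so the seeded random shuffle and the function name `split_indices` are hypothetical.

```python
import random


def split_indices(n_items, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly partition range(n_items) into training, testing and validation."""
    if len(fractions) != 3 or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must be three values summing to 1")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)      # seeded for reproducibility
    n_train = round(fractions[0] * n_items)
    n_test = round(fractions[1] * n_items)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]   # remainder absorbs rounding
    return train, test, validation
```

If the samples are voxels, splitting at the stone level instead would keep voxels of one stone from appearing in more than one subset.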
Results
Main components were whewellite (n = 80), weddellite (n = 21), Ca-phosphate (n = 39), cystine (n = 20), struvite (n = 13), uric acid (n = 18), and xanthine (n = 9). Stone size ranged from 3 to 18 mm. Overall accuracy for predicting the main component on a per-voxel basis attained on the ND testing dataset was 91.1%. On independently tested acquisitions, accuracy was 87.1–90.4%.
Conclusions
Even in compound stones, the main component can be reliably determined using dual energy CT and machine learning, irrespective of dose protocol.
Key Points
• Spectral Detector Dual Energy CT and Machine Learning allow for an accurate prediction of stone composition.
• Ex-vivo study demonstrates the dose independent assessment of pure and compound stones.
• Lowest accuracy is reported for compound stones with struvite as main component.
Tools for medical image analysis have been developed to reduce the time needed to detect abnormalities and to provide more accurate results. In particular, tools based on artificial intelligence and machine learning techniques have led to significant improvements in medical imaging interpretation in the last decade. Automatic evaluation of acute ischemic stroke in medical imaging is one of the fields that has witnessed major development. Commercially available products so far aim to identify (and quantify) the ischemic core, the ischemic penumbra, the site of arterial occlusion and the collateral flow, but they are not (yet) intended as standalone diagnostic tools. Their use can be complementary; they are intended to support physicians' interpretation of medical images and hence standardise selection of patients for acute treatment. This review provides an introduction into the field of computer-aided diagnosis and focuses on the automatic analysis of non-contrast-enhanced computed tomography, computed tomography angiography and perfusion imaging. Future studies are necessary that allow the evaluation and comparison of different imaging strategies and post-processing algorithms during the diagnosis process in patients with suspected acute ischemic stroke, which may further facilitate the standardisation of treatment and stroke management.
Objectives
The goal of the present study was to classify the most common types of plain radiographs using a neural network and to validate the network’s performance on internal and external data. Such a network could help improve various radiological workflows.
Methods
All radiographs from the year 2017 (n = 71,274) acquired at our institution were retrieved from the PACS. The 30 largest categories (n = 58,219, 81.7% of all radiographs performed in 2017) were used to develop and validate a neural network (MobileNet v1.0) using transfer learning. Image categories were extracted from DICOM metadata (study and image description) and mapped to the WHO manual of diagnostic imaging. As an independent, external validation set, we used images from other institutions that had been stored in our PACS (n = 5324).
Results
In the internal validation, the overall accuracy of the model was 90.3% (95%CI: 89.2–91.3%), whereas, for the external validation set, the overall accuracy was 94.0% (95%CI: 93.3–94.6%).
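A confidence interval for an accuracy such as those reported above can be computed from the number of correct classifications and the sample size. The abstract does not state which interval method was used, so the Wilson score interval below is an assumption, and the function name is hypothetical.

```python
import math


def wilson_interval(n_correct, n_total, z=1.959964):
    """Wilson score interval for a proportion (z = 1.959964 gives about 95%)."""
    if n_total <= 0 or not 0 <= n_correct <= n_total:
        raise ValueError("need 0 <= n_correct <= n_total and n_total > 0")
    p = n_correct / n_total
    z2 = z * z
    denom = 1 + z2 / n_total
    center = (p + z2 / (2 * n_total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n_total
                               + z2 / (4 * n_total ** 2)) / denom
    return center - half_width, center + half_width


# Hypothetical count: 5005 correct out of the 5324 external images
# corresponds to roughly 94.0% accuracy; the true count is not reported.
low, high = wilson_interval(5005, 5324)
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and behaves well for accuracies close to 100%.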
Conclusions
Using data from one single institution, we were able to classify the most common categories of radiographs with a neural network. The network showed good generalizability on the external validation set and could be used to automatically organize a PACS, preselect radiographs so that they can be routed to more specialized networks for abnormality detection or help with other parts of the radiological workflow (e.g., automated hanging protocols; check if ordered image and performed image are the same). The final AI algorithm is publicly available for evaluation and extension.
Key Points
• Data from one single institution can be used to train a neural network for the correct detection of the 30 most common categories of plain radiographs.
• The trained model achieved a high accuracy for the majority of categories and showed good generalizability to images from other institutions.
• The neural network is made publicly available and can be used to automatically organize a PACS or to preselect radiographs so that they can be routed to more specialized neural networks for abnormality detection.
To evaluate the association between coronavirus disease 2019 (COVID-19) and post-inflammatory emphysematous lung alterations on follow-up low-dose CT scans.
Consecutive patients with proven COVID-19 infection and a follow-up CT were retrospectively reviewed. The severity of pulmonary involvement was classified as mild, moderate and severe. Total lung volume, emphysema volume and the emphysema-to-lung volume ratio were quantified semi-automatically and compared inter-individually between initial and follow-up CT and to a control group of healthy, age- and sex-matched patients. Lung density was further assessed by drawing circular regions of interest (ROIs) into non-affected regions of the upper lobes.
A total of 32 individuals (mean age: 64 ± 13 years, 12 females) with at least one follow-up CT (mean: 52 ± 66 days, range: 5–259) were included. In the overall cohort, total lung volume, emphysema volume and the emphysema-to-lung volume ratio did not differ significantly between the initial and follow-up scans. In the subgroup of COVID-19 patients with > 30 days of follow-up, the emphysema volume was significantly larger as compared to the subgroup with a follow-up < 30 days (p = 0.045). Manually measured single ROIs generally yielded lower attenuation values prior to COVID-19 pneumonia, but the difference was not significant between groups (all p > 0.05).
COVID-19 patients with a follow-up CT >30 days showed significant emphysematous lung alterations. These findings may help to explain the long-term effect of COVID-19 on pulmonary function and warrant validation by further studies.
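The volume quantities used in this study (total lung volume, emphysema volume and their ratio) can be illustrated with a simple threshold-based sketch. It is not the semi-automatic software used by the authors: the -950 HU cut-off is an assumed, commonly used densitometry threshold that the abstract does not state, and the function and parameter names are hypothetical.

```python
def emphysema_ratio(hu_values, lung_mask, voxel_volume_ml, threshold_hu=-950):
    """Return (lung volume, emphysema volume, emphysema-to-lung ratio).

    hu_values: flat sequence of attenuation values in Hounsfield units.
    lung_mask: flat sequence of booleans, True for voxels inside the lung.
    threshold_hu: ASSUMED cut-off; voxels below it count as emphysema.
    """
    if len(hu_values) != len(lung_mask):
        raise ValueError("image and mask must have the same number of voxels")
    lung = [hu for hu, inside in zip(hu_values, lung_mask) if inside]
    if not lung:
        raise ValueError("lung mask is empty")
    n_emphysema = sum(1 for hu in lung if hu < threshold_hu)
    lung_volume = len(lung) * voxel_volume_ml
    emphysema_volume = n_emphysema * voxel_volume_ml
    return lung_volume, emphysema_volume, emphysema_volume / lung_volume
```

Comparing the returned ratio between the initial and the follow-up scan of the same patient would mirror the comparison described above.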