Objectives
To evaluate the performance of an AI-powered algorithm for the automatic detection of pulmonary embolism (PE) on chest computed tomography pulmonary angiograms (CTPAs) on a large dataset.
Methods
We retrospectively identified all CTPAs conducted at our institution in 2017 (n = 1499). Exams with clinical questions other than PE were excluded from the analysis (n = 34). The remaining exams were classified into positive (n = 232) and negative (n = 1233) for PE based on the final written reports, which defined the reference standard. The fully anonymized 1-mm series in soft tissue reconstruction served as input for the PE detection prototype algorithm, which was based on a deep convolutional neural network comprising a ResNet architecture. It was trained and validated on 28,000 CTPAs acquired at other institutions. The result series were reviewed using a web-based feedback platform. Measures of diagnostic performance were calculated on a per-patient and a per-finding level.
Results
The algorithm correctly identified 215 of 232 exams positive for pulmonary embolism (sensitivity 92.7%; 95% confidence interval [CI] 88.3–95.5%) and 1178 of 1233 exams negative for pulmonary embolism (specificity 95.5%; 95% CI 94.2–96.6%). On a per-finding level, 1174 of 1352 findings marked as embolus by the algorithm were true emboli. Most of the false positive findings were due to contrast agent–related flow artifacts, pulmonary veins, and lymph nodes.
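The per-exam figures above follow directly from the reported counts. The sketch below recomputes them and attaches a Wilson score interval. The abstract does not state which interval method was used, so the Wilson interval is an assumption and its bounds may differ slightly from the reported ones. The function name `wilson_interval` is illustrative.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Per-exam counts as reported in the Results
tp, positives = 215, 232      # correctly flagged PE-positive exams
tn, negatives = 1178, 1233    # correctly cleared PE-negative exams

sensitivity = tp / positives
specificity = tn / negatives

# Per-finding level: share of algorithm marks that were true emboli
ppv_findings = 1174 / 1352

sens_ci = wilson_interval(tp, positives)
spec_ci = wilson_interval(tn, negatives)

print(f"sensitivity {sensitivity:.1%}, Wilson 95% CI {sens_ci[0]:.1%} to {sens_ci[1]:.1%}")
print(f"specificity {specificity:.1%}, Wilson 95% CI {spec_ci[0]:.1%} to {spec_ci[1]:.1%}")
print(f"per-finding positive predictive value {ppv_findings:.1%}")
```

The per-finding ratio is a positive predictive value rather than a sensitivity, because its denominator is the number of marks made by the algorithm.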
Conclusion
The AI prototype algorithm we tested has a high degree of diagnostic accuracy for the detection of PE on CTPAs. Sensitivity and specificity are balanced, which is a prerequisite for its clinical usefulness.
Key Points
• An AI-based prototype algorithm showed a high degree of diagnostic accuracy for the detection of pulmonary embolism on CTPAs.
• It can therefore help clinicians to automatically prioritize exams with a high suspicion of pulmonary embolism and serve as a secondary reading tool.
• By complementing traditional ways of worklist prioritization in radiology departments, this can speed up the diagnostic and therapeutic workup of patients with pulmonary embolism and help to avoid false negative calls.
To assess the diagnostic performance of a deep learning-based algorithm for automated detection of acute and chronic rib fractures on whole-body trauma CT.
We retrospectively identified all whole-body trauma CT scans referred from the emergency department of our hospital from January to December 2018 (n = 511). Scans were categorized as positive (n = 159) or negative (n = 352) for rib fractures according to the clinically approved written CT reports, which served as the index test. The bone kernel series (1.5-mm slice thickness) served as an input for a detection prototype algorithm trained to detect both acute and chronic rib fractures based on a deep convolutional neural network. It had previously been trained on an independent sample from eight other institutions (n = 11,455).
All CTs except one were successfully processed (510/511). The algorithm achieved a sensitivity of 87.4% and a specificity of 91.5% on a per-examination level (per CT scan: rib fracture(s) yes/no). There were 0.16 false-positives per examination (= 81/510). On a per-finding level, there were 587 true-positive findings (sensitivity: 65.7%) and 307 false-negatives. Furthermore, 97 true rib fractures were detected that were not mentioned in the written CT reports. A major factor associated with correct detection was displacement.
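The per-finding sensitivity and the false-positive rate per examination can be checked from the counts given above. This is a minimal arithmetic sketch; the variable and function names are illustrative.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of actual findings that were detected."""
    return true_positives / (true_positives + false_negatives)


# Counts as reported above
processed_exams = 510
false_positive_findings = 81
tp_findings, fn_findings = 587, 307

fp_per_exam = false_positive_findings / processed_exams
finding_sensitivity = sensitivity(tp_findings, fn_findings)

print(f"false-positives per examination: {fp_per_exam:.2f}")
print(f"per-finding sensitivity: {finding_sensitivity:.1%}")
```

The per-finding sensitivity is lower than the per-examination sensitivity because an examination counts as detected once any one of its fractures is found.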
We found good performance of a deep learning-based prototype algorithm detecting rib fractures on trauma CT on a per-examination level at a low rate of false-positives per case. A potential area for clinical application is its use as a screening tool to avoid false-negative radiology reports.
Having gained a tremendous amount of popularity since its introduction in 2006, tract-based spatial statistics (TBSS) can now be considered the standard approach for voxel-based analysis (VBA) of diffusion tensor imaging (DTI) data. Aiming to improve the sensitivity, objectivity, and interpretability of multi-subject DTI studies, TBSS includes a skeletonization step that alleviates residual image misalignment and obviates the need for data smoothing. Although TBSS represents an elegant and user-friendly framework that tackles numerous concerns existing in conventional VBA methods, it has limitations of its own, some of which have already been detailed in recent literature. In this work, we present general methodological considerations on TBSS and report on pitfalls that have not been described previously. In particular, we have identified specific assumptions of TBSS that may not be satisfied under typical conditions. Moreover, we demonstrate that the existence of such violations can severely affect the reliability of TBSS results. With TBSS being used increasingly, it is of paramount importance to acquaint TBSS users with these concerns, such that a well-informed decision can be made as to whether and how to pursue a TBSS analysis. Finally, in addition to raising awareness by providing our new insights, we provide constructive suggestions that could improve the validity and increase the impact of TBSS drastically.
• We investigate tract-based spatial statistics (TBSS) considering potential pitfalls.
• TBSS is not tract-specific and we show how this may falsify results.
• User-defined parameters strongly influence the final TBSS-derived results.
• We provide suggestions that improve the validity and increase the impact of TBSS.
The intravoxel incoherent motion (IVIM) theory provides a framework for the separation of perfusion and diffusion effects in diffusion-weighted imaging (DWI). To measure the three free IVIM parameters, DWIs with several diffusion weightings b must be acquired. To date, the b value distributions in use are chosen heuristically and vary greatly among researchers. In this work, optimal b value distributions for the three-parameter fit are determined using Monte-Carlo simulations for the measurement of a low, medium and high IVIM perfusion regime. The first 16 b values of a b value distribution, which was optimized to be appropriate for all three regimes, are {0, 40, 1000, 240, 10, 750, 90, 390, 170, 10, 620, 210, 100, 0, 530 and 970} in units of seconds per square millimeter. This distribution performed well for all organs and outperformed a distribution frequently used in the literature. In case of limited acquisition time, the b values should be chosen in the given order, but at least 10 b values should be used for current clinical settings. The overall parameter estimation quality depends strongly and nonlinearly on the signal-to-noise ratio (SNR): it is essential that the SNR is considerably higher than a critical SNR. This critical SNR is about 8 for medium and high IVIM perfusion and 50 for the low IVIM perfusion regime. Initial in vivo IVIM measurements were performed in the abdomen and were in keeping with the numerically simulated results.
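As an illustration of what a three-parameter IVIM fit has to recover, the sketch below evaluates the commonly used bi-exponential signal model, S(b) = S0 · [f · exp(−b · D*) + (1 − f) · exp(−b · D)], at the first ten b values of the list above. The abstract does not spell out the model form, so the bi-exponential form is an assumption, and the tissue parameters (`f`, `d`, `d_star`) are hypothetical values chosen only for demonstration.

```python
import math

# b values in the order given in the abstract
B_VALUES = [0, 40, 1000, 240, 10, 750, 90, 390, 170, 10, 620, 210, 100, 0, 530, 970]


def ivim_signal(b, f, d, d_star, s0=1.0):
    """Bi-exponential IVIM signal.

    f      : perfusion fraction (dimensionless)
    d      : tissue diffusion coefficient
    d_star : pseudo-diffusion coefficient
    Units of d and d_star must be the reciprocal of the units of b.
    """
    return s0 * (f * math.exp(-b * d_star) + (1.0 - f) * math.exp(-b * d))


# With limited acquisition time, take the b values in the given order.
b_subset = B_VALUES[:10]

# Hypothetical tissue parameters, for illustration only
params = {"f": 0.1, "d": 1.0e-3, "d_star": 20.0e-3}

signals = [ivim_signal(b, **params) for b in b_subset]
for b, s in zip(b_subset, signals):
    print(f"b = {b:4d}  S/S0 = {s:.4f}")
```

A simulation study of the kind described would add noise to such signals at a chosen SNR and refit `f`, `d`, and `d_star` many times; that step is omitted here.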
The usefulness of quantitative medical imaging features in clinical studies was once disputed. Nowadays, advances in analysis techniques, for instance through machine learning, have made quantitative features progressively more useful in diagnosis and research. Tissue characterisation is improved via "radiomics" features, whose extraction can be automated. Despite these advances, the stability of quantitative features remains an important open problem. As features can be highly sensitive to variations in acquisition details, it is not trivial to quantify stability and efficiently select stable features. In this work, we develop and validate a Computed Tomography (CT) simulator environment based on the publicly available ASTRA toolbox ( www.astra-toolbox.com ). We show that the variability, stability and discriminative power of the radiomics features extracted from the virtual phantom images generated by the simulator are similar to those observed in a tandem phantom study. Additionally, we show that the variability is matched between a multi-center phantom study and simulated results. Consequently, we demonstrate that the simulator can be utilised to assess radiomics features' stability and discriminative power.
Artificial intelligence (AI) is a powerful tool for image analysis that is increasingly being evaluated by radiology professionals. However, because these methods were developed for the analysis of nonmedical image data and the data structure in radiology departments is not "AI ready", implementing AI in radiology is not straightforward. The purpose of this review is to guide the reader through the pipeline of an AI project for automated image analysis in radiology and thereby encourage its implementation in radiology departments. At the same time, this review aims to enable readers to critically appraise articles on AI-based software in radiology.
Objectives
To evaluate the performance of a deep convolutional neural network (DCNN) in detecting and classifying distal radius fractures, metal, and cast on radiographs using labels based on radiology reports. The secondary aim was to evaluate the effect of the training set size on the algorithm’s performance.
Methods
A total of 15,775 frontal and lateral radiographs, corresponding radiology reports, and a ResNet18 DCNN were used. Fracture detection and classification models were developed per view and merged. Incrementally sized subsets served to evaluate effects of the training set size. Two musculoskeletal radiologists set the standard of reference on radiographs (test set A). A subset (B) was rated by three radiology residents. For a per-study-based comparison with the radiology residents, the results of the best models were merged. Statistics used were ROC and AUC, Youden’s J statistic (J), and Spearman’s correlation coefficient (ρ).
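Youden's J statistic is defined as sensitivity + specificity − 1 and is often used to pick an operating threshold on a ROC curve. The sketch below shows that selection on a small set of made-up scores and labels; the data and the helper `best_threshold` are hypothetical and are not taken from the study.

```python
def youden_j(sensitivity, specificity):
    """Youden's J statistic: 0 for a chance-level test, 1 for a perfect one."""
    return sensitivity + specificity - 1.0


def best_threshold(scores, labels):
    """Return (threshold, J) maximising J; a case is called positive
    when its score is >= threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = youden_j(tp / positives, tn / negatives)
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical model outputs and ground-truth labels (1 = fracture)
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 0, 1, 1]

threshold, j = best_threshold(scores, labels)
print(f"best threshold {threshold}, Youden's J {j:.2f}")
```

On ties, this version keeps the lowest threshold, which favours sensitivity; a different tie-break may be preferable depending on the clinical use.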
Results
The models’ AUC/J on (A) for metal and cast were 0.99/0.98 and 1.0/1.0. The models’ and residents’ AUC/J on (B) were similar on fracture (0.98/0.91; 0.98/0.92) and multiple fragments (0.85/0.58; 0.91/0.70). Training set size and AUC correlated on metal (ρ = 0.740), cast (ρ = 0.722), fracture (frontal ρ = 0.947, lateral ρ = 0.946), multiple fragments (frontal ρ = 0.856), and fragment displacement (frontal ρ = 0.595).
Conclusions
The models trained on a DCNN with report-based labels to detect distal radius fractures on radiographs are suitable as a secondary reading tool; models for fracture classification are not ready for clinical use. Bigger training sets lead to better models in all categories except joint involvement.
Key Points
• Detection of metal and cast on radiographs is excellent using AI and labels extracted from radiology reports.
• Automatic detection of distal radius fractures on radiographs is feasible, and its performance approximates that of radiology residents.
• Automatic classification of the type of distal radius fracture varies in accuracy and is inferior for joint involvement and fragment displacement.