Chest radiograph interpretation is critical for the detection of thoracic diseases, including tuberculosis and lung cancer, which affect millions of people worldwide each year. This time-consuming task typically requires expert radiologists to read the images, leading to fatigue-based diagnostic error and a lack of diagnostic expertise in areas of the world where radiologists are unavailable. Recently, deep learning approaches have been able to achieve expert-level performance in medical image interpretation tasks, powered by large network architectures and fueled by the emergence of large labeled datasets. The purpose of this study is to investigate the performance of a deep learning algorithm on the detection of pathologies in chest radiographs compared with practicing radiologists.
We developed CheXNeXt, a convolutional neural network to concurrently detect the presence of 14 different pathologies, including pneumonia, pleural effusion, pulmonary masses, and nodules, in frontal-view chest radiographs. CheXNeXt was trained and internally validated on the ChestX-ray8 dataset, with a held-out validation set consisting of 420 images, sampled to contain at least 50 cases of each of the original pathology labels. On this validation set, the majority vote of a panel of 3 board-certified cardiothoracic specialist radiologists served as reference standard. We compared CheXNeXt's discriminative performance on the validation set to the performance of 9 radiologists using the area under the receiver operating characteristic curve (AUC). The radiologists included 6 board-certified radiologists (average experience 12 years, range 4-28 years) and 3 senior radiology residents, from 3 academic institutions. We found that CheXNeXt achieved radiologist-level performance on 11 pathologies and did not achieve radiologist-level performance on 3 pathologies. The radiologists achieved statistically significantly higher AUC performance on cardiomegaly, emphysema, and hiatal hernia, with AUCs of 0.888 (95% confidence interval [CI] 0.863-0.910), 0.911 (95% CI 0.866-0.947), and 0.985 (95% CI 0.974-0.991), respectively, whereas CheXNeXt's AUCs were 0.831 (95% CI 0.790-0.870), 0.704 (95% CI 0.567-0.833), and 0.851 (95% CI 0.785-0.909), respectively. CheXNeXt performed better than radiologists in detecting atelectasis, with an AUC of 0.862 (95% CI 0.825-0.895), statistically significantly higher than radiologists' AUC of 0.808 (95% CI 0.777-0.838); there were no statistically significant differences in AUCs for the other 10 pathologies. The average time to interpret the 420 images in the validation set was substantially longer for the radiologists (240 minutes) than for CheXNeXt (1.5 minutes).
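Model-versus-radiologist comparisons of this kind are typically reported as AUCs with bootstrap confidence intervals. As an illustration only (not the study's actual statistical code), the following sketch computes an AUC and a percentile bootstrap CI with NumPy:

```python
import numpy as np

def auc_score(y_true, y_score):
    """AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen positive case is scored higher than a randomly chosen negative."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()  # ties count half
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def bootstrap_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling cases with replacement."""
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, y_true.size, y_true.size)
        if y_true[idx].min() == y_true[idx].max():
            continue  # a resample must contain both classes
        aucs.append(auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```

The rank-based formulation handles tied scores and avoids fitting an explicit ROC curve; published CIs may instead come from DeLong's method or a different resampling scheme.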
The main limitations of our study are that neither CheXNeXt nor the radiologists were permitted to use patient history or review prior examinations and that evaluation was limited to a dataset from a single institution.
In this study, we developed and validated a deep learning algorithm that classified clinically important abnormalities in chest radiographs at a performance level comparable to practicing radiologists. Once tested prospectively in clinical settings, the algorithm could have the potential to expand patient access to chest radiograph diagnostics.
Magnetic resonance imaging (MRI) of the knee is the preferred method for diagnosing knee injuries. However, interpretation of knee MRI is time-intensive and subject to diagnostic error and variability. An automated system for interpreting knee MRI could prioritize high-risk patients and assist clinicians in making diagnoses. Deep learning methods, in being able to automatically learn layers of features, are well suited for modeling the complex relationships between medical images and their interpretations. In this study we developed a deep learning model for detecting general abnormalities and specific diagnoses (anterior cruciate ligament [ACL] tears and meniscal tears) on knee MRI exams. We then measured the effect of providing the model's predictions to clinical experts during interpretation.
Our dataset consisted of 1,370 knee MRI exams performed at Stanford University Medical Center between January 1, 2001, and December 31, 2012 (mean age 38.0 years; 569 [41.5%] female patients). The majority vote of 3 musculoskeletal radiologists established reference standard labels on an internal validation set of 120 exams. We developed MRNet, a convolutional neural network for classifying MRI series and combined predictions from 3 series per exam using logistic regression. In detecting abnormalities, ACL tears, and meniscal tears, this model achieved area under the receiver operating characteristic curve (AUC) values of 0.937 (95% CI 0.895, 0.980), 0.965 (95% CI 0.938, 0.993), and 0.847 (95% CI 0.780, 0.914), respectively, on the internal validation set. We also obtained a public dataset of 917 exams with sagittal T1-weighted series and labels for ACL injury from Clinical Hospital Centre Rijeka, Croatia. On the external validation set of 183 exams, the MRNet trained on Stanford sagittal T2-weighted series achieved an AUC of 0.824 (95% CI 0.757, 0.892) in the detection of ACL injuries with no additional training, while an MRNet trained on the rest of the external data achieved an AUC of 0.911 (95% CI 0.864, 0.958). We additionally measured the specificity, sensitivity, and accuracy of 9 clinical experts (7 board-certified general radiologists and 2 orthopedic surgeons) on the internal validation set both with and without model assistance. Using a 2-sided Pearson's chi-squared test with adjustment for multiple comparisons, we found no significant differences between the performance of the model and that of unassisted general radiologists in detecting abnormalities. General radiologists achieved significantly higher sensitivity in detecting ACL tears (p-value = 0.002; q-value = 0.019) and significantly higher specificity in detecting meniscal tears (p-value = 0.003; q-value = 0.019).
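MRNet's exam-level prediction combines the three per-series probabilities with logistic regression. A minimal sketch of that combination step, using synthetic per-series probabilities in place of real network outputs, is:

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, steps=3000):
    """Fit a logistic-regression combiner by gradient descent.
    X: (n_exams, 3) per-series probabilities; y: (n_exams,) 0/1 labels."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / y.size
        b -= lr * (p - y).mean()
    return w, b

def predict(X, w, b):
    """Combined exam-level probability from the three series probabilities."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Toy demonstration: exams whose series-level probabilities are high tend
# to be positive; the combiner learns how to weight the three series.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = (X.mean(axis=1) > 0.5).astype(float)
w, b = fit_logistic(X, y)
acc = ((predict(X, w, b) > 0.5) == y).mean()
```

In practice the combiner would be fit with a standard solver (e.g. scikit-learn) on held-out tuning data rather than hand-rolled gradient descent; the sketch only shows the stacking mechanic.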
Using a 1-tailed t test on the change in performance metrics, we found that providing model predictions significantly increased clinical experts' specificity in identifying ACL tears (p-value < 0.001; q-value = 0.006). The primary limitations of our study include lack of surgical ground truth and the small size of the panel of clinical experts.
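The q-values quoted above come from a multiple-comparison adjustment of the raw p-values. As an illustration only (the study's exact adjustment procedure may differ), a Benjamini-Hochberg adjustment can be computed as:

```python
import numpy as np

def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values).
    q_(i) = min over j >= i of p_(j) * m / j, for sorted p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity from the largest p-value downward
    q = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.minimum(q, 1.0)
    out = np.empty(m)
    out[order] = q          # restore the original ordering
    return out
```

For example, raw p-values [0.01, 0.04, 0.03, 0.5] adjust to roughly [0.040, 0.053, 0.053, 0.500], so a comparison can remain significant after adjustment only if its rank-scaled p-value stays below the threshold.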
Our deep learning model can rapidly generate accurate clinical pathology classifications of knee MRI exams from both internal and external datasets. Moreover, our results support the assertion that deep learning models can improve the performance of clinical experts during medical imaging interpretation. Further research is needed to validate the model prospectively and to determine its utility in the clinical setting.
The development of deep learning algorithms for complex tasks in digital medicine has relied on the availability of large labeled training datasets, usually containing hundreds of thousands of examples. The purpose of this study was to develop a 3D deep learning model, AppendiXNet, to detect appendicitis, one of the most common life-threatening abdominal emergencies, using a small training dataset of less than 500 training CT exams. We explored whether pretraining the model on a large collection of natural videos would improve the performance of the model over training the model from scratch. AppendiXNet was pretrained on a large collection of YouTube videos called Kinetics, consisting of approximately 500,000 video clips and annotated for one of 600 human action classes, and then fine-tuned on a small dataset of 438 CT scans annotated for appendicitis. We found that pretraining the 3D model on natural videos significantly improved the performance of the model from an AUC of 0.724 (95% CI 0.625, 0.823) to 0.810 (95% CI 0.725, 0.895). The application of deep learning to detect abnormalities on CT examinations using video pretraining could generalize effectively to other challenging cross-sectional medical imaging tasks when training data is limited.
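Mechanically, this pretraining strategy amounts to reusing a video-pretrained backbone and swapping its 600-way action head for a task-specific one. A toy NumPy sketch of that weight-transfer step (random matrices stand in for the pretrained Kinetics backbone; this is not the study's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" two-layer network: in reality these weights would come from
# training a 3D CNN on the ~500,000-clip Kinetics video dataset.
W1_pre = rng.normal(size=(64, 32))    # shared feature extractor
W2_pre = rng.normal(size=(32, 600))   # 600-way action-class head (discarded)

# Fine-tuning setup: keep the feature extractor, replace the head with a
# freshly initialized binary (appendicitis vs. normal) output layer.
W1 = W1_pre.copy()                    # initialized from pretraining
W2 = np.zeros((32, 1))                # new task-specific head

def forward(x):
    """Forward pass: pretrained features followed by the new binary head."""
    h = np.maximum(x @ W1, 0)         # ReLU features
    return 1.0 / (1.0 + np.exp(-(h @ W2)))  # sigmoid probability
```

Before any fine-tuning the new head outputs 0.5 everywhere; training on the 438 labeled CT scans would then update both the head and (optionally, at a lower learning rate) the pretrained backbone.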
Computational decision support systems could provide clinical value in whole-body FDG-PET/CT workflows. However, limited availability of labeled data combined with the large size of PET/CT imaging exams make it challenging to apply existing supervised machine learning systems. Leveraging recent advancements in natural language processing, we describe a weak supervision framework that extracts imperfect, yet highly granular, regional abnormality labels from free-text radiology reports. Our framework automatically labels each region in a custom ontology of anatomical regions, providing a structured profile of the pathologies in each imaging exam. Using these generated labels, we then train an attention-based, multi-task CNN architecture to detect and estimate the location of abnormalities in whole-body scans. We demonstrate empirically that our multi-task representation is critical for strong performance on rare abnormalities with limited training data. The representation also contributes to more accurate mortality prediction from imaging data, suggesting the potential utility of our framework beyond abnormality detection and location estimation.
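In spirit, the report-labeling step maps sentences in free-text reports to abnormality flags over an anatomical ontology. A heavily simplified, hypothetical sketch of such rule-based weak labeling (the actual framework uses a far richer NLP pipeline and ontology than the toy terms and negation rule shown here):

```python
import re

# Toy ontology: region name -> trigger phrases. Illustrative only.
REGIONS = {
    "lungs": ["lung", "pulmonary"],
    "liver": ["liver", "hepatic"],
    "skeleton": ["bone", "osseous", "vertebra"],
}

# Crude negation cue: a negating word earlier in the same sentence.
NEGATION = re.compile(r"\b(no|without|negative for)\b[^.]*$")

def label_report(report):
    """Return an imperfect {region: 0/1} abnormality profile for a report."""
    labels = {region: 0 for region in REGIONS}
    for sentence in report.lower().split("."):
        for region, terms in REGIONS.items():
            for term in terms:
                match = re.search(term, sentence)
                if match and not NEGATION.search(sentence[:match.start()]):
                    labels[region] = 1
    return labels
```

The labels are noisy by design; the premise of weak supervision is that a CNN trained on many such imperfect labels can still learn a useful regional-abnormality representation.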
Radiation necrosis in the brain commonly occurs in three distinct clinical scenarios, namely, radiation therapy for head and neck malignancy or intracranial extra-axial tumor, stereotactic radiation therapy (including radiosurgery) for brain metastasis, and radiation therapy for primary brain tumors. Knowledge of the radiation treatment plan, amount of brain tissue included in the radiation port, type of radiation, location of the primary malignancy, and amount of time elapsed since radiation therapy is extremely important in determining whether the imaging abnormality represents radiation necrosis or recurrent tumor. Conventional magnetic resonance (MR) imaging findings of these two entities overlap considerably, and even at histopathologic analysis, tumor mixed with radiation necrosis is a common finding. Advanced imaging modalities such as diffusion tensor imaging and perfusion MR imaging (with calculation of certain specific parameters such as apparent diffusion coefficient ratios, relative peak height, and percentage of signal recovery), MR spectroscopy, and positron emission tomography can be useful in differentiating between recurrent tumor and radiation necrosis. In everyday practice, the visual assessment of diffusion-weighted and perfusion images may also be helpful by favoring one diagnosis over the other, with restricted diffusion and an elevated relative cerebral blood volume being seen much more frequently in recurrent tumor than in radiation necrosis.
Coronary artery calcium (CAC) can be identified on nongated chest computed tomography (CT) scans, but this finding is not consistently incorporated into care. A deep learning algorithm enables opportunistic CAC screening of nongated chest CT scans. Our objective was to evaluate the effect of notifying clinicians and patients of incidental CAC on statin initiation.
NOTIFY-1 (Incidental Coronary Calcification Quality Improvement Project) was a randomized quality improvement project in the Stanford Health Care System. Patients without known atherosclerotic cardiovascular disease or a previous statin prescription were screened for CAC on a previous nongated chest CT scan from 2014 to 2019 using a validated deep learning algorithm with radiologist confirmation. Patients with incidental CAC were randomly assigned to notification of the primary care clinician and patient versus usual care. Notification included a patient-specific image of CAC and guideline recommendations regarding statin use. The primary outcome was statin prescription within 6 months.
Among 2113 patients who met initial clinical inclusion criteria, CAC was identified by the algorithm in 424 patients. After chart review and additional exclusions were made, a radiologist confirmed CAC among 173 of 194 patients (89.2%) who were randomly assigned to notification or usual care. At 6 months, the statin prescription rate was 51.2% (44/86) in the notification arm versus 6.9% (6/87) with usual care (P<0.001). There was also more coronary artery disease testing in the notification arm (15.1% [13/86] versus 2.3% [2/87]; P=0.008).
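As a sanity check on the headline comparison (51.2% versus 6.9% statin prescription), a Pearson chi-squared test on the 2x2 outcome table can be computed with only the standard library; this is illustrative and not necessarily the trial's prespecified test:

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (1 df, no continuity correction) for
    the table [[a, b], [c, d]]. The p-value uses the chi-squared survival
    function, which for 1 df equals erfc(sqrt(x / 2))."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, erfc(sqrt(chi2 / 2))

# Statin prescribed vs. not prescribed, by arm:
# notification 44/86, usual care 6/87
chi2, p = chi2_2x2(44, 42, 6, 81)
```

On these counts the statistic is large (roughly 41) and the p-value falls far below 0.001, consistent with the reported result.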
Opportunistic CAC screening of previous nongated chest CT scans followed by clinician and patient notification led to a significant increase in statin prescriptions. Further research is needed to determine whether this approach can reduce atherosclerotic cardiovascular disease events.
URL: https://www.clinicaltrials.gov; Unique identifier: NCT04789278.
Transjugular intrahepatic portosystemic shunt (TIPS) may be placed to treat complications of portal hypertension by creating a conduit between the hepatic and portal veins. The diagnosis of hepatocellular carcinoma (HCC) is typically made by multiphasic imaging studies demonstrating arterial phase enhancement with washout on portal venous and delayed phase imaging. The aim of our study was to determine how the presence of TIPS would affect the imaging diagnosis of HCC.
This was a single-center electronic database review of all patients who underwent multiphasic imaging with MRI or CT scan for HCC screening between January 2000 and July 2017 and who were subsequently diagnosed with HCC. Data collected included patient demographics; liver disease characteristics, including CPT score, MELD-Na, and AFP; type of imaging; tumor stage; and laboratory values at the time of HCC diagnosis. The diagnosis of HCC was made using LI-RADS criteria on contrast-enhanced CT or MR imaging and confirmed by chart abstraction as documented by the treating clinician. Demographic and imaging characteristics for HCC patients with and without TIPS were compared.
A total of 279 patients met eligibility criteria for the study, 37 (13.2%) of whom had TIPS placed prior to diagnosis of HCC. There was no significant difference in demographics or liver disease characteristics between patients with and without TIPS. Compared to cirrhotic patients without TIPS prior to HCC diagnosis, patients with TIPS had significantly more scans and a longer duration of surveillance until HCC diagnosis. However, LI-RADS criteria and stage of HCC at diagnosis did not differ significantly between the two groups. There were no differences in outcomes, including liver transplant and survival.
The presence of TIPS does not lead to a delayed diagnosis of HCC. It is, however, associated with a longer interval from first surveillance scan to HCC diagnosis.
Pulmonary embolism (PE) is a life-threatening clinical problem, and computed tomography pulmonary angiography (CTPA) is the gold standard for diagnosis. Prompt diagnosis and immediate treatment are critical to avoid high morbidity and mortality rates, yet PE remains among the diagnoses most frequently missed or delayed. In this study, we developed a deep learning model, PENet, to automatically detect PE on volumetric CTPA scans as an end-to-end solution. PENet is a 77-layer 3D convolutional neural network (CNN) pretrained on the Kinetics-600 dataset and fine-tuned on a retrospective CTPA dataset collected from a single academic institution. Model performance was evaluated in detecting PE on data from two different institutions: a hold-out dataset from the same institution as the training data and a second dataset collected from an external institution to evaluate model generalizability to an unrelated population. PENet achieved an AUROC of 0.84 (95% CI 0.82-0.87) for detecting PE on the hold-out internal test set and 0.85 (95% CI 0.81-0.88) on the external dataset, and outperformed current state-of-the-art 3D CNN models. These results represent a successful application of an end-to-end 3D CNN to the complex task of PE diagnosis without requiring computationally intensive and time-consuming preprocessing, and demonstrate sustained performance on data from an external institution. Our model could be applied as a triage tool to automatically identify clinically important PEs, allowing prioritization for diagnostic radiology interpretation and improved care pathways via more efficient diagnosis.
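To make the triage idea concrete, a radiology worklist can be reordered so that studies with the highest model-predicted PE probability are read first. A minimal sketch, using hypothetical accession numbers and scores (not actual PENet outputs):

```python
import heapq

def triage(scored_studies):
    """Yield (accession, probability) pairs, highest PE probability first.
    scored_studies: iterable of (accession, model_probability) pairs."""
    # Negate probabilities so Python's min-heap pops the highest score first.
    heap = [(-prob, accession) for accession, prob in scored_studies]
    heapq.heapify(heap)
    while heap:
        neg_prob, accession = heapq.heappop(heap)
        yield accession, -neg_prob

# Hypothetical worklist with model scores
worklist = [("CTPA-001", 0.12), ("CTPA-002", 0.91), ("CTPA-003", 0.47)]
ordered = list(triage(worklist))
```

A heap keeps insertion and removal logarithmic, which matters if new studies stream into the worklist continuously rather than arriving as one batch.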