Federated learning (FL) is a method used for training artificial intelligence models with data from multiple sources while maintaining data anonymity, thus removing many barriers to data sharing. Here we used data from 20 institutes across the globe to train a FL model, called EXAM (electronic medical record (EMR) chest X-ray AI model), that predicts the future oxygen requirements of symptomatic patients with COVID-19 using inputs of vital signs, laboratory data and chest X-rays. EXAM achieved an average area under the curve (AUC) >0.92 for predicting outcomes at 24 and 72 h from the time of initial presentation to the emergency room, and it provided 16% improvement in average AUC measured across all participating sites and an average increase in generalizability of 38% when compared with models trained at a single site using that site's data. For prediction of mechanical ventilation treatment or death at 24 h at the largest independent test site, EXAM achieved a sensitivity of 0.950 and specificity of 0.882. In this study, FL facilitated rapid data science collaboration without data exchange and generated a model that generalized across heterogeneous, unharmonized datasets for prediction of clinical outcomes in patients with COVID-19, setting the stage for the broader use of FL in healthcare.
•Development of a content-based image retrieval system for chest radiographs based on a novel deep metric learning algorithm.
•Validation on an international multi-site COVID-19 dataset with superior performance in image retrieval, diagnosis and prognosis tasks.
•Good transferability and generalizability to other clinical decision support tasks in the treatment and management of COVID-19 patients.
•Being implemented in the clinical workflow at Partners Healthcare due to high clinical impact for COVID-19 diagnosis, prognosis and patient management.
In recent years, deep learning-based image analysis methods have been widely applied in computer-aided detection, diagnosis and prognosis, and have shown their value during the public health crisis of the novel coronavirus disease 2019 (COVID-19) pandemic. Chest radiography (CXR) has played a crucial role in COVID-19 patient triage, diagnosis and monitoring, particularly in the United States. Considering the mixed and nonspecific signals in CXR, an image retrieval model that provides both similar images and associated clinical information can be more clinically meaningful than a direct image diagnostic model. In this work we develop a novel CXR image retrieval model based on deep metric learning. Unlike traditional diagnostic models, which aim to learn a direct mapping from images to labels, the proposed model aims to learn an optimized embedding space of images, in which images with the same labels and similar contents are pulled together. The proposed model uses a multi-similarity loss with a hard-mining sampling strategy and an attention mechanism to learn the optimized embedding space, and provides similar images, visualizations of disease-related attention maps and useful clinical information to assist clinical decisions. The model is trained and validated on an international multi-site COVID-19 dataset collected from 3 different sources. Experimental results on COVID-19 image retrieval and diagnosis tasks show that the proposed model can serve as a robust solution for CXR analysis and patient management for COVID-19. The model is also tested for transferability to a different clinical decision support task for COVID-19, where the pre-trained model is applied to extract image features from a new dataset without any further training.
The extracted features are then combined with COVID-19 patients' vital signs, lab tests and medical histories to predict the likelihood of airway intubation within 72 hours, which is strongly associated with patient prognosis and is crucial for patient care and hospital resource planning. These results demonstrate that our deep metric learning-based image retrieval model is highly effective in CXR retrieval, diagnosis and prognosis, and thus has great clinical value for the treatment and management of COVID-19 patients.
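The retrieval step described above, finding the stored images closest to a query in a learned embedding space, can be illustrated with a minimal sketch. It assumes embeddings have already been produced by some trained model, and it uses cosine similarity as the ranking measure; the abstract does not state which similarity measure is used at retrieval time, so that choice, the function names and the toy three-dimensional vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, gallery, k=2):
    """Return the k gallery items whose embeddings are most similar to the query.

    `gallery` is a list of dicts with hypothetical keys "id", "label"
    and "embedding" (assumed layout, not taken from the paper).
    """
    ranked = sorted(
        gallery,
        key=lambda item: cosine_similarity(query_embedding, item["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Toy gallery: made-up embeddings standing in for model outputs.
gallery = [
    {"id": "cxr_a", "label": "covid", "embedding": [0.9, 0.1, 0.0]},
    {"id": "cxr_b", "label": "normal", "embedding": [0.0, 0.2, 0.9]},
    {"id": "cxr_c", "label": "covid", "embedding": [0.8, 0.3, 0.1]},
]
query = [1.0, 0.2, 0.0]
top_matches = retrieve(query, gallery, k=2)
print([item["id"] for item in top_matches])
```

In a retrieval system of this kind, the labels and clinical records attached to the returned items are what the clinician would review alongside the query image.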
To compare the performance of artificial intelligence (AI) and Radiographic Assessment of Lung Edema (RALE) scores from frontal chest radiographs (CXRs) for predicting patient outcomes and the need for mechanical ventilation in COVID-19 pneumonia. Our IRB-approved study included 1367 serial CXRs from 405 adult patients (mean age 65 ± 16 years) from two sites in the US (Site A) and South Korea (Site B). We recorded information pertaining to patient demographics (age, gender), smoking history, comorbid conditions (such as cancer, cardiovascular and other diseases), vital signs (temperature, oxygen saturation), and available laboratory data (such as WBC count and CRP). Two thoracic radiologists performed the qualitative assessment of all CXRs based on the RALE score for assessing the severity of lung involvement. All CXRs were processed with a commercial AI algorithm to obtain the percentage of the lung affected with findings related to COVID-19 (AI score). Independent t- and chi-square tests were used in addition to multiple logistic regression with Area Under the Curve (AUC) as output for predicting disease outcome and the need for mechanical ventilation. The RALE and AI scores had a strong positive correlation in CXRs from each site (r = 0.79-0.86; p < 0.0001). Patients who died or received mechanical ventilation had significantly higher RALE and AI scores than those with recovery or without the need for mechanical ventilation (p < 0.001). Patients with a more substantial difference in baseline and maximum RALE scores and AI scores had a higher prevalence of death and mechanical ventilation (p < 0.001). The addition of patients' age, gender, WBC count, and peripheral oxygen saturation increased the AUC for outcome prediction from 0.87 to 0.94 (95% CI 0.90-0.97) for RALE scores and from 0.82 to 0.91 (95% CI 0.87-0.95) for the AI scores. The AI algorithm is as robust a predictor of adverse patient outcome (death or need for mechanical ventilation) as subjective RALE scores in patients with COVID-19 pneumonia.
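As a reminder of what the AUC figures above measure, the following sketch computes AUC as the probability that a randomly chosen patient with the adverse outcome receives a higher score than a randomly chosen patient without it (ties count as half). This is the standard rank-based definition, not the study's own analysis code; the function name and the toy scores are hypothetical.

```python
def auc_from_scores(scores, labels):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly.

    `labels` uses 1 for the adverse outcome and 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    correctly_ranked = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correctly_ranked += 1.0
            elif p == n:
                correctly_ranked += 0.5  # ties count as half
    return correctly_ranked / (len(positives) * len(negatives))


# Made-up severity scores (percentage of lung affected) and outcomes.
toy_scores = [10, 35, 40, 60, 80]
toy_outcomes = [0, 0, 1, 0, 1]
print(round(auc_from_scores(toy_scores, toy_outcomes), 3))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for illustration; a sort-based implementation would be preferable for large cohorts.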
Cross-institution collaborations are constrained by data-sharing challenges. These challenges hamper innovation, particularly in artificial intelligence, where models require diverse data to ensure strong performance. Federated learning (FL) addresses these data-sharing challenges. In typical collaborations, data is sent to a central repository where models are trained. With FL, models are sent to participating sites and trained locally, and the resulting model weights are aggregated to create a master model with improved performance. At the 2021 Radiological Society of North America (RSNA) conference, a panel was conducted titled “Accelerating AI: How Federated Learning Can Protect Privacy, Facilitate Collaboration and Improve Outcomes.” Two groups shared insights: researchers from the EXAM study (EMR CXR AI Model) and members of the National Cancer Institute’s Early Detection Research Network’s (EDRN) pancreatic cancer working group. EXAM brought together 20 institutions to create a model to predict oxygen requirements of patients seen in the emergency department with COVID-19 symptoms. The EDRN collaboration is focused on improving outcomes for pancreatic cancer patients through earlier detection. This paper describes major insights from the panel, including direct quotes. The panelists described the impetus for FL, the long-term vision for FL, challenges faced in FL, and the immediate path forward for FL.
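The aggregation step described above, in which locally trained weights are combined into a master model, can be sketched as a weighted average. The sketch assumes each site returns a flat list of weights and that sites are weighted by their number of training cases, as in the widely used federated averaging scheme; the abstract does not specify the aggregation rule, so the weighting, the function name and the toy numbers are assumptions.

```python
def federated_average(site_weights, site_sizes):
    """Combine per-site weight vectors into one master weight vector.

    site_weights: list of equal-length lists, one per participating site.
    site_sizes:   number of local training cases at each site (assumed
                  weighting; other aggregation rules are possible).
    """
    if len(site_weights) != len(site_sizes):
        raise ValueError("need exactly one size per site")
    total_cases = sum(site_sizes)
    n_params = len(site_weights[0])

    master = [0.0] * n_params
    for weights, size in zip(site_weights, site_sizes):
        if len(weights) != n_params:
            raise ValueError("all sites must return the same number of weights")
        for i, w in enumerate(weights):
            master[i] += w * size / total_cases
    return master


# Two hypothetical sites: only weights leave each site, never patient data.
local_weights = [[1.0, 2.0], [3.0, 4.0]]
local_case_counts = [100, 300]
master_weights = federated_average(local_weights, local_case_counts)
print(master_weights)
```

In a full training run this averaging would be repeated over many rounds, with the master weights sent back to each site as the starting point for the next round of local training.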
Early and accurate diagnosis of Coronavirus disease (COVID-19) is essential for patient isolation and contact tracing so that the spread of infection can be limited. Computed tomography (CT) can provide important information in COVID-19, especially for patients with moderate to severe disease as well as those with worsening cardiopulmonary status. As an automatic tool, deep learning methods can be used to perform semantic segmentation of affected lung regions, which is important for establishing disease severity and predicting prognosis. Both the extent and type of pulmonary opacities help assess disease severity. However, manual pixel-level multi-class labelling is time-consuming, subjective, and non-quantitative. In this article, we propose a hybrid weak label-based deep learning method that utilizes both the manually annotated pulmonary opacities from COVID-19 pneumonia and the patient-level disease-type information available from the clinical report. A UNet was first trained with semantic labels to segment the total infected region. It was used to initialize another UNet, which was trained to segment the consolidations with patient-level information using the Expectation-Maximization (EM) algorithm. To demonstrate the performance of the proposed method, multi-institutional CT datasets from Iran, Italy, South Korea, and the United States were utilized. Results show that our proposed method can predict the infected regions as well as the consolidation regions with good correlation to human annotation.
Interest in artificial intelligence (AI) has grown exponentially in recent years, attracting sensational headlines and speculation. While there is considerable potential for AI to augment clinical practice, there remain numerous practical implications that must be considered when exploring AI solutions. These range from ethical concerns about algorithmic bias to legislative concerns in an uncertain regulatory environment. In the absence of established protocols and examples of best practice, there is a growing need for clear guidance both for innovators and early adopters. Broadly, there are three stages to the innovation process: invention, development and implementation. In this paper, we present key considerations for innovators at each stage and offer suggestions along the AI development pipeline, from bench to bedside.
•A deep learning method can robustly segment lung infection regions from CT images of COVID-19 patients. The correlation coefficient of the network prediction and manual segmentation was high to very high.
•Combining CT-derived biomarkers with electronic health records achieved the best prognosis prediction, with AUCs ranging from 0.85 to 0.93.
•Prognosis results indicated that age, oxygen saturation, CT-derived biomarkers, platelet count, and white blood cell count were the most important prognostic predictors of COVID-19.
As of August 30th, there were in total 25.1 million confirmed cases and 845 thousand deaths caused by coronavirus disease of 2019 (COVID-19) worldwide. With overwhelming demands on medical resources, patient stratification based on their risks is essential. In this multi-center study, we built prognosis models to predict severity outcomes, combining patients’ electronic health records (EHR), which included vital signs and laboratory data, with deep learning- and CT-based severity prediction.
We first developed a CT segmentation network using datasets from multiple institutions worldwide. Two biomarkers were extracted from the CT images: total opacity ratio (TOR) and consolidation ratio (CR). After obtaining TOR and CR, further prognosis analysis was conducted on datasets from INSTITUTE-1, INSTITUTE-2 and INSTITUTE-3. For each data cohort, a generalized linear model (GLM) was applied for prognosis prediction.
For the deep learning model, the correlation coefficient of the network prediction and manual segmentation was 0.755, 0.919, and 0.824 for the three cohorts, respectively. The AUC (95% CI) of the final prognosis models was 0.85 (0.77, 0.92), 0.93 (0.87, 0.98), and 0.86 (0.75, 0.94) for the INSTITUTE-1, INSTITUTE-2 and INSTITUTE-3 cohorts, respectively. Either TOR or CR was present in all three final prognosis models. Age, white blood cell (WBC) count, and platelet (PLT) count were selected as predictors in two cohorts. Oxygen saturation (SpO2) was selected as a predictor in one cohort.
The developed deep learning method can segment lung infection regions. Prognosis results indicated that age, SpO2, CT biomarkers, PLT, and WBC were the most important prognostic predictors of COVID-19 in our prognosis model.
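The two CT biomarkers named above can be illustrated with a small sketch. The abstract does not define TOR and CR, so the sketch assumes that TOR is the fraction of lung voxels labelled as any opacity and CR is the fraction of lung voxels labelled as consolidation. The label encoding (0 background, 1 healthy lung, 2 non-consolidation opacity, 3 consolidation) and the function name are hypothetical.

```python
# Hypothetical label encoding for a segmented CT volume (assumption).
BACKGROUND, HEALTHY_LUNG, OTHER_OPACITY, CONSOLIDATION = 0, 1, 2, 3


def opacity_ratios(label_volume):
    """Return (TOR, CR) for a volume given as nested lists [slice][row][col].

    Assumed definitions, not taken from the paper:
      TOR = opacity voxels (labels 2 and 3) / all lung voxels (labels 1-3)
      CR  = consolidation voxels (label 3)  / all lung voxels (labels 1-3)
    """
    voxels = [v for slice_ in label_volume for row in slice_ for v in row]
    lung = sum(1 for v in voxels if v != BACKGROUND)
    if lung == 0:
        raise ValueError("no lung voxels found in the segmentation")
    opacity = sum(1 for v in voxels if v in (OTHER_OPACITY, CONSOLIDATION))
    consolidation = sum(1 for v in voxels if v == CONSOLIDATION)
    return opacity / lung, consolidation / lung


# One made-up 2 x 4 slice: six lung voxels, three opacities, one consolidation.
toy_volume = [[[0, 1, 1, 2],
               [0, 1, 3, 2]]]
tor, cr = opacity_ratios(toy_volume)
print(tor, cr)
```

Ratios of this kind can then be passed, together with EHR variables such as age or SpO2, as predictors to a prognosis model.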
To tune and test the generalizability of a deep learning-based model for assessment of COVID-19 lung disease severity on chest radiographs (CXRs) from different patient populations. A published convolutional Siamese neural network-based model previously trained on hospitalized patients with COVID-19 was tuned using 250 outpatient CXRs. This model produces a quantitative measure of COVID-19 lung disease severity (pulmonary x-ray severity (PXS) score). The model was evaluated on CXRs from 4 test sets, including 3 from the United States (patients hospitalized at an academic medical center (N = 154), patients hospitalized at a community hospital (N = 113), and outpatients (N = 108)) and 1 from Brazil (patients at an academic medical center emergency department (N = 303)). Radiologists from both countries independently assigned reference standard CXR severity scores, which were correlated with the PXS scores as a measure of model performance (Pearson R). The Uniform Manifold Approximation and Projection (UMAP) technique was used to visualize the neural network results. After tuning with outpatient data, the deep learning model showed high performance in 2 United States hospitalized patient datasets (R = 0.88 and R = 0.90, compared to baseline R = 0.86). Model performance was similar, though slightly lower, when tested on the United States outpatient and Brazil emergency department datasets (R = 0.86 and R = 0.85, respectively). UMAP showed that the model learned disease severity information that generalized across test sets. A deep learning model that extracts a COVID-19 severity score on CXRs showed generalizable performance across multiple populations from 2 continents, including outpatients and hospitalized patients.
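The performance measure used above, Pearson R between model scores and radiologist reference scores, can be computed as in the following sketch. It is a plain implementation of the standard formula rather than the study's code, and the function name and toy scores are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Made-up model severity scores and radiologist reference scores.
model_scores = [1.2, 3.4, 2.8, 6.1, 7.9]
reference_scores = [1, 3, 4, 6, 8]
print(round(pearson_r(model_scores, reference_scores), 3))
```

A value near 1 indicates that the model's score rises almost linearly with the reference score; it does not by itself show that the two scales agree in absolute terms.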
Artificial intelligence models trained in one site may not have acceptable performance when used in another site. Transfer learning (TL) can be used to adapt the original model to a new institution, making it more robust and generalizable.
Performance of a 4D cardiac computed tomography angiography (CCTA) segmentation model trained at Site 1 was assessed at Site 2, before and after TL. Two separate image-annotated 4D CCTA datasets were collected at each site. Segmentation output from the model was used to measure left ventricular (LV) ejection fraction (EF), LV end-diastolic volume (EDV), and LV mass and compared with the ground-truth (measurements derived from the segmentation performed by trained radiologists). Wilcoxon signed-rank test (with 95% CI) was used to compare the absolute errors between predicted and ground-truth values obtained at Site 2 before and after TL.
The test set at Site 2 included 45 patients (27 women, mean age 47.9 ± 10.8 years). There was a significant difference in absolute errors of LVEF (mean ± std 10.0 ± 6.0% vs 3.7 ± 2.5%, p < 0.05), LVEDV (mean ± std 8.4 ± 6.7 mL vs 5.9 ± 5.9 mL, p < 0.05) and LV mass (mean ± std 12.0 ± 11.6 g vs 7.7 ± 9.9 g, p < 0.05) when comparing model performance before and after TL at Site 2.
The TL process significantly improved the prediction of LV volumetric parameters obtained at Site 2 by adapting the model to another source. A small number of annotated cases can be used to significantly improve a deep learning model developed elsewhere, increasing model generalizability and encouraging institutions to engage in artificial intelligence initiatives.
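The quantities compared above can be illustrated with a short sketch: ejection fraction derived from end-diastolic and end-systolic volumes using the standard formula EF = (EDV - ESV) / EDV, and the mean absolute error of predictions against ground truth. How the study derived volumes from the segmentation is not described in the abstract, so the inputs, function names and numbers below are hypothetical.

```python
def ejection_fraction(edv_ml, esv_ml):
    """Left ventricular ejection fraction in percent from volumes in mL."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


def mean_absolute_error(predicted, ground_truth):
    """Average absolute difference between paired measurements."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / len(predicted)


# Made-up LVEF values (%) for three patients: reference, and model output
# before and after adapting the model to the new site.
reference_ef = [ejection_fraction(120, 48), 55.0, 62.0]
ef_before_tl = [48.0, 66.0, 70.0]
ef_after_tl = [58.0, 57.0, 60.0]

print(mean_absolute_error(ef_before_tl, reference_ef))
print(mean_absolute_error(ef_after_tl, reference_ef))
```

With per-patient absolute errors before and after adaptation in hand, a paired test such as the Wilcoxon signed-rank test named above can then be applied to the two error lists.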
•AI models trained in one site may not have acceptable performance in another site.
•Transfer learning can be used to adapt the model to local cases in other sites.
•Model was applied to computed tomography angiography scans in two sites.
•Segmentation output was used to calculate left ventricular measurements.
•Deep learning left ventricular segmentation model improved after transfer learning.