Childhood blindness from retinopathy of prematurity (ROP) is increasing as a result of improvements in neonatal care worldwide. We evaluate the effectiveness of artificial intelligence (AI)-based screening in an Indian ROP telemedicine program and whether differences in ROP severity between neonatal care units (NCUs) identified by using AI are related to differences in oxygen-titrating capability.
External validation study of an existing AI-based quantitative severity scale for ROP on a data set of images from the Retinopathy of Prematurity Eradication Save Our Sight ROP telemedicine program in India. All images were assigned an ROP severity score (1-9) by using the Imaging and Informatics in Retinopathy of Prematurity Deep Learning system. We calculated the area under the receiver operating characteristic curve, sensitivity, and specificity for detection of treatment-requiring ROP. Using multivariable linear regression, we evaluated the mean and median ROP severity in each NCU as a function of mean birth weight, gestational age, and the presence of oxygen blenders and pulse oxygenation monitors.
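As a rough illustration of the screening metrics reported below, the following sketch computes AUC, sensitivity, and specificity from a severity score; the arrays and the cutoff of 3.0 are hypothetical placeholders, not values from the study.

```python
# Sketch only: hypothetical severity scores and treatment-requiring ROP labels.
import numpy as np
from sklearn.metrics import roc_auc_score

severity = np.array([1.2, 3.4, 7.8, 2.1, 8.9, 1.5])  # AI severity scores (1-9)
tr_rop = np.array([0, 0, 1, 0, 1, 0])                 # 1 = treatment-requiring

auc = roc_auc_score(tr_rop, severity)                 # threshold-free discrimination

cutoff = 3.0                                          # hypothetical operating point
pred = severity >= cutoff
sensitivity = (pred & (tr_rop == 1)).sum() / (tr_rop == 1).sum()
specificity = (~pred & (tr_rop == 0)).sum() / (tr_rop == 0).sum()
print(f"AUC={auc:.2f}, sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```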
The area under the receiver operating characteristic curve for detection of treatment-requiring ROP was 0.98, with 100% sensitivity and 78% specificity. We found higher median (interquartile range) ROP severity in NCUs without oxygen blenders and pulse oxygenation monitors, most apparent in larger infants (>1500 g and 31 weeks' gestation: 2.7 [2.5-3.0] vs 3.1 [2.4-3.8]; P = .007, with adjustment for birth weight and gestational age).
Integration of AI into ROP screening programs may lead to improved access to care for secondary prevention of ROP and may facilitate assessment of disease epidemiology and NCU resources.
Using medical images to evaluate disease severity and change over time is a routine and important task in clinical decision making. Grading systems are often used, but are unreliable, as domain experts disagree on disease severity category thresholds. These discrete categories also do not reflect the underlying continuous spectrum of disease severity. To address these issues, we developed a convolutional Siamese neural network approach to evaluate disease severity at single time points and change between longitudinal patient visits on a continuous spectrum. We demonstrate this in two medical imaging domains: retinopathy of prematurity (ROP) in retinal photographs and osteoarthritis in knee radiographs. Our patient cohorts consist of 4861 images from 870 patients in the Imaging and Informatics in Retinopathy of Prematurity (i-ROP) cohort study and 10,012 images from 3021 patients in the Multicenter Osteoarthritis Study (MOST), both of which feature longitudinal imaging data. Multiple expert clinician raters ranked 100 retinal images and 100 knee radiographs from held-out test sets for severity of ROP and osteoarthritis, respectively. The Siamese neural network output for each image, compared with a pool of normal reference images, correlates with disease severity rank (ρ = 0.87 for ROP and ρ = 0.89 for osteoarthritis), both within and between the clinical grading categories. Thus, this output can represent the continuous spectrum of disease severity at any single time point. The difference between these outputs can be used to show change over time. Alternatively, paired images from the same patient at two time points can be compared directly using the Siamese neural network, yielding an additional continuous measure of change between images. Importantly, our approach does not require manual localization of the pathology of interest and requires only a binary label (same versus different) for training. The location of disease and the site of change detected by the algorithm can be visualized using an occlusion sensitivity map-based approach. For a longitudinal binary change detection task, our Siamese neural networks achieve test set receiver operating characteristic areas under the curve (AUCs) of up to 0.90 in evaluating ROP or knee osteoarthritis change, depending on the change detection strategy. The overall performance on this binary task is similar to that of a conventional convolutional deep neural network trained for multi-class classification. Our results demonstrate that convolutional Siamese neural networks can be a powerful tool for evaluating the continuous spectrum of disease severity and change in medical imaging.
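To make the comparison mechanism concrete, here is a minimal PyTorch sketch of a Siamese comparison of the kind described: twin networks with shared weights embed two images, and the Euclidean distance between the embeddings serves as the continuous output. The backbone, embedding size, and median-over-references scoring are illustrative assumptions, not the authors' exact design.

```python
# Illustrative Siamese comparison, not the authors' implementation.
import torch
import torch.nn as nn
from torchvision import models

class SiameseNet(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        backbone = models.resnet18(weights=None)   # shared twin backbone (assumed)
        backbone.fc = nn.Linear(backbone.fc.in_features, embed_dim)
        self.backbone = backbone

    def forward(self, img_a, img_b):
        za, zb = self.backbone(img_a), self.backbone(img_b)
        return torch.norm(za - zb, dim=1)          # continuous dissimilarity

net = SiameseNet()
query = torch.randn(1, 3, 224, 224)                # image to score
normals = torch.randn(8, 3, 224, 224)              # pool of normal reference images
# Severity proxy: median distance from the query to the normal reference pool.
score = net(query.expand(8, -1, -1, -1), normals).median()
```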
To develop an automated measure of COVID-19 pulmonary disease severity on chest radiographs (CXRs), for longitudinal disease tracking and outcome prediction.
A convolutional Siamese neural network-based algorithm was trained to output a measure of pulmonary disease severity on CXRs (pulmonary x-ray severity (PXS) score), using weakly supervised pretraining on ∼160,000 anterior-posterior images from CheXpert and transfer learning on 314 frontal CXRs from COVID-19 patients. The algorithm was evaluated on internal and external test sets from different hospitals (154 and 113 CXRs, respectively). PXS scores were correlated with radiographic severity scores independently assigned by two thoracic radiologists and one in-training radiologist (Pearson r). For 92 internal test set patients with follow-up CXRs, the change in PXS score was compared with radiologist assessments of change (Spearman ρ). The association between PXS score and subsequent intubation or death was assessed. Bootstrap 95% confidence intervals (CIs) were calculated.
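The correlation and bootstrap analysis described here can be sketched as follows; the PXS scores and radiologist ratings below are simulated placeholders, not study data.

```python
# Sketch of a Pearson correlation with a nonparametric bootstrap 95% CI.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
pxs = rng.uniform(0, 8, 154)                  # simulated PXS scores
radiologist = pxs + rng.normal(0, 1, 154)     # simulated severity ratings

r, _ = pearsonr(pxs, radiologist)

boot = []
for _ in range(2000):                         # resample patients with replacement
    idx = rng.integers(0, len(pxs), len(pxs))
    boot.append(pearsonr(pxs[idx], radiologist[idx])[0])
ci_lo, ci_hi = np.percentile(boot, [2.5, 97.5])
print(f"r={r:.2f} (95% CI {ci_lo:.2f}-{ci_hi:.2f})")
```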
PXS scores correlated with radiographic pulmonary disease severity scores assigned to CXRs in the internal and external test sets (r = 0.86 [95% CI 0.80-0.90] and r = 0.86 [95% CI 0.79-0.90], respectively). The direction of change in PXS score in follow-up CXRs agreed with radiologist assessment (ρ = 0.74 [95% CI 0.63-0.81]). In patients not intubated on the admission CXR, the PXS score predicted subsequent intubation or death within three days of hospital admission (area under the receiver operating characteristic curve = 0.80 [95% CI 0.75-0.85]).
A Siamese neural network-based severity score automatically measures radiographic COVID-19 pulmonary disease severity, which can be used to track disease change and predict subsequent intubation or death.
Retinopathy of prematurity (ROP) is one of the leading causes of blindness in children. Although the role of oxygen in the pathophysiology of ROP is well established, a precise understanding of the dynamic relationship between oxygen exposure and ROP incidence and severity is lacking. The purpose of this study was to evaluate the correlation between time-dependent oxygen variables and the onset of ROP.
Retrospective cohort study.
Two hundred thirty infants born at a single academic center who met the inclusion criteria were included; most were born between January 2011 and October 2022.
Patient data, including sufficient time-dependent oxygen data, were extracted from electronic health records (EHRs). Clinical outcomes for ROP were recorded as none/mild or moderate/severe (defined as type II or worse). Mixed-effects linear models were used to compare the 2 groups in terms of dynamic oxygen variables, such as the daily average and coefficient of variation (COV) of the fraction of inspired oxygen (FiO2). Support vector machine (SVM) and long short-term memory (LSTM)-based multimodal models were trained with fivefold cross-validation to predict which infants would develop moderate/severe ROP. Gestational age (GA), birth weight, and time-dependent oxygen variables were used to develop the predictive models.
Model cross-validation performance was evaluated by computing the mean area under the receiver operating characteristic curve (AUROC), precision, recall, and F1 score.
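As a sketch of what such a multimodal LSTM might look like (layer sizes, the 70-day window, and the exact sequence features are assumptions, not the study's implementation):

```python
# Illustrative multimodal LSTM: a daily oxygen sequence feeds an LSTM, whose
# final hidden state is concatenated with static features (GA, birth weight)
# before classification.
import torch
import torch.nn as nn

class MultimodalLSTM(nn.Module):
    def __init__(self, seq_features=2, static_features=2, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(seq_features, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden + static_features, 16), nn.ReLU(),
            nn.Linear(16, 1))                      # moderate/severe ROP logit

    def forward(self, oxygen_seq, static):
        _, (h_n, _) = self.lstm(oxygen_seq)        # h_n: (1, batch, hidden)
        fused = torch.cat([h_n[-1], static], dim=1)
        return self.head(fused).squeeze(1)

model = MultimodalLSTM()
oxygen = torch.randn(4, 70, 2)   # 4 infants, 70 days, daily FiO2 mean and COV
static = torch.randn(4, 2)       # GA and birth weight (standardized)
logits = model(oxygen, static)
```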
We found that both daily average and COV of FiO2 were associated with more severe ROP (adjusted P < 0.001). With fivefold cross-validation, the multimodal LSTM models had higher performance than the best static models (SVM using GA and 3 average FiO2 features) and SVM models trained on GA alone (mean AUROC = 0.89 ± 0.04 vs. 0.86 ± 0.05 vs. 0.83 ± 0.04).
The development of severe ROP may be influenced not only by overall oxygen exposure but also by its fluctuation, which provides direction for future study of the pathophysiological factors associated with severe ROP development. Additionally, we demonstrated that multimodal neural networks can extract useful information from time-series data, which may be a valuable methodology for investigating other diseases using EHR data.
To evaluate the performance of a deep learning (DL) algorithm for retinopathy of prematurity (ROP) screening in Nepal and Mongolia.
Retrospective analysis of prospectively collected clinical data.
Clinical information and fundus images were obtained from infants in 2 ROP screening programs in Nepal and Mongolia.
Fundus images were obtained using the Forus 3nethra neo (Forus Health) in Nepal and the RetCam Portable (Natus Medical, Inc.) in Mongolia. The overall severity of ROP was determined from the medical record using the International Classification of ROP (ICROP). The presence of plus disease was determined independently in each image using a reference standard diagnosis. The Imaging and Informatics for ROP (i-ROP) DL algorithm was trained on images from the RetCam to classify plus disease and to assign a vascular severity score (VSS) from 1 through 9.
Area under the receiver operating characteristic curve and area under the precision-recall curve for the presence of plus disease or type 1 ROP and association between VSS and ICROP disease category.
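For illustration, both outcome metrics can be computed as in the sketch below; the labels and scores are simulated, with prevalence set near the Mongolian cohort's 14% purely as an example.

```python
# AUPRC (via average precision) complements AUROC when the positive class
# (plus disease / type 1 ROP) is rare.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(1)
labels = rng.binomial(1, 0.14, 500)              # ~14% prevalence (illustrative)
scores = labels * 2.0 + rng.normal(0, 1, 500)    # simulated model outputs

print("AUROC:", roc_auc_score(labels, scores))
print("AUPRC:", average_precision_score(labels, scores))
```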
The prevalence of type 1 ROP was higher in Mongolia (14.0%) than in Nepal (2.2%; P < 0.001) in these data sets. In Mongolia (RetCam images), the area under the receiver operating characteristic curve for examination-level plus disease detection was 0.968, and the area under the precision-recall curve was 0.823. In Nepal (Forus images), these values were 0.999 and 0.993, respectively. The ROP VSS was associated with ICROP classification in both datasets (P < 0.001). At the population level, the median VSS was higher in Mongolia (2.7; interquartile range [IQR], 1.3–5.4) than in Nepal (1.9; IQR, 1.2–3.4; P < 0.001).
These data provide preliminary evidence of the effectiveness of the i-ROP DL algorithm for ROP screening in neonatal populations in Nepal and Mongolia using multiple camera systems and can inform future clinical implementation of artificial intelligence–based ROP screening in low- and middle-income countries.
Categorizing highly complex aerial scenes is quite strenuous due to the presence of detailed information with a large number of distinctive objects. Recognition happens by first deriving a joint relationship among all these distinguishing objects, distilling finally to some meaningful knowledge that is subsequently employed to label the scene. An intriguing question, however, is whether all of this captured information is actually relevant to classifying such a complex scene. What if some objects just create uncertainty with respect to the target label, thereby causing ambiguity in the decision-making? In this letter, we investigate these questions and analyze which regions in an aerial scene are the most relevant, and which are inhibiting, in determining the image label accurately. For such an aerial scene classification (ASC) task, however, employing the supervised knowledge of experts to annotate these discriminative regions is quite costly and laborious, especially when the data set is huge. To this end, we propose a deep weakly supervised learning (DWSL) technique. Our classification-trained convolutional neural network learns to identify discriminative region localizations in an aerial scene solely by utilizing image labels. Using the DWSL model, we significantly improve the recognition accuracies of highly complex scenes, thus validating that extraneous information causes uncertainty in decision-making. Moreover, our DWSL methodology can also be leveraged as a novel tool for concrete visualization of the most informative regions relevant to accurately classifying an aerial scene. Finally, our proposed framework yields state-of-the-art performance on the existing ASC data sets.
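As one representative way to obtain such weakly supervised localizations, the sketch below computes a class activation map (CAM) from a classification-trained CNN using only image-level labels; the letter's actual DWSL formulation may differ.

```python
# CAM sketch: class-weighted sum of the final convolutional feature maps
# highlights the regions most discriminative for the predicted label.
import torch
from torchvision import models

model = models.resnet18(weights=None).eval()     # stand-in classification CNN
image = torch.randn(1, 3, 224, 224)              # hypothetical aerial scene

feats = {}
model.layer4.register_forward_hook(
    lambda m, i, o: feats.update(maps=o))        # capture last conv feature maps

logits = model(image)
cls = logits.argmax(dim=1)                       # predicted scene label
w = model.fc.weight[cls]                         # (1, 512) classifier weights
cam = torch.einsum("c,chw->hw", w[0], feats["maps"][0])
cam = torch.relu(cam)
cam = cam / (cam.max() + 1e-8)                   # normalized localization map
```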
Developing robust artificial intelligence (AI) models for medical image analysis requires large quantities of diverse, well-chosen data that can prove challenging to collect because of privacy concerns, disease rarity, or diagnostic label quality. Collecting image-based datasets for retinopathy of prematurity (ROP), a potentially blinding disease, suffers from these challenges. Progressively growing generative adversarial networks (PGANs) may help, because they can synthesize highly realistic images that may increase both the size and diversity of medical datasets.
Diagnostic validation study of convolutional neural networks (CNNs) for plus disease detection, a component of severe ROP, using synthetic data.
Five thousand eight hundred forty-two retinal fundus images (RFIs) collected from 963 preterm infants.
Retinal vessel maps (RVMs) were segmented from RFIs. PGANs were trained to synthesize RVMs with normal, pre-plus, or plus disease vasculature. Convolutional neural networks were trained, using real or synthetic RVMs, to detect plus disease from 2 real RVM test datasets.
Features of real and synthetic RVMs were evaluated using uniform manifold approximation and projection (UMAP). Similarities were evaluated at the dataset and feature level using Fréchet inception distance and Euclidean distance, respectively. CNN performance was assessed via area under the receiver operating characteristic curve (AUC); AUCs were compared via bootstrapping and DeLong's test for correlated receiver operating characteristic curves. Confusion matrices were compared using McNemar's chi-square test and Cohen's κ value.
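The Fréchet distance used for the dataset-level comparison can be sketched as below; the feature matrices are simulated stand-ins, whereas in practice the features would come from an Inception network (hence Fréchet inception distance).

```python
# Fréchet distance between Gaussians fitted to two feature sets:
# ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^{1/2})
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a, feats_b):
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):              # discard numerical imaginary parts
        covmean = covmean.real
    return ((mu_a - mu_b) ** 2).sum() + np.trace(cov_a + cov_b - 2 * covmean)

rng = np.random.default_rng(2)
real = rng.normal(0.0, 1, (200, 64))          # simulated real-RVM features
synth = rng.normal(0.1, 1, (200, 64))         # simulated synthetic-RVM features
print(frechet_distance(real, synth))
```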
The CNN trained on synthetic RVMs showed a significantly higher AUC (0.971; P = 0.006 and P = 0.004) and classified plus disease more similarly to a set of 8 international experts (κ = 0.922) than the CNN trained on real RVMs (AUC = 0.934; κ = 0.701). Real and synthetic RVMs overlapped, by plus disease diagnosis, on the UMAP manifold, showing that synthetic images spanned the disease severity spectrum. Fréchet inception distance and Euclidean distances suggested that real and synthetic RVMs were more dissimilar to one another than real RVMs were to one another, further suggesting that synthetic RVMs were distinct from the training data with respect to privacy considerations.
Synthetic datasets may be useful for training robust medical AI models. Furthermore, PGANs may be able to synthesize realistic data for use without protected health information concerns.
A computationally fast tone mapping operator (TMO) that can quickly adapt to a wide spectrum of high dynamic range (HDR) content is quintessential for visualization on varied low dynamic range (LDR) output devices such as movie screens or standard displays. Existing TMOs can successfully tone-map only a limited range of HDR content and require extensive parameter tuning to yield the best subjective-quality tone-mapped output. In this paper, we address this problem by proposing a fast, parameter-free and scene-adaptable deep tone mapping operator (DeepTMO) that yields a high-resolution, high-subjective-quality tone-mapped output. Based on a conditional generative adversarial network (cGAN), DeepTMO not only learns to adapt to vast scenic content (e.g., outdoor, indoor, human, structures, etc.) but also tackles HDR-related scene-specific challenges such as contrast and brightness, while preserving fine-grained details. We explore 4 possible combinations of generator-discriminator architectural designs to specifically address some prominent issues in HDR-related deep-learning frameworks, such as blurring, tiling patterns, and saturation artifacts. By exploring different influences of scales, loss functions, and normalization layers under a cGAN setting, we conclude by adopting a multi-scale model for our task. To further leverage the large-scale availability of unlabeled HDR data, we train our network by generating targets using an objective HDR quality metric, namely the Tone Mapping Image Quality Index (TMQI). We demonstrate results both quantitatively and qualitatively, and showcase that our DeepTMO generates high-resolution, high-quality output images over a large spectrum of real-world scenes. Finally, we evaluate the perceived quality of our results by conducting a pairwise subjective study, which confirms the versatility of our method.
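For contrast with this learned approach, the sketch below implements the classic Reinhard global operator, a simple hand-crafted TMO; DeepTMO replaces such a fixed curve with a scene-adaptive cGAN. The key value a = 0.18 is a conventional default, not a DeepTMO parameter.

```python
# Classic Reinhard global tone mapping: expose to a key value, then apply
# a compressive curve that maps luminance into [0, 1).
import numpy as np

def reinhard_tmo(hdr, a=0.18, eps=1e-6):
    """Map HDR luminance to LDR luminance in [0, 1)."""
    log_mean = np.exp(np.mean(np.log(hdr + eps)))   # geometric mean luminance
    scaled = a * hdr / log_mean                     # expose to the key value
    return scaled / (1.0 + scaled)                  # compressive tone curve

hdr = np.random.lognormal(mean=0.0, sigma=2.0, size=(256, 256))  # toy HDR image
ldr = reinhard_tmo(hdr)
```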
Retinopathy of prematurity (ROP) is a leading cause of childhood blindness related to oxygen exposure in premature infants. Since oxygen monitoring protocols have reduced the incidence of treatment-requiring ROP (TR-ROP), it remains unclear whether oxygen exposure remains a relevant risk factor for incident TR-ROP and aggressive ROP (A-ROP), a severe, rapidly progressing form of ROP. The purpose of this proof-of-concept study was to use electronic health record (EHR) data to evaluate early oxygen exposure as a predictive variable for developing TR-ROP and A-ROP.
Retrospective cohort study.
Two hundred forty-four infants screened for ROP at a single academic center.
For each infant, oxygen saturations and fraction of inspired oxygen (FiO2) were extracted manually from the EHR until 31 weeks postmenstrual age (PMA). Cumulative minimum, maximum, and mean oxygen saturation and FiO2 were calculated on a weekly basis. Random forest models were trained with 5-fold cross-validation using gestational age (GA) and cumulative minimum FiO2 at 30 weeks PMA to identify infants who developed TR-ROP. Secondary receiver operating characteristic (ROC) curve analysis of infants with or without A-ROP was performed without cross-validation because of small numbers.
For each model, cross-validation performance for incident TR-ROP was assessed using area under the ROC curve (AUC) and area under the precision-recall curve (AUPRC) scores. For A-ROP, we calculated AUC and evaluated sensitivity and specificity at a high-sensitivity operating point.
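A sketch of this cross-validated analysis, on simulated stand-in data rather than the study cohort, might look like the following:

```python
# Random forest with 5-fold cross-validation: GA plus cumulative minimum FiO2
# as predictors of treatment-requiring ROP.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(3)
X = np.column_stack([rng.uniform(23, 32, 244),      # GA in weeks (simulated)
                     rng.uniform(0.21, 0.6, 244)])  # cumulative min FiO2 (simulated)
y = rng.binomial(1, 0.13, 244)                      # TR-ROP labels (simulated)

cv = cross_validate(RandomForestClassifier(n_estimators=200, random_state=0),
                    X, y, cv=5,
                    scoring=["roc_auc", "average_precision"])
print("AUC: %.2f ± %.2f" % (cv["test_roc_auc"].mean(), cv["test_roc_auc"].std()))
print("AUPRC: %.2f ± %.2f" % (cv["test_average_precision"].mean(),
                              cv["test_average_precision"].std()))
```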
Of the 244 infants included, 33 developed TR-ROP, of whom 5 developed A-ROP. For incident TR-ROP, random forest models trained on GA plus cumulative minimum FiO2 (AUC = 0.93 ± 0.06; AUPRC = 0.76 ± 0.08) were not significantly better than models trained on GA alone (AUC = 0.92 ± 0.06, P = 0.59; AUPRC = 0.74 ± 0.12, P = 0.32). Models using oxygen alone showed an AUC of 0.80 ± 0.09. ROC analysis for A-ROP found an AUC of 0.92 (95% confidence interval, 0.87–0.96).
Oxygen exposure can be extracted from the EHR and quantified as a risk factor for incident TR-ROP and A-ROP. Extracting quantifiable clinical features from the EHR may be useful for building risk models for multiple diseases and evaluating the complex relationships among oxygen exposure, ROP, and other sequelae of prematurity.
Federated learning is an emerging research paradigm for collaboratively training deep learning models without sharing patient data. However, data are usually heterogeneous across institutions, which may reduce the performance of models trained using federated learning. In this study, we propose a novel heterogeneity-aware federated learning method, SplitAVG, to overcome the performance drops caused by data heterogeneity in federated learning. Unlike previous federated methods that require complex heuristic training or hyperparameter tuning, SplitAVG leverages simple network split and feature map concatenation strategies to encourage the federated model to train an unbiased estimator of the target data distribution. We compare SplitAVG with seven state-of-the-art federated learning methods, using centrally hosted training data as the baseline, on a suite of both synthetic and real-world federated datasets. We find that the performance of models trained using all the comparison federated learning methods degrades significantly with increasing degrees of data heterogeneity. In contrast, SplitAVG achieves results comparable to the baseline method under all heterogeneous settings: on highly heterogeneous data partitions, it achieves 96.2% of the accuracy and 110.4% of the mean absolute error obtained by the baseline on a diabetic retinopathy binary classification dataset and a bone age prediction dataset, respectively. We conclude that SplitAVG can effectively overcome the performance drops caused by variability in data distributions across institutions. Experimental results also show that SplitAVG can be adapted to different base convolutional neural networks (CNNs) and generalized to various types of medical imaging tasks. The code is publicly available at https://github.com/zm17943/SplitAVG.
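A minimal sketch of the split-and-concatenate idea (the split point and layer sizes are illustrative assumptions, not the paper's configuration):

```python
# SplitAVG-style step: each institution runs the shared front half on its
# local batch; the server concatenates the resulting feature maps before
# the back half, so gradients reflect all sites' data jointly.
import torch
import torch.nn as nn

front = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.MaxPool2d(2))               # shared client-side half
back = nn.Sequential(nn.Flatten(),
                     nn.Linear(16 * 16 * 16, 2))     # server-side half

# One training step with three hypothetical sites contributing local batches.
site_batches = [torch.randn(8, 3, 32, 32) for _ in range(3)]
feature_maps = [front(x) for x in site_batches]      # computed at each site
merged = torch.cat(feature_maps, dim=0)              # server-side concatenation
logits = back(merged)                                # forward on pooled features
```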