Progression to exudative 'wet' age-related macular degeneration (exAMD) is a major cause of visual deterioration. In patients diagnosed with exAMD in one eye, we introduce an artificial intelligence ...(AI) system to predict progression to exAMD in the second eye. By combining models based on three-dimensional (3D) optical coherence tomography images and corresponding automatic tissue maps, our system predicts conversion to exAMD within a clinically actionable 6-month time window, achieving a per-volumetric-scan sensitivity of 80% at 55% specificity, and 34% sensitivity at 90% specificity. This level of performance corresponds to true positives in 78% and 41% of individual eyes, and false positives in 56% and 17% of individual eyes at the high sensitivity and high specificity points, respectively. Moreover, we show that automatic tissue segmentation can identify anatomical changes before conversion and high-risk subgroups. This AI system overcomes substantial interobserver variability in expert predictions, performing better than five out of six experts, and demonstrates the potential of using AI to predict disease progression.
To apply a deep learning algorithm for automated, objective, and comprehensive quantification of OCT scans to a large real-world dataset of eyes with neovascular age-related macular degeneration ...(AMD) and make the raw segmentation output data openly available for further research.
Retrospective analysis of OCT images from the Moorfields Eye Hospital AMD Database.
A total of 2473 first-treated eyes and 493 second-treated eyes that commenced therapy for neovascular AMD between June 2012 and June 2017.
A deep learning algorithm was used to segment all baseline OCT scans. Volumes were calculated for segmented features such as neurosensory retina (NSR), drusen, intraretinal fluid (IRF), subretinal fluid (SRF), subretinal hyperreflective material (SHRM), retinal pigment epithelium (RPE), hyperreflective foci (HRF), fibrovascular pigment epithelium detachment (fvPED), and serous PED (sPED). Analyses included comparisons between first- and second-treated eyes by visual acuity (VA) and race/ethnicity and correlations between volumes.
Volumes of segmented features (mm3) and central subfield thickness (CST) (μm).
In first-treated eyes, the majority had both IRF and SRF (54.7%). First-treated eyes had greater volumes for all segmented tissues, with the exception of drusen, which was greater in second-treated eyes. In first-treated eyes, older age was associated with lower volumes for RPE, SRF, NSR, and sPED; in second-treated eyes, older age was associated with lower volumes of NSR, RPE, sPED, fvPED, and SRF. Eyes from Black individuals had higher SRF, RPE, and serous PED volumes compared with other ethnic groups. Greater volumes of the majority of features were associated with worse VA.
We report the results of large-scale automated quantification of a novel range of baseline features in neovascular AMD. Major differences between first- and second-treated eyes, with increasing age, and between ethnicities are highlighted. In the coming years, enhanced, automated OCT segmentation may assist personalization of real-world care and the detection of novel structure–function correlations. These data will be made publicly available for replication and future investigation by the AMD research community.
Abstract
Background
Fetal ultrasound is an important component of antenatal care, but shortage of adequately trained healthcare workers has limited its adoption in low-to-middle-income countries. ...This study investigated the use of artificial intelligence for fetal ultrasound in under-resourced settings.
Methods
Blind sweep ultrasounds, consisting of six freehand ultrasound sweeps, were collected by sonographers in the USA and Zambia, and novice operators in Zambia. We developed artificial intelligence (AI) models that used blind sweeps to predict gestational age (GA) and fetal malpresentation. AI GA estimates and standard fetal biometry estimates were compared to a previously established ground truth, and evaluated for difference in absolute error. Fetal malpresentation (non-cephalic vs cephalic) was compared to sonographer assessment. On-device AI model run-times were benchmarked on Android mobile phones.
Results
Here we show that GA estimation accuracy of the AI model is non-inferior to standard fetal biometry estimates (error difference −1.4 ± 4.5 days, 95% CI −1.8, −0.9,
n
= 406). Non-inferiority is maintained when blind sweeps are acquired by novice operators performing only two of six sweep motion types. Fetal malpresentation AUC-ROC is 0.977 (95% CI, 0.949, 1.00,
n
= 613), sonographers and novices have similar AUC-ROC. Software run-times on mobile phones for both diagnostic models are less than 3 s after completion of a sweep.
Conclusions
The gestational age model is non-inferior to the clinical standard and the fetal malpresentation model has high AUC-ROCs across operators and devices. Our AI models are able to run on-device, without internet connectivity, and provide feedback scores to assist in upleveling the capabilities of lightly trained ultrasound operators in low resource settings.
Predictive artificial intelligence (AI) systems based on deep learning have been shown to achieve expert-level identification of diseases in multiple medical imaging settings, but can make errors in ...cases accurately diagnosed by clinicians and vice versa. We developed Complementarity-Driven Deferral to Clinical Workflow (CoDoC), a system that can learn to decide between the opinion of a predictive AI model and a clinical workflow. CoDoC enhances accuracy relative to clinician-only or AI-only baselines in clinical workflows that screen for breast cancer or tuberculosis (TB). For breast cancer screening, compared to double reading with arbitration in a screening program in the UK, CoDoC reduced false positives by 25% at the same false-negative rate, while achieving a 66% reduction in clinician workload. For TB triaging, compared to standalone AI and clinical workflows, CoDoC achieved a 5-15% reduction in false positives at the same false-negative rate for three of five commercially available predictive AI systems. To facilitate the deployment of CoDoC in novel futuristic clinical settings, we present results showing that CoDoC's performance gains are sustained across several axes of variation (imaging modality, clinical setting and predictive AI system) and discuss the limitations of our evaluation and where further validation would be needed. We provide an open-source implementation to encourage further research and application.
Artificial intelligence (AI) has repeatedly been shown to encode historical inequities in healthcare. We aimed to develop a framework to quantitatively assess the performance equity of health AI ...technologies and to illustrate its utility via a case study.
Here, we propose a methodology to assess whether health AI technologies prioritise performance for patient populations experiencing worse outcomes, that is complementary to existing fairness metrics. We developed the Health Equity Assessment of machine Learning performance (HEAL) framework designed to quantitatively assess the performance equity of health AI technologies via a four-step interdisciplinary process to understand and quantify domain-specific criteria, and the resulting HEAL metric. As an illustrative case study (analysis conducted between October 2022 and January 2023), we applied the HEAL framework to a dermatology AI model. A set of 5420 teledermatology cases (store-and-forward cases from patients of 20 years or older, submitted from primary care providers in the USA and skin cancer clinics in Australia), enriched for diversity in age, sex and race/ethnicity, was used to retrospectively evaluate the AI model's HEAL metric, defined as the likelihood that the AI model performs better for subpopulations with worse average health outcomes as compared to others. The likelihood that AI performance was anticorrelated to pre-existing health outcomes was estimated using bootstrap methods as the probability that the negated Spearman's rank correlation coefficient (i.e., “R”) was greater than zero. Positive values of R suggest that subpopulations with poorer health outcomes have better AI model performance. Thus, the HEAL metric, defined as p (R >0), measures how likely the AI technology is to prioritise performance for subpopulations with worse average health outcomes as compared to others (presented as a percentage below). Health outcomes were quantified as disability-adjusted life years (DALYs) when grouping by sex and age, and years of life lost (YLLs) when grouping by race/ethnicity. AI performance was measured as top-3 agreement with the reference diagnosis from a panel of 3 dermatologists per case.
Across all dermatologic conditions, the HEAL metric was 80.5% for prioritizing AI performance of racial/ethnic subpopulations based on YLLs, and 92.1% and 0.0% respectively for prioritizing AI performance of sex and age subpopulations based on DALYs. Certain dermatologic conditions were significantly associated with greater AI model performance compared to a reference category of less common conditions. For skin cancer conditions, the HEAL metric was 73.8% for prioritizing AI performance of age subpopulations based on DALYs.
Analysis using the proposed HEAL framework showed that the dermatology AI model prioritised performance for race/ethnicity, sex (all conditions) and age (cancer conditions) subpopulations with respect to pre-existing health disparities. More work is needed to investigate ways of promoting equitable AI performance across age for non-cancer conditions and to better understand how AI models can contribute towards improving equity in health outcomes.
Google LLC.
During the diagnostic process, doctors incorporate multimodal information including imaging and the medical history - and similarly medical AI development has increasingly become multimodal. In this ...paper we tackle a more subtle challenge: doctors take a targeted medical history to obtain only the most pertinent pieces of information; how do we enable AI to do the same? We develop a wrapper method named MINT (Make your model INTeractive) that automatically determines what pieces of information are most valuable at each step, and ask for only the most useful information. We demonstrate the efficacy of MINT wrapping a skin disease prediction model, where multiple images and a set of optional answers to \(25\) standard metadata questions (i.e., structured medical history) are used by a multi-modal deep network to provide a differential diagnosis. We show that MINT can identify whether metadata inputs are needed and if so, which question to ask next. We also demonstrate that when collecting multiple images, MINT can identify if an additional image would be beneficial, and if so, which type of image to capture. We showed that MINT reduces the number of metadata and image inputs needed by 82% and 36.2% respectively, while maintaining predictive performance. Using real-world AI dermatology system data, we show that needing fewer inputs can retain users that may otherwise fail to complete the system submission and drop off without a diagnosis. Qualitative examples show MINT can closely mimic the step-by-step decision making process of a clinical workflow and how this is different for straight forward cases versus more difficult, ambiguous cases. Finally we demonstrate how MINT is robust to different underlying multi-model classifiers and can be easily adapted to user requirements without significant model re-training.
Recently, there has been great progress in the ability of artificial intelligence (AI) algorithms to classify dermatological conditions from clinical photographs. However, little is known about the ...robustness of these algorithms in real-world settings where several factors can lead to a loss of generalizability. Understanding and overcoming these limitations will permit the development of generalizable AI that can aid in the diagnosis of skin conditions across a variety of clinical settings. In this retrospective study, we demonstrate that differences in skin condition distribution, rather than in demographics or image capture mode are the main source of errors when an AI algorithm is evaluated on data from a previously unseen source. We demonstrate a series of steps to close this generalization gap, requiring progressively more information about the new source, ranging from the condition distribution to training data enriched for data less frequently seen during training. Our results also suggest comparable performance from end-to-end fine tuning versus fine tuning solely the classification layer on top of a frozen embedding model. Our approach can inform the adaptation of AI algorithms to new settings, based on the information and resources available.
Despite considerable progress in maternal healthcare, maternal and perinatal deaths remain high in low-to-middle income countries. Fetal ultrasound is an important component of antenatal care, but ...shortage of adequately trained healthcare workers has limited its adoption. We developed and validated an artificial intelligence (AI) system that uses novice-acquired "blind sweep" ultrasound videos to estimate gestational age (GA) and fetal malpresentation. We further addressed obstacles that may be encountered in low-resourced settings. Using a simplified sweep protocol with real-time AI feedback on sweep quality, we have demonstrated the generalization of model performance to minimally trained novice ultrasound operators using low cost ultrasound devices with on-device AI integration. The GA model was non-inferior to standard fetal biometry estimates with as few as two sweeps, and the fetal malpresentation model had high AUC-ROCs across operators and devices. Our AI models have the potential to assist in upleveling the capabilities of lightly trained ultrasound operators in low resource settings.
Post-PASPA, more than twenty states have either implemented or successfully passed legislation allowing their citizens to gamble on sports. These legislative efforts have created an opportunity for ...states to capture some of the over $150 billion that Americans annually wager through illegal means. While Nevada has successfully regulated sports gambling for more than seventy years, the Supreme Court’s Murphy decision has spurred calls for uniform federal legislation. Central to this push for a federal framework are the leaders of the major sports leagues and their claim of concern for contest integrity. Our position is that this argument is tenuous at best. Rather, we believe this to be a convenient avenue for the major sports leagues to become involved in the regulatory process and secure additional revenue via integrity fees for providing data to federally mandated sources. Further, we argue that this framework of requiring states to use certified data sources that originate from the leagues themselves is not only economically damaging for potential sportsbook operators but could actually result in a greater chance of contest corruption rather than increased integrity. This article examines the contrasting views of the major North American professional and amateur sports leagues (NHL, NFL, NBA, MLB, and NCAA) with respect to the new, legalized sports gambling environment. We then look at the argument of maintaining contest integrity which is focal to proposed federal frameworks. Finally, we advocate for a state framework and demonstrate why states are in a better position to regulate sports gambling without federal interference, as state legislation already provides a mechanism by which income from wagers can be redistributed towards improving local infrastructure and funding social welfare programs; under proposed federal legislation, this revenue would divert elsewhere.
Overexpression of HER-2/neu (c-erbB2) is associated with increased risk of recurrent disease in ductal carcinoma in situ (DCIS) and a poorer prognosis in node-positive breast cancer. We therefore ...examined the early immunotherapeutic targeting of HER-2/neu in DCIS. Before surgical resection, HER-2/neu(pos) DCIS patients (n = 13) received 4 weekly vaccinations of dendritic cells pulsed with HER-2/neu HLA class I and II peptides. The vaccine dendritic cells were activated in vitro with IFN-gamma and bacterial lipopolysaccharide to become highly polarized DC1-type dendritic cells that secrete high levels of interleukin-12p70 (IL-12p70). Intranodal delivery of dendritic cells supplied both antigenic stimulation and a synchronized preconditioned burst of IL-12p70 production directly to the anatomic site of T-cell sensitization. Before vaccination, many subjects possessed HER-2/neu-HLA-A2 tetramer-staining CD8(pos) T cells that expressed low levels of CD28 and high levels of the inhibitory B7 ligand CTLA-4, but this ratio inverted after vaccination. The vaccinated subjects also showed high rates of peptide-specific sensitization for both IFN-gamma-secreting CD4(pos) (85%) and CD8(pos) (80%) T cells, with recognition of antigenically relevant breast cancer lines, accumulation of T and B lymphocytes in the breast, and induction of complement-dependent, tumor-lytic antibodies. Seven of 11 evaluable patients also showed markedly decreased HER-2/neu expression in surgical tumor specimens, often with measurable decreases in residual DCIS, suggesting an active process of "immunoediting" for HER-2/neu-expressing tumor cells following vaccination. DC1 vaccination strategies may therefore have potential for both the prevention and the treatment of early breast cancer.