Deep learning offers considerable promise for medical diagnostics. We aimed to evaluate the diagnostic accuracy of deep learning algorithms versus health-care professionals in classifying diseases using medical imaging.
In this systematic review and meta-analysis, we searched Ovid-MEDLINE, Embase, Science Citation Index, and Conference Proceedings Citation Index for studies published from Jan 1, 2012, to June 6, 2019. Studies comparing the diagnostic performance of deep learning models and health-care professionals based on medical imaging, for any disease, were included. We excluded studies that used medical waveform data graphics material or investigated the accuracy of image segmentation rather than disease classification. We extracted binary diagnostic accuracy data and constructed contingency tables to derive the outcomes of interest: sensitivity and specificity. Studies undertaking an out-of-sample external validation were included in a meta-analysis, using a unified hierarchical model. This study is registered with PROSPERO, CRD42018091176.
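For readers unfamiliar with these outcome measures, the sketch below shows how sensitivity and specificity follow from a 2×2 contingency table of the kind extracted in this review. It is an editorial illustration with made-up counts, not the review's analysis code; the pooled estimates reported below were derived from a unified hierarchical model, which this sketch does not reproduce.

```python
# Illustrative sketch: deriving sensitivity and specificity from a single
# 2x2 contingency table. The counts below are hypothetical placeholders.

def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from binary diagnostic accuracy counts."""
    sensitivity = tp / (tp + fn)   # proportion of diseased cases correctly identified
    specificity = tn / (tn + fp)   # proportion of non-diseased cases correctly identified
    return sensitivity, specificity

sens, spec = sensitivity_specificity(tp=85, fn=15, tn=180, fp=20)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```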
Our search identified 31 587 studies, of which 82 (describing 147 patient cohorts) were included. 69 studies provided enough data to construct contingency tables, enabling calculation of test accuracy, with sensitivity ranging from 9·7% to 100·0% (mean 79·1%, SD 0·2) and specificity ranging from 38·9% to 100·0% (mean 88·3%, SD 0·1). An out-of-sample external validation was done in 25 studies, of which 14 made the comparison between deep learning models and health-care professionals in the same sample. Comparison of the performance between deep learning models and health-care professionals in these 14 studies, when restricting the analysis to the contingency table for each study reporting the highest accuracy, found a pooled sensitivity of 87·0% (95% CI 83·0-90·2) for deep learning models and 86·4% (79·9-91·0) for health-care professionals, and a pooled specificity of 92·5% (95% CI 85·1-96·4) for deep learning models and 90·5% (80·6-95·7) for health-care professionals.
Our review found the diagnostic performance of deep learning models to be equivalent to that of health-care professionals. However, a major finding of the review is that few studies presented externally validated results or compared the performance of deep learning models and health-care professionals using the same sample. Additionally, poor reporting is prevalent in deep learning studies, which limits reliable interpretation of the reported diagnostic accuracy. New reporting standards that address specific challenges of deep learning could improve future studies, enabling greater confidence in the results of future evaluations of this promising technology.
Funding: None.
Objective
To estimate 22‐year trends in the prevalence and incidence of scleritis, and the associations of scleritis with infectious and immune‐mediated inflammatory diseases (I‐IMIDs) in the UK.
Methods
The retrospective cross‐sectional and population cohort study (1997–2018) included 10,939,823 patients (2,946 incident scleritis cases) in The Health Improvement Network, a nationally representative primary care records database. The case–control and matched cohort study (1995–2019) included 3,005 incident scleritis cases and 12,020 control patients matched by age, sex, region, and Townsend deprivation index. Data were analyzed using multivariable Poisson regression, multivariable logistic regression, and Cox proportional hazards multivariable models adjusted for age, sex, Townsend deprivation index, race/ethnicity, smoking status, nation within the UK, and body mass index. Incidence rate ratios (IRRs) and 95% confidence intervals (95% CIs) were calculated.
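As a brief aside for readers less familiar with these methods, the following is a minimal sketch of the kind of Poisson model used to estimate adjusted incidence rate ratios (IRRs) from event counts and person-years; the column names, strata, and numbers are hypothetical and are not drawn from The Health Improvement Network.

```python
# Minimal sketch of Poisson regression for incidence rate ratios (IRRs).
# Data, column names, and reference categories are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical aggregated data: events and person-years per covariate stratum.
df = pd.DataFrame({
    "cases":        [120, 180, 40, 60],
    "person_years": [9.0e5, 8.5e5, 1.1e5, 1.2e5],
    "sex":          ["male", "female", "male", "female"],
    "age_band":     ["21-50", "21-50", "51-60", "51-60"],
})

# Poisson regression with log person-years as offset; exponentiated
# coefficients are adjusted incidence rate ratios.
model = smf.glm(
    "cases ~ C(sex, Treatment('male')) + C(age_band, Treatment('21-50'))",
    data=df,
    family=sm.families.Poisson(),
    offset=np.log(df["person_years"]),
).fit()

irr = np.exp(model.params)      # point estimates (IRRs)
ci = np.exp(model.conf_int())   # 95% confidence intervals
print(pd.concat([irr.rename("IRR"), ci.rename(columns={0: "2.5%", 1: "97.5%"})], axis=1))
```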
Results
Scleritis incidence rates per 100,000 person‐years declined from 4.23 (95% CI 2.16–6.31) to 2.79 (95% CI 2.19–3.39) between 1997 and 2018. The prevalence of scleritis was 93.62 per 100,000 population (95% CI 90.17–97.07) in 2018 (61,650 UK patients). Among 2,946 patients with incident scleritis, 1,831 (62.2%) were female, the mean ± SD age was 44.9 ± 17.6 years (range 1–93), and 1,257 (88.8%) were White. Higher risk of incident scleritis was associated with female sex (adjusted IRR 1.53 [95% CI 1.43–1.66], P < 0.001), Black race/ethnicity (adjusted IRR 1.52 [95% CI 1.14–2.01], P = 0.004 versus White race/ethnicity), South Asian race/ethnicity (adjusted IRR 1.50 [95% CI 1.19–1.90], P < 0.001 versus White race/ethnicity), and older age (peak adjusted IRR 4.95 [95% CI 3.99–6.14], P < 0.001 for patients ages 51–60 years versus those ages ≤10 years). Compared to controls, scleritis patients had a 2‐fold increased risk of a prior I‐IMID diagnosis (17 I‐IMIDs, P < 0.001) and a significantly increased risk of subsequent diagnosis (13 I‐IMIDs). The I‐IMIDs most strongly associated with scleritis included granulomatosis with polyangiitis, Behçet’s disease, and Sjögren’s syndrome.
Conclusion
From 1997 through 2018, the UK incidence of scleritis declined from 4.23 to 2.79/100,000 person‐years. Incident scleritis was associated with 19 I‐IMIDs, providing data for rational investigation and cross‐specialty engagement.
The CONSORT 2010 (Consolidated Standards of Reporting Trials) statement provides minimum guidelines for reporting randomised trials. Its widespread use has been instrumental in ensuring transparency when evaluating new interventions. More recently, there has been growing recognition that interventions involving artificial intelligence (AI) need to undergo rigorous, prospective evaluation to demonstrate impact on health outcomes.

The CONSORT-AI extension is a new reporting guideline for clinical trials evaluating interventions with an AI component. It was developed in parallel with its companion statement for clinical trial protocols, SPIRIT-AI. Both guidelines were developed through a staged consensus process, involving a literature review and expert consultation to generate 29 candidate items, which were assessed by an international multi-stakeholder group in a two-stage Delphi survey (103 stakeholders), agreed on in a two-day consensus meeting (31 stakeholders), and refined through a checklist pilot (34 participants).

The CONSORT-AI extension includes 14 new items that were considered sufficiently important for AI interventions that they should be routinely reported in addition to the core CONSORT 2010 items. CONSORT-AI recommends that investigators provide clear descriptions of the AI intervention, including instructions and skills required for use, the setting in which the AI intervention is integrated, the handling of inputs and outputs of the AI intervention, and the human-AI interaction, and that they provide an analysis of error cases.

CONSORT-AI will help promote transparency and completeness in reporting clinical trials for AI interventions. It will assist editors and peer reviewers, as well as the general readership, to understand, interpret, and critically appraise the quality of clinical trial design and risk of bias in the reported outcomes.
Deep learning may transform health care, but model development has largely depended on the availability of advanced technical expertise. Herein we present the development of a deep learning model by clinicians without coding, which predicts reported sex from retinal fundus photographs. A model was trained on 84,743 retinal fundus photos from the UK Biobank dataset. External validation was performed on 252 fundus photos from a tertiary ophthalmic referral center. For internal validation, the area under the receiver operating characteristic curve (AUROC) of the code-free deep learning (CFDL) model was 0.93. Sensitivity, specificity, positive predictive value (PPV), and accuracy (ACC) were 88.8%, 83.6%, 87.3%, and 86.5%, respectively; for external validation they were 83.9%, 72.2%, 78.2%, and 78.6%. Clinicians are currently unaware of distinct retinal feature variations between males and females, highlighting the importance of model explainability for this task. The model performed significantly worse when foveal pathology was present in the external validation dataset (ACC 69.4%, vs 85.4% in healthy eyes; OR 0.36, 95% CI 0.19–0.70, p = 0.0022), suggesting the fovea is a salient region for model performance. Automated machine learning (AutoML) may enable clinician-driven automated discovery of novel insights and disease biomarkers.
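For illustration only, the sketch below shows how the reported evaluation metrics (AUROC, sensitivity, specificity, PPV, accuracy) are computed from labels and predicted probabilities. The arrays and the 0.5 threshold are toy assumptions; this is not the CFDL pipeline or UK Biobank data.

```python
# Sketch of standard binary classification metrics from toy predictions.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])   # 1 = male, 0 = female (reported sex)
y_prob = np.array([0.9, 0.2, 0.8, 0.6, 0.3, 0.4, 0.7, 0.1, 0.55, 0.35])

auroc = roc_auc_score(y_true, y_prob)
y_pred = (y_prob >= 0.5).astype(int)                 # threshold chosen for illustration

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"AUROC={auroc:.2f}  Se={sensitivity:.1%}  Sp={specificity:.1%}  "
      f"PPV={ppv:.1%}  Acc={accuracy:.1%}")
```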
The SPIRIT 2013 (Standard Protocol Items: Recommendations for Interventional Trials) statement aims to improve the completeness of clinical trial protocol reporting by providing evidence-based recommendations for the minimum set of items to be addressed. This guidance has been instrumental in promoting transparent evaluation of new interventions. More recently, there is growing recognition that interventions involving artificial intelligence (AI) need to undergo rigorous, prospective evaluation to demonstrate their impact on health outcomes.

The SPIRIT-AI extension is a new reporting guideline for clinical trial protocols evaluating interventions with an AI component. It was developed in parallel with its companion statement for trial reports, CONSORT-AI. Both guidelines were developed using a staged consensus process, involving a literature review and expert consultation to generate 26 candidate items, which were consulted on by an international multi-stakeholder group in a two-stage Delphi survey (103 stakeholders), agreed on in a consensus meeting (31 stakeholders), and refined through a checklist pilot (34 participants).

The SPIRIT-AI extension includes 15 new items that were considered sufficiently important for clinical trial protocols of AI interventions; these should be routinely reported in addition to the core SPIRIT 2013 items. SPIRIT-AI recommends that investigators provide clear descriptions of the AI intervention, including instructions and skills required for use, the setting in which the AI intervention will be integrated, considerations around the handling of input and output data, the human-AI interaction, and analysis of error cases.

SPIRIT-AI will help promote transparency and completeness of clinical trial protocols for AI interventions. Its use will assist editors and peer reviewers, as well as the general readership, to understand, interpret, and critically appraise the design and risk of bias for a planned clinical trial.
Reporting guidelines are structured tools developed using explicit methodology that specify the minimum information required by researchers when reporting a study. The use of artificial intelligence (AI) reporting guidelines that address potential sources of bias specific to studies involving AI interventions has the potential to improve the quality of AI studies, through improvements in their design and delivery, and the completeness and transparency of their reporting. With a number of guidance documents relating to AI studies emerging from different specialist societies, this Review article provides researchers with some key principles for selecting the most appropriate reporting guidelines for a study involving an AI intervention. As the main determinants of a high‐quality study are contained within the methodology of the study design rather than the intervention, researchers are recommended to use reporting guidelines that are specific to the study design, and then supplement them with AI‐specific guidance contained within available AI reporting guidelines.
This review article considers the rising demand for patient-reported outcome measures (PROMs) in modern ophthalmic research and clinical practice. We review what PROMs are, how they are developed and chosen for use, and how their quality can be critically appraised. We outline the progress made to develop PROMs in each clinical subspecialty. We highlight recent examples of the use of PROMs as secondary outcome measures in randomized controlled clinical trials and consider the impact they have had. With increasing interest in using PROMs as primary outcome measures, particularly where interventions have been found to be of equivalent efficacy by traditional outcome metrics, we highlight the importance of instrument precision in permitting smaller sample sizes to be recruited. Our review finds that while there has been considerable progress in PROM development, particularly in cataract, glaucoma, medical retina, and low vision, there is a paucity of useful tools for less common ophthalmic conditions. Development and validation of item banks, administered using computer adaptive testing, has been proposed as a solution to overcome many of the traditional limitations of PROMs, but further work will be needed to examine their acceptability to patients, clinicians, and investigators.
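To make the point about instrument precision concrete, the sketch below applies the standard two-sample formula for comparing mean scores between trial arms: the required sample size per arm scales with the square of the outcome's standard deviation, so a more precise PROM needs fewer participants to detect the same difference. All numbers are hypothetical and purely illustrative.

```python
# Illustration: sample size per arm for a two-sample comparison of means,
# showing how lower outcome variability (a more precise PROM) reduces n.
from scipy.stats import norm

def n_per_arm(sd, delta, alpha=0.05, power=0.80):
    """Approximate n per arm to detect a mean difference `delta` with given SD."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return 2 * ((z_a + z_b) * sd / delta) ** 2

# Same target difference of 5 points, two instruments with different precision.
print(round(n_per_arm(sd=10, delta=5)))   # ~63 per arm
print(round(n_per_arm(sd=6,  delta=5)))   # ~23 per arm
```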
Uveitis is a major cause of sight loss across the world. The reliable assessment of intraocular inflammation in uveitis ('disease activity') is essential in order to score disease severity and response to treatment. In this review, we describe how 'quantitative imaging', the approach of using automated analysis and measurement algorithms across both standard and emerging imaging modalities, can develop objective instrument-based measures of disease activity.
This is a narrative review based on searches of the current world literature using terms related to quantitative imaging techniques in uveitis, supplemented by clinical trial registry data, and expert knowledge of surrogate endpoints and outcome measures in ophthalmology.
Current measures of disease activity are largely based on subjective clinical estimation, and are relatively insensitive, with poor discrimination and reliability. The development of quantitative imaging in uveitis is most established in the use of optical coherence tomographic (OCT) measurement of central macular thickness (CMT) to measure severity of macular edema (ME). The transformative effect of CMT in clinical assessment of patients with ME provides a paradigm for the development and impact of other forms of quantitative imaging. Quantitative imaging approaches are now being developed and validated for other key inflammatory parameters such as anterior chamber cells, vitreous haze, retinovascular leakage, and chorioretinal infiltrates.
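To illustrate what a quantitative imaging measure of this kind looks like in practice, the sketch below summarises central macular thickness from already-segmented retinal boundaries on a single OCT B-scan through the fovea, averaging thickness over the central 1 mm. The boundary arrays, pixel scales, and averaging zone are all illustrative assumptions, not any device's or study's actual algorithm.

```python
# Highly simplified sketch of OCT-derived central macular thickness (CMT).
# Boundaries and scan geometry are synthetic stand-ins for a segmentation output.
import numpy as np

AXIAL_UM_PER_PIXEL = 3.9        # assumed axial sampling
LATERAL_UM_PER_ASCAN = 11.7     # assumed A-scan spacing

n_ascans = 512
x_um = (np.arange(n_ascans) - n_ascans / 2) * LATERAL_UM_PER_ASCAN

# Synthetic boundary positions (in pixels): inner limiting membrane dips at the fovea,
# retinal pigment epithelium modelled as flat.
ilm = 100 + 20 * np.exp(-(x_um / 800) ** 2)
rpe = 180 + np.zeros(n_ascans)

thickness_um = (rpe - ilm) * AXIAL_UM_PER_PIXEL   # per-A-scan retinal thickness
central = np.abs(x_um) <= 500                     # central 1 mm zone
cmt_um = thickness_um[central].mean()
print(f"central macular thickness ≈ {cmt_um:.0f} µm")
```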
As new forms of quantitative imaging in uveitis are proposed, the uveitis community will need to evaluate these tools against the current subjective clinical estimates and reach a new consensus for how disease activity in uveitis should be measured. The development, validation, and adoption of sensitive and discriminatory measures of disease activity is an unmet need that has the potential to transform both drug development and routine clinical care for the patient with uveitis.
Uveitis describes a group of conditions characterised by intraocular inflammation. The term technically refers to inflammation of the uvea, which comprises the iris, ciliary body and choroid, but it now encompasses inflammation of adjacent intraocular structures such as the retina, vitreous and optic nerve. Uveitis is a significant cause of blindness worldwide, but its impact is generally underappreciated owing to a lack of awareness and understanding of the condition among the public and most non-ophthalmic healthcare professionals. In this review, we provide an introduction to uveitis for the non-specialist, outlining the clinical presentations that should raise suspicion of the disease, the signs to look for, and a framework in which to understand the condition. We show how a logical approach to classifying uveitis by aetiology and anatomical focus of disease provides the basis for treatment strategies (drug and route of administration) and for understanding clinical presentation and prognosis. We also show why an understanding of uveitis is helpful to clinicians working in almost every speciality, given its wide-ranging associations with systemic disease.