The medical algorithmic audit. Liu, Xiaoxuan; Glocker, Ben; McCradden, Melissa M; et al.
The Lancet Digital Health, May 2022, Volume 4, Issue 5
Journal Article
Peer reviewed
Open access
Artificial intelligence systems for health care, like any other medical device, have the potential to fail. However, specific qualities of artificial intelligence systems, such as the tendency to learn spurious correlates in training data, poor generalisability to new deployment settings, and a paucity of reliable explainability mechanisms, mean they can yield unpredictable errors that might be entirely missed without proactive investigation. We propose a medical algorithmic audit framework that guides the auditor through a process of considering potential algorithmic errors in the context of a clinical task, mapping the components that might contribute to the occurrence of errors, and anticipating their potential consequences. We suggest several approaches for testing algorithmic errors, including exploratory error analysis, subgroup testing, and adversarial testing, and provide examples from our own work and previous studies. The medical algorithmic audit is a tool that can be used to better understand the weaknesses of an artificial intelligence system and put in place mechanisms to mitigate their impact. We propose that safety monitoring and medical algorithmic auditing should be a joint responsibility between users and developers, and encourage the use of feedback mechanisms between these groups to promote learning and maintain safe deployment of artificial intelligence systems.
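The subgroup testing this abstract mentions can be illustrated with a short sketch. The function below is purely illustrative and not taken from the paper (the function name and the rank-based AUC computation are my assumptions): it computes discrimination AUC separately for each subgroup so an auditor can spot performance gaps that an overall metric would hide.

```python
from collections import defaultdict

def subgroup_auc(labels, scores, groups):
    """Compute AUC (probability a positive case outranks a negative one)
    separately for each subgroup. Illustrative audit step, not the
    authors' code."""
    by_group = defaultdict(lambda: ([], []))  # group -> (negatives, positives)
    for y, s, g in zip(labels, scores, groups):
        by_group[g][y].append(s)  # y == 0 -> index 0, y == 1 -> index 1
    aucs = {}
    for g, (neg, pos) in by_group.items():
        if not neg or not pos:
            aucs[g] = None  # subgroup lacks both classes; flag for the auditor
            continue
        # Rank-based AUC: count pairwise wins, with ties worth half a win.
        wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
        aucs[g] = wins / (len(pos) * len(neg))
    return aucs
```

A large gap between subgroup AUCs (for example, by scanner site or patient demographic) is exactly the kind of error an exploratory audit is meant to surface before deployment.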
Introduction
This study assessed replacing traditional protocol CT‐arterial chest and venous abdomen and pelvis, with a single‐pass, single‐bolus, venous phase CT chest, abdomen and pelvis (CAP) protocol in general oncology outpatients at a single centre.
Methods
A traditional protocol is an arterial phase chest followed by venous phase abdomen and pelvis. A venous CAP (vCAP) protocol is a single acquisition 60 s after contrast injection, with an optional arterial phase of the upper abdomen based on the primary tumour. Consecutive eligible patients were assessed, using each patient's prior study as a comparator. Attenuation of various structures, lesion conspicuity and dose were compared. A subset analysis of dual‐energy (DE) CT scans in the vCAP protocol was performed to assess lesion conspicuity on 50 keV virtual monoenergetic (VME) images.
Results
One hundred and eleven patients were assessed with both protocols. Forty‐six patients had their vCAP scans using DECT. The vCAP protocol showed no significant difference in the attenuation of abdominal structures, with reduced attenuation of mediastinal structures. There was a significant improvement in the visibility of pleural lesions (p < 0.001), a trend toward improved mediastinal node assessment, and no significant difference for abdominal lesions. A significant increase in liver lesion conspicuity on 50 keV VME reconstructions was noted for both readers (p < 0.001). There were significant dose reductions with the vCAP protocol.
Conclusion
A single‐pass vCAP protocol offered an improved thoracic assessment with no loss of abdominal diagnostic confidence and significant dose reductions compared to traditional protocol. Improved liver lesion conspicuity on 50 keV VME images across a range of cancers is promising.
Summary
The inclusion and celebration of LGBTQIA+ staff in radiology and radiation oncology departments is crucial in developing a diverse and thriving workplace. Despite the substantial social change in Australia, LGBTQIA+ people still experience harassment and exclusion, negatively impacting their well‐being and workplace productivity. We need to be proactive in creating policies that are properly implemented and translate to a safe and inclusive space for marginalised groups. In this work, we outline the role we all can play in creating inclusive environments, for both individuals and leaders working in radiology and radiation oncology. We can learn how to avoid normative assumptions about gender and sexuality, respect people's identities and speak out against witnessed discrimination or slights. Robust policies are needed to protect LGBTQIA+ members from discrimination and provide equal access across other pertinent parts of work life such as leave entitlements, representation in data collection and safe bathroom access. We all deserve to feel safe and respected at work and further effort is needed to ensure this extends to LGBTQIA+ staff in the radiology and radiation oncology workforces.
Machine learning may assist in medical student evaluation. This study involved scoring short answer questions administered at three centres. Bidirectional encoder representations from transformers were particularly effective for professionalism question scoring (accuracy ranging from 41.6% to 92.5%). In the scoring of 3‐mark professionalism questions, as compared with clinical questions, machine learning had a lower classification accuracy (P < 0.05). The role of machine learning in medical professionalism evaluation warrants further investigation.
Rheumatoid arthritis is an autoimmune condition that predominantly affects the synovial joints, causing joint destruction, pain, and disability. Historically, the standard for measuring the long-term efficacy of disease-modifying antirheumatic drugs has been the assessment of plain radiographs with scoring techniques that quantify joint damage. However, with significant improvements in therapy, current radiographic scoring systems may no longer be fit for purpose for the milder spectrum of disease seen today. We argue that artificial intelligence is an apt solution to further improve upon radiographic scoring, as it can readily learn to recognize subtle patterns in imaging data to not only improve efficiency, but can also increase the sensitivity to variation in mild disease. Current work in the area demonstrates the feasibility of automating scoring but is yet to take full advantage of the strengths of artificial intelligence. By fully leveraging the power of artificial intelligence, faster and more sensitive scoring could enable the ongoing development of effective treatments for patients with rheumatoid arthritis.
Artificial Intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones.
This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools.
Key points
• The incorporation of artificial intelligence (AI) in radiological practice demands increased monitoring of its utility and safety.
• Cooperation between developers, clinicians, and regulators will allow all involved to address ethical issues and monitor AI performance.
• AI can fulfil its promise to advance patient well-being if all steps from development to integration in healthcare are rigorously evaluated.
Introduction
Machine learning (ML) methods are being increasingly applied to prognostic prediction for stroke patients with large vessel occlusion (LVO) treated with endovascular thrombectomy. This systematic review aims to summarize ML-based pre-thrombectomy prognostic models for LVO stroke and identify key research gaps.
Methods
Literature searches were performed in Embase, PubMed, Web of Science, and Scopus. Meta-analyses of the area under the receiver operating characteristic curves (AUCs) of ML models were conducted to synthesize model performance.
Results
Sixteen studies describing 19 models were eligible. The predicted outcomes included functional outcome at 90 days, successful reperfusion, and hemorrhagic transformation. Functional outcome was analyzed by 10 conventional ML models (pooled AUC = 0.81, 95% confidence interval [CI]: 0.77–0.85; AUC range: 0.68–0.93) and four deep learning (DL) models (pooled AUC = 0.75, 95% CI: 0.70–0.81; AUC range: 0.71–0.81). Successful reperfusion was analyzed by three conventional ML models (pooled AUC = 0.72, 95% CI: 0.56–0.88; AUC range: 0.55–0.88) and one DL model (AUC = 0.65, 95% CI: 0.62–0.68).
Conclusions
Conventional ML and DL models have shown variable performance in predicting post-treatment outcomes of LVO stroke, without generally demonstrating superiority over existing prognostic scores. Most models were developed using small datasets, lacked solid external validation, and were at high risk of bias. There is considerable scope to improve study design and model performance. The application of ML and DL methods to prognosis prediction in LVO stroke, while promising, remains nascent.
Systematic review registration
https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42021266524, identifier CRD42021266524
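The pooled AUCs reported in this review can be illustrated with a standard DerSimonian–Laird random-effects meta-analysis. This is a generic sketch under my own assumptions (function name, inputs as per-study AUCs with standard errors, and a 1.96 normal-approximation confidence interval), not the review's actual analysis code.

```python
import math

def pool_auc(aucs, ses):
    """Pool per-study AUCs with DerSimonian-Laird random-effects weighting.
    Illustrative only; the review's exact meta-analytic method may differ.
    Returns (pooled AUC, (lower 95% CI, upper 95% CI))."""
    w = [1 / se ** 2 for se in ses]                         # fixed-effect weights
    fixed = sum(wi * a for wi, a in zip(w, aucs)) / sum(w)  # fixed-effect mean
    q = sum(wi * (a - fixed) ** 2 for wi, a in zip(w, aucs))  # heterogeneity Q
    df = len(aucs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                           # between-study variance
    w_re = [1 / (se ** 2 + tau2) for se in ses]             # random-effects weights
    pooled = sum(wi * a for wi, a in zip(w_re, aucs)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return pooled, (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
```

With homogeneous studies (Q below its degrees of freedom), the between-study variance collapses to zero and the result matches a fixed-effect inverse-variance average.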
Previous studies in medical imaging have shown disparate abilities of artificial intelligence (AI) to detect a person's race, yet there is no known correlation for race on medical imaging that would be obvious to human experts when interpreting the images. We aimed to conduct a comprehensive evaluation of the ability of AI to recognise a patient's racial identity from medical images.
Using private (Emory CXR, Emory Chest CT, Emory Cervical Spine, and Emory Mammogram) and public (MIMIC-CXR, CheXpert, National Lung Cancer Screening Trial, RSNA Pulmonary Embolism CT, and Digital Hand Atlas) datasets, we evaluated, first, performance quantification of deep learning models in detecting race from medical images, including the ability of these models to generalise to external environments and across multiple imaging modalities. Second, we assessed possible confounding of anatomic and phenotypic population features by assessing the ability of these hypothesised confounders to detect race in isolation using regression models, and by re-evaluating the deep learning models by testing them on datasets stratified by these hypothesised confounding variables. Last, by exploring the effect of image corruptions on model performance, we investigated the underlying mechanism by which AI models can recognise race.
In our study, we show that standard AI deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities, which was sustained under external validation conditions (x-ray imaging area under the receiver operating characteristics curve (AUC) range 0·91–0·99, CT chest imaging 0·87–0·96, and mammography 0·81). We also showed that this detection is not due to proxies or imaging-related surrogate covariates for race (eg, performance of possible confounders: body-mass index AUC 0·55, disease distribution 0·61, and breast density 0·61). Finally, we provide evidence to show that the ability of AI deep learning models persisted over all anatomical regions and frequency spectra of the images, suggesting that efforts to control this behaviour, when it is undesirable, will be challenging and demand further study.
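The frequency-spectrum experiments described can be illustrated with a minimal low-pass corruption of the kind used to probe whether a model's signal survives filtering. This sketch is not the authors' code; the function name and the square-mask design are my assumptions.

```python
import numpy as np

def low_pass(image, keep_frac):
    """Keep only the lowest spatial frequencies of a 2-D image.
    Illustrative corruption: a model would be re-evaluated on the
    filtered images to test whether its predictions persist."""
    f = np.fft.fftshift(np.fft.fft2(image))      # centre the DC component
    h, w = image.shape
    cy, cx = h // 2, w // 2
    ry, rx = int(cy * keep_frac), int(cx * keep_frac)
    mask = np.zeros_like(f)
    mask[cy - ry:cy + ry + 1, cx - rx:cx + rx + 1] = 1  # keep central band
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))
```

Sweeping `keep_frac` from 1.0 downward (and the complementary high-pass mask) maps out how much of the predictive signal lives in each frequency band; persistence at extreme settings is what made the behaviour hard to attribute to any single image feature.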
The results from our study emphasise that the ability of AI deep learning models to predict self-reported race is itself not the issue of importance. However, our finding that AI can accurately predict self-reported race, even from corrupted, cropped, and noised medical images, often when clinical experts cannot, creates an enormous risk for all model deployments in medical imaging.
National Institute of Biomedical Imaging and Bioengineering, MIDRC grant of National Institutes of Health, US National Science Foundation, National Library of Medicine of the National Institutes of Health, and Taiwan Ministry of Science and Technology
A growing number of artificial intelligence (AI)-based clinical decision support systems are showing promising performance in preclinical, in silico evaluation, but few have yet demonstrated real benefit to patient care. Early-stage clinical evaluation is important to assess an AI system's actual clinical performance at small scale, ensure its safety, evaluate the human factors surrounding its use and pave the way to further large-scale trials. However, the reporting of these early studies remains inadequate. The present statement provides a multi-stakeholder, consensus-based reporting guideline for the Developmental and Exploratory Clinical Investigations of DEcision support systems driven by Artificial Intelligence (DECIDE-AI). We conducted a two-round, modified Delphi process to collect and analyze expert opinion on the reporting of early clinical evaluation of AI systems. Experts were recruited from 20 pre-defined stakeholder categories. The final composition and wording of the guideline was determined at a virtual consensus meeting. The checklist and the Explanation & Elaboration (E&E) sections were refined based on feedback from a qualitative evaluation process. In total, 123 experts participated in the first round of Delphi, 138 in the second round, 16 in the consensus meeting and 16 in the qualitative evaluation. The DECIDE-AI reporting guideline comprises 17 AI-specific reporting items (made of 28 subitems) and ten generic reporting items, with an E&E paragraph provided for each. Through consultation and consensus with a range of stakeholders, we developed a guideline comprising key items that should be reported in early-stage clinical studies of AI-based decision support systems in healthcare. By providing an actionable checklist of minimal reporting items, the DECIDE-AI guideline will facilitate the appraisal of these studies and replicability of their findings.