To systematically examine the design, reporting standards, risk of bias, and claims of studies comparing the performance of diagnostic deep learning algorithms for medical imaging with that of expert clinicians.
Systematic review.
Medline, Embase, Cochrane Central Register of Controlled Trials, and the World Health Organization trial registry from 2010 to June 2019.
Randomised trial registrations and non-randomised studies comparing the performance of a deep learning algorithm in medical imaging with a contemporary group of one or more expert clinicians. Medical imaging has seen a growing interest in deep learning research. The main distinguishing feature of convolutional neural networks (CNNs) in deep learning is that, when fed raw data, they develop their own representations needed for pattern recognition: the algorithm learns for itself which features of an image are important for classification, rather than being told by humans which features to use. The selected studies aimed to use medical imaging for predicting absolute risk of existing disease or for classification into diagnostic groups (eg, disease or non-disease). For example, a CNN can be fed raw chest radiographs tagged with a label such as pneumothorax or no pneumothorax, and it learns which pixel patterns suggest pneumothorax.
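The idea that a model learns for itself which image features matter can be illustrated with a toy example. The sketch below is not a CNN and is not drawn from any study reviewed here: it uses plain logistic regression on synthetic 3×3 "images" in which, by construction, only the first pixel determines a hypothetical label. Training on labelled examples alone, the model discovers that pixel's importance without ever being told which pixel to use; all data and labels are illustrative.

```python
import math
import random

def sigmoid(z):
    # Clamp to avoid math.exp overflow on extreme inputs.
    if z < -60:
        return 0.0
    if z > 60:
        return 1.0
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic "images": 9 raw pixel values in [0, 1). The label depends only on
# pixel 0 (a stand-in for a diagnostic pixel pattern); the model is never told this.
random.seed(0)
data = []
for _ in range(200):
    x = [random.random() for _ in range(9)]
    y = 1 if x[0] > 0.5 else 0
    data.append((x, y))

# Logistic regression trained by stochastic gradient descent: the simplest case
# of a model learning from labels alone which pixels matter for classification.
w = [0.0] * 9
b = 0.0
lr = 0.5
for _ in range(500):
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        err = p - y  # gradient of the log loss with respect to the logit
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        b -= lr * err

# The learned weight on the informative pixel dominates all the others.
most_important = max(range(9), key=lambda i: abs(w[i]))
print(most_important)
```

A CNN generalises this idea: instead of one weight per pixel, it learns small convolutional filters that respond to local pixel patterns wherever they occur in the image, which is what allows it to pick out structures such as a pneumothorax from raw radiographs.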
Adherence to reporting standards was assessed by using CONSORT (consolidated standards of reporting trials) for randomised studies and TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) for non-randomised studies. Risk of bias was assessed by using the Cochrane risk of bias tool for randomised studies and PROBAST (prediction model risk of bias assessment tool) for non-randomised studies.
Only 10 records were found for deep learning randomised clinical trials, two of which have been published (with low risk of bias, except for lack of blinding, and high adherence to reporting standards) and eight are ongoing. Of 81 non-randomised clinical trials identified, only nine were prospective and just six were tested in a real world clinical setting. The median number of experts in the comparator group was only four (interquartile range 2-9). Full access to all datasets and code was severely limited (unavailable in 95% and 93% of studies, respectively). The overall risk of bias was high in 58 of 81 studies and adherence to reporting standards was suboptimal (<50% adherence for 12 of 29 TRIPOD items). 61 of 81 studies stated in their abstract that performance of artificial intelligence was at least comparable to (or better than) that of clinicians. Only 31 of 81 studies (38%) stated that further prospective studies or trials were required.
Few prospective deep learning studies and randomised trials exist in medical imaging. Most non-randomised trials are not prospective, are at high risk of bias, and deviate from existing reporting standards. Data and code availability are lacking in most studies, and human comparator groups are often small. Future studies should diminish risk of bias, enhance real world clinical relevance, improve reporting and transparency, and appropriately temper conclusions.
PROSPERO CRD42019123605.
The CONSORT 2010 statement provides minimum guidelines for reporting randomized trials. Its widespread use has been instrumental in ensuring transparency in the evaluation of new interventions. More recently, there has been a growing recognition that interventions involving artificial intelligence (AI) need to undergo rigorous, prospective evaluation to demonstrate impact on health outcomes. The CONSORT-AI (Consolidated Standards of Reporting Trials-Artificial Intelligence) extension is a new reporting guideline for clinical trials evaluating interventions with an AI component. It was developed in parallel with its companion statement for clinical trial protocols: SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials-Artificial Intelligence). Both guidelines were developed through a staged consensus process involving literature review and expert consultation to generate 29 candidate items, which were assessed by an international multi-stakeholder group in a two-stage Delphi survey (103 stakeholders), agreed upon in a two-day consensus meeting (31 stakeholders) and refined through a checklist pilot (34 participants). The CONSORT-AI extension includes 14 new items that were considered sufficiently important for AI interventions that they should be routinely reported in addition to the core CONSORT 2010 items. CONSORT-AI recommends that investigators provide clear descriptions of the AI intervention, including instructions and skills required for use, the setting in which the AI intervention is integrated, the handling of inputs and outputs of the AI intervention, the human-AI interaction and provision of an analysis of error cases. CONSORT-AI will help promote transparency and completeness in reporting clinical trials for AI interventions.
It will assist editors and peer reviewers, as well as the general readership, to understand, interpret and critically appraise the quality of clinical trial design and risk of bias in the reported outcomes.
The SPIRIT 2013 statement aims to improve the completeness of clinical trial protocol reporting by providing evidence-based recommendations for the minimum set of items to be addressed. This guidance has been instrumental in promoting transparent evaluation of new interventions. More recently, there has been a growing recognition that interventions involving artificial intelligence (AI) need to undergo rigorous, prospective evaluation to demonstrate their impact on health outcomes. The SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials-Artificial Intelligence) extension is a new reporting guideline for clinical trial protocols evaluating interventions with an AI component. It was developed in parallel with its companion statement for trial reports: CONSORT-AI (Consolidated Standards of Reporting Trials-Artificial Intelligence). Both guidelines were developed through a staged consensus process involving literature review and expert consultation to generate 26 candidate items, which were consulted upon by an international multi-stakeholder group in a two-stage Delphi survey (103 stakeholders), agreed upon in a consensus meeting (31 stakeholders) and refined through a checklist pilot (34 participants). The SPIRIT-AI extension includes 15 new items that were considered sufficiently important for clinical trial protocols of AI interventions. These new items should be routinely reported in addition to the core SPIRIT 2013 items. SPIRIT-AI recommends that investigators provide clear descriptions of the AI intervention, including instructions and skills required for use, the setting in which the AI intervention will be integrated, considerations for the handling of input and output data, the human-AI interaction and analysis of error cases. SPIRIT-AI will help promote transparency and completeness for clinical trial protocols for AI interventions.
Its use will assist editors and peer reviewers, as well as the general readership, to understand, interpret and critically appraise the design and risk of bias for a planned clinical trial.
The arrival of artificially intelligent systems into the domain of medical imaging has focused attention and sparked much debate on the role and responsibilities of the radiologist. However, discussion about the impact of such technology on the radiographer role is lacking. This paper discusses the potential impact of artificial intelligence (AI) on the radiography profession by assessing current workflow and cross-mapping potential areas of AI automation such as procedure planning, image acquisition and processing. We also highlight the opportunities that AI brings, including enhancing patient-facing care, increased cross-modality education and working, increased technological expertise and expansion of radiographer responsibility into AI-supported image reporting and auditing roles.
Mobile apps are the primary means by which consumers access digital health and wellness software, with delivery dominated by the 'Apple App Store' and the 'Google Play Store'. Through these virtual storefronts Apple and Google act as the distributor (and sometimes, importer) of many thousands of health and wellness apps into the EU, some of which have a medical purpose. As a result of changes to EU law which came into effect in May 2021, they must now ensure that apps comply with medical devices regulation and inform authorities of serious incidents arising from their use. The extent to which these new rules are being complied with in practice is uneven, and in some areas unclear. In light of EU legislation related to competition, which came into effect in November 2022, it is also unclear how conflicts of interest can be managed between Apple and Google's roles as gateway duopoly importers and distributors whilst also developers of their own competitive health products. Finally, under the proposed European health data space regulation, wellness apps would be voluntarily registered and labelled in a fashion more like medical devices than consumer software. We explore the implications of these new regulations and propose future models that could resolve the apparent conflicts. All stakeholders would benefit from improved app store models to sustainably evolve safer, better, and fairer provision of digital health applications in the EU. As EU legislation comes into force it could serve as a template for other regions globally.
Background
Intended use statements (IUSs) are mandatory to obtain regulatory clearance for artificial intelligence (AI)-based medical devices in the European Union. In order to guide the safe use of AI-based medical devices, IUSs need to contain comprehensive and understandable information. This study analyzes the IUSs of CE-marked AI products listed on AIforRadiology.com for ambiguity and completeness.
Methods
We retrieved 157 IUSs of CE-marked AI products listed on AIforRadiology.com in September 2022. Duplicate products (n = 1), discontinued products (n = 3), and duplicate statements (n = 14) were excluded. The resulting IUSs were assessed for the presence of 6 items: medical indication, part of the body, patient population, user profile, use environment, and operating principle. Disclaimers, defined as contra-indications or warnings in the IUS, were identified and compared with claims.
Results
Of 139 AI products, the majority (n = 78) of IUSs mentioned 3 or fewer items. IUSs of only 7 products mentioned all 6 items. The intended body part (n = 115) and the operating principle (n = 116) were the most frequently mentioned components, while the intended use environment (n = 24) and intended patient population (n = 29) were mentioned less frequently. Fifty-six statements contained disclaimers; in 13 cases these conflicted with the product's claims.
Conclusion
The majority of IUSs of CE-marked AI-based medical devices lack substantial information and, in a few cases, contradict the claims of the product.
Critical relevance statement
To ensure correct usage and to avoid off-label use or foreseeable misuse of AI-based medical devices in radiology, manufacturers are encouraged to provide more comprehensive and less ambiguous intended use statements.
Key points
• Radiologists must know AI products’ intended use to avoid off-label use or misuse.
• Ninety-five percent (n = 132/139) of the intended use statements analyzed were incomplete.
• Nine percent (n = 13) of the intended use statements held disclaimers contradicting the claim of the AI product.
• Manufacturers and regulatory bodies must ensure that intended use statements are comprehensive.
The US Food and Drug Administration's (FDA) 510(k) regulatory pathway allows medical device manufacturers to identify and leverage an already marketed device to apply for premarket clearance of their own substantially equivalent device. No two developers will ever produce the same lines of code, let alone two identical AI models. Since AI relies on labeled data inputs to model a latent space of knowledge, it is also obvious that no two AI systems trained on different input data sets could ever be identical. Even the FDA's own guidance defines substantial equivalence as "no significant change in the ... design or other features of the device from those of the predicate device," which clearly is not the case with AI systems. If substantial equivalence of AI systems cannot be reliably proven through technological characteristics, then what about safety and effectiveness? A mountain of evidence suggests that no two AI models perform the same,5 that individual models perform differently in different locations,6 and that models are affected by performance drift and variability once deployed across diverse populations.7 Therefore, effectiveness is also not a reliable indicator of equivalence.
Since its inception in 2017,
has attracted a disproportionate number of manuscripts reporting on uses of artificial intelligence. This field has matured rapidly in the past several years. There was initial fascination with the algorithms themselves (machine learning, deep learning, convolutional neural networks) and the use of these algorithms to make predictions that often surpassed prevailing benchmarks. As the discipline has matured, individuals have called attention to aberrancies in the output of these algorithms. In particular, criticisms have been widely circulated that algorithmically developed models may have limited generalizability due to overfitting to the training data and may systematically perpetuate various forms of biases inherent in the training data, including race, gender, age, and health state or fitness level (Challen et al. BMJ Qual. Saf. 28:231-237, 2019; O'Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, Broadway Books, 2016). Given our interest in publishing the highest quality papers and the growing volume of submissions using AI algorithms, we offer a list of criteria that authors should consider before submitting papers to
.