Purpose To compare the performance of a deep-learning bone age assessment model based on hand radiographs with that of expert radiologists and that of existing automated models. Materials and Methods The institutional review board approved the study. A total of 14 036 clinical hand radiographs and corresponding reports were obtained from two children's hospitals to train and validate the model. For the first test set, composed of 200 examinations, the mean of bone age estimates from the clinical report and three additional human reviewers was used as the reference standard. Overall model performance was assessed by comparing the root mean square (RMS) and mean absolute difference (MAD) between the model estimates and the reference standard bone ages. Ninety-five percent limits of agreement were calculated in a pairwise fashion for all reviewers and the model. The RMS of a second test set, composed of 913 examinations from the publicly available Digital Hand Atlas, was compared with published reports of an existing automated model. Results The mean difference between bone age estimates of the model and of the reviewers was 0 years, with a mean RMS and MAD of 0.63 and 0.50 years, respectively. The estimates of the model, the clinical report, and the three reviewers were within the 95% limits of agreement. RMS for the Digital Hand Atlas data set was 0.73 years, compared with the 0.61 years of a previously reported model. Conclusion A deep-learning convolutional neural network model can estimate skeletal maturity with accuracy similar to that of an expert radiologist and to that of existing automated models.
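The agreement statistics used in this abstract (RMS, MAD, and Bland-Altman 95% limits of agreement) can be computed directly from paired estimates. A minimal sketch with hypothetical bone-age values in years, not the study's data:

```python
import math

def agreement_metrics(model, reference):
    """RMS and mean absolute difference (MAD) between paired estimates,
    plus Bland-Altman 95% limits of agreement (mean diff +/- 1.96 SD)."""
    diffs = [m - r for m, r in zip(model, reference)]
    n = len(diffs)
    rms = math.sqrt(sum(d * d for d in diffs) / n)
    mad = sum(abs(d) for d in diffs) / n
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    loa = (mean - 1.96 * sd, mean + 1.96 * sd)
    return rms, mad, loa

# Hypothetical example values, for illustration only
model_est = [10.2, 7.9, 13.1, 5.5]
reference = [10.0, 8.4, 12.6, 5.8]
rms, mad, loa = agreement_metrics(model_est, reference)
```

Note that RMS is always at least as large as MAD, which is why the abstract reports both: RMS penalizes occasional large errors more heavily.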
RSNA, 2017. An earlier incorrect version of this article appeared online. This article was corrected on January 19, 2018.
Purpose To evaluate the performance of a deep learning convolutional neural network (CNN) model compared with a traditional natural language processing (NLP) model in extracting pulmonary embolism (PE) findings from thoracic computed tomography (CT) reports from two institutions. Materials and Methods Contrast material-enhanced CT examinations of the chest performed between January 1, 1998, and January 1, 2016, were selected. Annotations by two human radiologists were made for three categories: the presence, chronicity, and location of PE. The classification performance of a CNN model that used an unsupervised learning algorithm to obtain vector representations of words was compared with that of the open-source application PeFinder. Sensitivity, specificity, accuracy, and F1 scores for both the CNN model and PeFinder in the internal and external validation sets were determined. Results The CNN model demonstrated an accuracy of 99% and an area under the curve value of 0.97. For internal validation report data, the CNN model had a significantly larger F1 score (0.938) than did PeFinder (0.867) when classifying findings as either PE positive or PE negative, but no significant difference in sensitivity, specificity, or accuracy was found. For external validation report data, no statistically significant difference between the performance of the CNN model and PeFinder was found. Conclusion A deep learning CNN model can classify radiology free-text reports with accuracy equivalent to or greater than that of an existing traditional NLP model.
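The four metrics this abstract reports all derive from a 2x2 confusion matrix of predicted versus annotated PE status. A minimal sketch with made-up counts, not the study's data:

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)               # recall: fraction of true PE cases found
    specificity = tn / (tn + fp)               # fraction of negatives correctly cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, accuracy, f1

# Illustrative counts only
sens, spec, acc, f1 = classification_metrics(tp=90, fp=10, tn=880, fn=20)
```

F1, the harmonic mean of precision and recall, is often more informative than raw accuracy on imbalanced report corpora, where most reports are PE negative.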
RSNA, 2017. Online supplemental material is available for this article.
As the role of artificial intelligence (AI) in clinical practice evolves, governance structures oversee the implementation, maintenance, and monitoring of clinical AI algorithms to enhance quality, manage resources, and ensure patient safety. This article establishes a framework for the infrastructure required for clinical AI implementation and presents a road map for governance. The road map answers four key questions: Who decides which tools to implement? What factors should be considered when assessing an application for implementation? How should applications be implemented in clinical practice? Finally, how should tools be monitored and maintained after clinical implementation? Among the many challenges for the implementation of AI in clinical practice, devising flexible governance structures that can quickly adapt to a changing environment will be essential to ensuring quality patient care and meeting practice improvement objectives.
Magnetic resonance imaging (MRI) of the knee is the preferred method for diagnosing knee injuries. However, interpretation of knee MRI is time-intensive and subject to diagnostic error and variability. An automated system for interpreting knee MRI could prioritize high-risk patients and assist clinicians in making diagnoses. Deep learning methods, in being able to automatically learn layers of features, are well suited for modeling the complex relationships between medical images and their interpretations. In this study, we developed a deep learning model for detecting general abnormalities and specific diagnoses (anterior cruciate ligament [ACL] tears and meniscal tears) on knee MRI exams. We then measured the effect of providing the model's predictions to clinical experts during interpretation.
Our dataset consisted of 1,370 knee MRI exams performed at Stanford University Medical Center between January 1, 2001, and December 31, 2012 (mean age 38.0 years; 569 [41.5%] female patients). The majority vote of 3 musculoskeletal radiologists established reference standard labels on an internal validation set of 120 exams. We developed MRNet, a convolutional neural network for classifying MRI series, and combined predictions from 3 series per exam using logistic regression. In detecting abnormalities, ACL tears, and meniscal tears, this model achieved area under the receiver operating characteristic curve (AUC) values of 0.937 (95% CI 0.895, 0.980), 0.965 (95% CI 0.938, 0.993), and 0.847 (95% CI 0.780, 0.914), respectively, on the internal validation set. We also obtained a public dataset of 917 exams with sagittal T1-weighted series and labels for ACL injury from Clinical Hospital Centre Rijeka, Croatia. On the external validation set of 183 exams, the MRNet trained on Stanford sagittal T2-weighted series achieved an AUC of 0.824 (95% CI 0.757, 0.892) in the detection of ACL injuries with no additional training, while an MRNet trained on the rest of the external data achieved an AUC of 0.911 (95% CI 0.864, 0.958). We additionally measured the specificity, sensitivity, and accuracy of 9 clinical experts (7 board-certified general radiologists and 2 orthopedic surgeons) on the internal validation set both with and without model assistance. Using a 2-sided Pearson's chi-squared test with adjustment for multiple comparisons, we found no significant differences between the performance of the model and that of unassisted general radiologists in detecting abnormalities. General radiologists achieved significantly higher sensitivity in detecting ACL tears (p-value = 0.002; q-value = 0.019) and significantly higher specificity in detecting meniscal tears (p-value = 0.003; q-value = 0.019).
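The AUC values reported above have a simple nonparametric interpretation: the probability that a randomly chosen positive exam receives a higher model score than a randomly chosen negative one (equivalently, the normalized Mann-Whitney U statistic). A sketch with hypothetical scores and labels, not the study's data:

```python
def auc(scores, labels):
    """Nonparametric AUC: fraction of positive-negative pairs in which the
    positive example scores higher (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores and ground-truth labels, for illustration only
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1,   1,   0,   1,   0,   0]
model_auc = auc(scores, labels)   # 8 of 9 pairs correctly ordered
```

This pairwise definition is why AUC, unlike accuracy, does not depend on choosing a decision threshold.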
Using a 1-tailed t test on the change in performance metrics, we found that providing model predictions significantly increased clinical experts' specificity in identifying ACL tears (p-value < 0.001; q-value = 0.006). The primary limitations of our study include lack of surgical ground truth and the small size of the panel of clinical experts.
Our deep learning model can rapidly generate accurate clinical pathology classifications of knee MRI exams from both internal and external datasets. Moreover, our results support the assertion that deep learning models can improve the performance of clinical experts during medical imaging interpretation. Further research is needed to validate the model prospectively and to determine its utility in the clinical setting.
To successfully develop a department-wide standardized structured reporting program and achieve widespread adoption throughout the radiology department.
A structured reporting work group was formed in February 2010 to oversee development of standardized structured reports for a radiology department of 36 radiologists at a tertiary care children's hospital. The committee reached consensus on report organization and provided written guidelines and checklists for division representatives to aid in creation of the structured reports. Report drafts were reviewed by a subcommittee and revised until agreement was reached with the report author. Each report was vetted by all radiologists who would be using the report, and further revisions were made, as appropriate. Reports were then entered into the speech recognition system so that each report was associated with a procedure code or a group of codes from the radiology information system. This enabled automatic report population within the speech recognition system. The initiative was completed by September 2011. Quarterly audits were performed to evaluate for adherence to the standard report format and use of the normal report in cases in which the radiologist believed the study was normal. In August 2012, radiologists were surveyed as to their impressions of the structured reporting program.
A total of 228 standardized structured reports were created within 2 years after initiation of the project, corresponding to 199,000 (94%) of 212,000 departmental studies by volume. By the end of the implementation period in September 2011, all 223 (100%) audited reports adhered to the standard report format and 80 (99%) of 81 reports adhered to the normal report. Radiologist feedback was largely favorable.
Standardized department-wide structured reporting can be implemented in a radiology department, with a high rate of adoption by the radiologists.
In this article, the authors propose an ethical framework for using and sharing clinical data for the development of artificial intelligence (AI) applications. The philosophical premise is as follows: when clinical data are used to provide care, the primary purpose for acquiring the data is fulfilled. At that point, clinical data should be treated as a form of public good, to be used for the benefit of future patients. In their 2013 article, Faden et al argued that all who participate in the health care system, including patients, have a moral obligation to contribute to improving that system. The authors extend that framework to questions surrounding the secondary use of clinical data for AI applications. Specifically, the authors propose that all individuals and entities with access to clinical data become data stewards, with fiduciary (or trust) responsibilities to patients to carefully safeguard patient privacy, and to the public to ensure that the data are made widely available for the development of knowledge and tools to benefit future patients. According to this framework, the authors maintain that it is unethical for providers to "sell" clinical data to other parties by granting access to clinical data, especially under exclusive arrangements, in exchange for monetary or in-kind payments that exceed costs. The authors also propose that patient consent is not required before the data are used for secondary purposes when obtaining such consent is prohibitively costly or burdensome, as long as mechanisms are in place to ensure that ethical standards are strictly followed. Rather than debate whether patients or provider organizations "own" the data, the authors propose that clinical data are not owned at all in the traditional sense, but rather that all who interact with or control the data have an obligation to ensure that the data are used for the benefit of future patients and society.
To identify nationwide trends and factors associated with the use of computed tomography (CT) in the emergency department (ED).
This study was exempt from institutional review board approval. Data from the 1995-2007 National Hospital Ambulatory Medical Care Survey were used to evaluate the numbers and percentages of ED visits associated with CT. A mean of 30 044 visits were sampled each year. Data were also subcategorized according to multiple patient and hospital characteristics. The Rao-Scott χ² test was performed to determine whether CT use was similar across subpopulations. Data were evaluated according to exponential and logistic growth models.
From 1995 to 2007, the number of ED visits that included a CT examination increased from 2.7 million to 16.2 million, constituting a 5.9-fold increase and a compound annual growth rate of 16.0%. The percentage of visits associated with CT increased from 2.8% to 13.9%, constituting a 4.9-fold increase and a compound annual growth rate of 14.2%. The exponential growth model provided the best fit for the trend in CT use. CT use was greater in older patients, white patients, patients admitted to the hospital, and patients at facilities in metropolitan regions. By the end of the study period, the top chief complaints among those who underwent CT were abdominal pain, headache, and chest pain. The percentage of patient visits associated with CT for all evaluated chief complaints increased, most substantially among those who underwent CT for flank, abdominal, or chest pain.
Use of CT has increased at a higher rate in the ED than in other settings. The overall use of CT had not begun to taper by 2007.
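The growth rates in the results above follow directly from the endpoint values over the 12-year span 1995-2007. A quick check of the compound annual growth rate (CAGR) arithmetic, using the figures from the abstract:

```python
def cagr(start, end, years):
    """Compound annual growth rate between two endpoint values."""
    return (end / start) ** (1 / years) - 1

# ED visits including CT: 2.7 million (1995) to 16.2 million (2007), 12 years
visit_growth = cagr(2.7e6, 16.2e6, 12)   # ~0.16, matching the reported 16.0%

# Percentage of ED visits with CT: 2.8% to 13.9% over the same span
pct_growth = cagr(2.8, 13.9, 12)         # ~0.14, matching the reported 14.2%
```

The 5.9-fold and 4.9-fold increases are simply the endpoint ratios (16.2/2.7 and 13.9/2.8, both to rounding).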
Cryptococcal meningitis accounts for 15% of AIDS-related mortality. Cryptococcal antigen (CrAg) is detected in blood weeks before onset of meningitis, and CrAg positivity is an independent predictor of meningitis and death. CrAg screening for patients with advanced HIV and preemptive treatment is recommended by the World Health Organization, though implementation remains limited. Our objective was to evaluate costs and mortality reduction (lives saved) from a national CrAg screening program across Uganda.
We created a decision analytic model to evaluate CrAg screening. CrAg screening was considered for those with a CD4 count <100 cells/μL per national and international guidelines, and in the context of a national HIV test-and-treat program where CD4 testing was not available. Costs (2016 USD) were estimated for screening, preemptive therapy, hospitalization, and maintenance therapy. Parameter assumptions were based on large prospective CrAg screening studies in Uganda and on clinical trials from sub-Saharan Africa. CrAg-positive (CrAg+) persons could be: (a) asymptomatic and thus eligible for preemptive treatment with fluconazole; or (b) symptomatic with meningitis, requiring hospitalization.
In the base case model for 1 million persons with a CD4 test annually, 128,000 with a CD4 count <100 cells/μL were screened, and 8,233 were asymptomatic CrAg+ and received preemptive therapy. Compared with no screening and treatment, CrAg screening and treatment in the base case cost $3,356,724 and saved 7,320 lives, for a cost of $459 per life saved, with $3.3 million in cost savings derived from fewer patients developing fulminant meningitis. In the scenario of a national HIV test-and-treat program, of 1 million HIV-infected persons, 800,000 persons were screened, of whom 640,000 returned to clinic, and 8,233 were incident CrAg positive (CrAg prevalence 1.4%). The total cost of a CrAg screening and treatment program was $4.16 million, with 2,180 known deaths. Conversely, without CrAg screening, the cost of treating meningitis was $3.09 million, with 3,806 deaths. Thus, despite the very low CrAg prevalence of 1.4% in the general HIV-infected population and inadequate retention in care, CrAg screening averted 43% of deaths from cryptococcal meningitis at a cost of $662 per death averted.
CrAg screening and treatment programs are cost-saving and lifesaving, assuming preemptive treatment is 77% effective in preventing death, and could be adopted and implemented by ministries of health to reduce mortality in those with advanced HIV disease. Even within HIV test-and-treat programs where CD4 testing is not performed, and CrAg prevalence is only 1.4%, CrAg screening is cost-effective.
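The headline cost-effectiveness figures above follow from simple arithmetic on the decision-model outputs. A quick check using the abstract's figures (the small gap versus the reported $662 per death averted reflects rounding of the published $4.16 million and $3.09 million totals):

```python
# Base case: incremental program cost divided by lives saved
base_cost_per_life = 3_356_724 / 7_320              # ~459 USD per life saved

# Test-and-treat scenario: incremental cost / deaths averted
incremental_cost = 4_160_000 - 3_090_000            # screening vs no screening
deaths_averted = 3_806 - 2_180                      # 1,626 fewer deaths
cost_per_death_averted = incremental_cost / deaths_averted   # ~658 USD
fraction_averted = deaths_averted / 3_806           # ~0.43, i.e. 43% of deaths
```

Note that in the test-and-treat scenario the screening program costs more in total than no screening; the "cost-saving" framing applies to the base case, where averted hospitalizations offset the program cost.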