To estimate the frequency of distorted presentation and overinterpretation of results in diagnostic accuracy studies.
MEDLINE was searched for diagnostic accuracy studies published between January ...and June 2010 in journals with an impact factor of 4 or higher. Articles included were primary studies of the accuracy of one or more tests in which the results were compared with a clinical reference standard. Two authors scored each article independently by using a pretested data-extraction form to identify actual overinterpretation and practices that facilitate overinterpretation, such as incomplete reporting of study methods or the use of inappropriate methods (potential overinterpretation). The frequency of overinterpretation was estimated in all studies and in a subgroup of imaging studies.
Of the 126 articles, 39 (31%; 95% confidence interval [CI]: 23%, 39%) contained a form of actual overinterpretation, including 29 (23%; 95% CI: 16%, 30%) with an overly optimistic abstract, 10 (8%; 95% CI: 3%, 13%) with a discrepancy between the study aim and conclusion, and eight with conclusions based on selected subgroups. In our analysis of potential overinterpretation, authors of 89% (95% CI: 83%, 94%) of the studies did not include a sample size calculation, 88% (95% CI: 82%, 94%) did not state a test hypothesis, and 57% (95% CI: 48%, 66%) did not report CIs of accuracy measurements. In 43% (95% CI: 34%, 52%) of studies, authors were unclear about the intended role of the test, and in 3% (95% CI: 0%, 6%) they used inappropriate statistical tests. A subgroup analysis of imaging studies showed that 16 (30%; 95% CI: 17%, 43%) and 53 (100%; 95% CI: 92%, 100%) contained forms of actual and potential overinterpretation, respectively.
Overinterpretation and misreporting of results in diagnostic accuracy studies are frequent in journals with high impact factors.
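The first three intervals above can be approximated with a normal-approximation (Wald) interval for a proportion. The sketch below is illustrative only: the abstract does not state which interval method the authors used, and the function name `wald_ci` is an assumption introduced here.

```python
import math

def wald_ci(events, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion.

    Assumption: the source does not name its CI method; Wald is used
    here only because it is the simplest common choice.
    """
    p = events / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clamp to [0, 1] because the Wald interval can overshoot the bounds.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Counts reported in the results: 39, 29 and 10 of 126 articles.
for events in (39, 29, 10):
    p, lo, hi = wald_ci(events, 126)
    print(f"{events}/126: {100 * p:.0f}% (95% CI: {100 * lo:.0f}%, {100 * hi:.0f}%)")
```

For small counts or proportions near 0% or 100%, an exact or Wilson interval is generally preferable to the Wald interval.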
http://radiology.rsna.org/lookup/suppl/doi:10.1148/radiol.12120527/-/DC1.
Invasive aspergillosis (IA) is a life-threatening opportunistic mycosis that occurs in some people with a compromised immune system. The serum galactomannan enzyme-linked immunosorbent assay (ELISA) ...rapidly gained widespread acceptance as part of the diagnostic work-up of a patient suspected of IA. Due to its non-invasive nature, it can be used as a routine screening test. The ELISA can also be performed on bronchoalveolar lavage (BAL), allowing sampling of the immediate vicinity of the infection. The invasive nature of acquiring BAL, however, changes the role of the galactomannan test significantly, for example by precluding its use as a routine screening test.
To assess the diagnostic accuracy of galactomannan detection in BAL for the diagnosis of IA in people who are immunocompromised, at different cut-off values for test positivity, in accordance with the Cochrane Diagnostic Test Accuracy Handbook.
We searched three bibliographic databases including MEDLINE on 9 September 2016 for aspergillosis and galactomannan as text words and subject headings where appropriate. We checked reference lists of included studies for additional studies.
We included cohort studies that examined the accuracy of BAL galactomannan for the diagnosis of IA in immunocompromised patients if they used the European Organization for Research and Treatment of Cancer/Invasive Fungal Infections Cooperative Group and the National Institute of Allergy and Infectious Diseases Mycoses Study Group (EORTC/MSG) classification as reference standard.
Two review authors assessed study quality and extracted data. Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) was used for quality assessment.
We included 17 studies in our review. All studies except one had a high risk of bias in two or more domains. The diagnostic performance of an optical density index (ODI) of 0.5 as cut-off value was reported in 12 studies (with 1123 patients). The estimated sensitivity was 0.88 (95% confidence interval (CI) 0.75 to 1.00) and specificity 0.81 (95% CI 0.71 to 0.91). The performance of an ODI of 1.0 as cut-off value could be determined in 11 studies (with 648 patients). The sensitivity was 0.78 (95% CI 0.61 to 0.95) and specificity 0.93 (95% CI 0.87 to 0.98). At a cut-off ODI of 1.5 or higher, the heterogeneity in specificity decreased significantly and specificity was invariably >90%.
The optimal cut-off value depends on the local incidence and clinical pathway. At a prevalence of 12%, a hypothetical population of 1000 patients will consist of 120 patients with IA. At a cut-off value of 0.5, 14 patients with IA will be missed and 167 patients will be incorrectly diagnosed with IA. At a cut-off value of 1.0, 26 patients with IA will be missed and 62 patients will be incorrectly diagnosed with IA. The populations and results were very heterogeneous. Therefore, interpretation and extrapolation of these results have to be performed with caution. A test result of 1.5 ODI or higher appears to be a strong indicator of IA.
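The counts of missed and incorrectly diagnosed patients follow directly from the summary sensitivity and specificity reported for each cut-off and the assumed prevalence of 12%. A minimal sketch of that arithmetic, with the function name `natural_frequencies` introduced here for illustration:

```python
def natural_frequencies(sensitivity, specificity, prevalence, population=1000):
    """Convert accuracy estimates into expected counts in a hypothetical cohort."""
    with_disease = population * prevalence
    without_disease = population - with_disease
    missed = with_disease * (1 - sensitivity)               # false negatives
    incorrectly_diagnosed = without_disease * (1 - specificity)  # false positives
    return round(missed), round(incorrectly_diagnosed)

# Summary estimates reported in the review for ODI cut-offs 0.5 and 1.0.
for cutoff, sens, spec in ((0.5, 0.88, 0.81), (1.0, 0.78, 0.93)):
    fn, fp = natural_frequencies(sens, spec, prevalence=0.12)
    print(f"cut-off {cutoff}: {fn} missed, {fp} incorrectly diagnosed")
```

The sketch uses point estimates only and ignores the uncertainty expressed by the confidence intervals.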
Background
Primary sclerosing cholangitis (PSC) is a slowly progressive liver disease. Reliable biomarkers to predict outcome are urgently needed to serve as surrogate endpoints and/or stratifiers in ...clinical trials. Reduction in serum alkaline phosphatase (ALP) has been proposed as a prognostic surrogate marker in PSC. The aim of this study was to assess whether ALP at diagnosis (T0), ALP 1 year later (T1), and the percentage change between both time points hold prognostic value, and to determine the optimal threshold.
Methods
We retrospectively collected ALP levels at T0 and T1 for patients included in a large PSC cohort. The association of ALP at T0, T1, and percentage change with the combined endpoint (PSC‐related death, liver transplantation) was analysed. Predictive value was determined using C‐statistics.
Results
A total of 366 patients were included, of whom 66 (18%) reached an endpoint: 26 (7%) PSC‐related death, 40 (11%) liver transplantation. At T0 and T1, 84% used ursodeoxycholic acid. A positive association was observed between the level of ALP at T0 and T1 and the hazard of reaching an endpoint, up to values around 2.5 times the upper limit of normal (×ULN). A larger decrease in ALP between T0 and T1 was associated with a lower event rate. A range of thresholds (0.5–3×ULN) with similar C‐statistics was found. In this cohort, the optimal threshold was 1.3×ULN at T1.
Conclusion
ALP can be used to discriminate between PSC patients with a good and a poor prognosis. These findings indicate that ALP can serve as a stratifier, and potentially as a surrogate endpoint, for clinical trials in PSC.
Background
Clinical and laboratory diagnosis of cutaneous leishmaniasis (CL) is hampered by under-ascertainment of direct microscopy.
Methods
This study compared the diagnostic accuracy of qPCR on ...DNA extracted from filter paper to the accuracy of direct smear slide microscopy in participants presenting with a cutaneous lesion suspected of leishmaniasis to 16 rural healthcare centers in the Ecuadorian Amazon and Pacific regions, from January 2019 to June 2021. We used Bayesian latent class analysis to estimate test sensitivity, specificity, likelihood ratios (LR), and predictive values (PV) with their 95% credible intervals (95%CrI). The impact of sociodemographic and clinical characteristics on predictive values was assessed as a secondary objective.
Results
Of 320 initially included participants, paired valid test results were available and included in the diagnostic accuracy analysis for 129 from the Amazon and 185 from the Pacific region. We estimated sensitivity of 68% (95%CrI 49% to 82%) and 73% (95%CrI 73% to 83%) for qPCR, and 51% (95%CrI 36% to 66%) and 76% (95%CrI 65% to 86%) for microscopy in the Amazon and Pacific region, respectively. In the Amazon, with an estimated disease prevalence among participants of 73%, negative PV was 54% (95%CrI 5% to 77%) for qPCR and 44% (95%CrI 4% to 65%) for microscopy. In the Pacific (prevalence 88%), the negative PV was 34% (95%CrI 3% to 58%) and 37% (95%CrI 3% to 63%), respectively. The addition of qPCR parallel to microscopy in the Amazon increased the observed prevalence from 38% to 64% (+26 (95%CrI 19 to 34) percentage points).
Conclusion
The accuracy of either qPCR on DNA extracted from filter paper or microscopy for CL diagnosis as a stand-alone test seems to be unsatisfactory and region-dependent. We recommend further studies to confirm the clinically relevant increment found in the diagnostic yield due to the addition of qPCR.
Rapid diagnosis of respiratory virus infections contributes to patient care. This systematic review evaluates the diagnostic accuracy of rapid tests for the detection of respiratory viruses. We ...searched Medline and EMBASE for studies evaluating these tests against polymerase chain reaction as the reference standard. Of 179 studies included, 134 evaluated rapid tests for influenza viruses, 32 for respiratory syncytial virus (RSV), and 13 for other respiratory viruses. We used the bivariate random effects model for quantitative meta-analysis of the results. Most tests detected only influenza viruses or RSV. Summary sensitivity and specificity estimates of tests for influenza were 61.1% and 98.9%. For RSV, summary sensitivity was 75.3%, and specificity, 98.7%. We assessed the quality of studies using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) checklist. Because of incomplete reporting, the risk of bias was often unclear. Despite their intended use at the point of care, 26.3% of tests were evaluated in a laboratory setting. Although newly developed tests seem more sensitive, high-quality evaluations of these tests are lacking.
Objective
Meta-analysis of predictive values is usually discouraged because these values are directly affected by disease prevalence, but sensitivity and specificity sometimes show ...substantial heterogeneity as well. We propose a bivariate random-effects logitnormal model for the meta-analysis of the positive predictive value (PPV) and negative predictive value (NPV) of diagnostic tests.
Study Design and Setting
Twenty-three meta-analyses of diagnostic accuracy were reanalyzed. With separate models, we calculated summary estimates of the PPV and NPV and summary estimates of sensitivity and specificity. We compared these summary estimates, the goodness of fit of the two models, and the amount of heterogeneity of both approaches.
Results
There were no substantial differences in the goodness of fit or amount of heterogeneity between the two models. The median absolute difference between the projected PPV and NPV from the summary estimates of sensitivity and specificity and the summary estimates of PPV and NPV was 1% point (interquartile range, 0–2% points).
Conclusion
A model for the meta-analysis of predictive values fitted the data from a range of systematic reviews as well as a meta-analysis of sensitivity and specificity did. The choice for either model could be guided by considerations of the design used in the primary studies and sources of heterogeneity.
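Projecting PPV and NPV from summary sensitivity and specificity relies on Bayes' theorem at a given prevalence. The sketch below shows only that projection step, not the bivariate random-effects model itself. The function name and the input values are hypothetical and are not taken from the reanalyzed meta-analyses.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Project PPV and NPV from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Hypothetical inputs, chosen only to illustrate the calculation.
ppv, npv = predictive_values(sensitivity=0.80, specificity=0.90, prevalence=0.20)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Because both outputs depend on prevalence, the same sensitivity and specificity project to different predictive values in different settings.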
To provide guidance on rating imprecision in a body of evidence assessing the accuracy of a single test. This guide will clarify when Grading of Recommendations Assessment, Development and Evaluation ...(GRADE) users should consider rating down the certainty of evidence by one or more levels for imprecision in test accuracy.
A project group within the GRADE working group conducted iterative discussions and presentations at GRADE working group meetings to produce this guidance.
Before rating the certainty of evidence, GRADE users should define the target of their certainty rating. GRADE recommends setting judgment thresholds defining what they consider a very accurate, accurate, inaccurate, and very inaccurate test. These thresholds should be set after considering consequences of testing and effects on people-important outcomes. GRADE's primary criterion for judging imprecision in test accuracy evidence is considering confidence intervals (i.e., CI approach) of absolute test accuracy results (true and false, positive, and negative results in a cohort of people). Based on the CI approach, when a CI appreciably crosses the predefined judgment threshold(s), one should consider rating down certainty of evidence by one or more levels, depending on the number of thresholds crossed. When the CI does not cross judgment threshold(s), GRADE suggests considering the sample size for an adequately powered test accuracy review (optimal information size [OIS] or review information size [RIS]) in rating imprecision. If the combined sample size of the included studies in the review is smaller than the required OIS/RIS, one should consider rating down by one or more levels for imprecision.
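The counting step of the CI approach can be sketched as follows. This is an illustration under stated assumptions, not GRADE's procedure: the threshold values, the CI bounds, and the function name `thresholds_crossed` are hypothetical, and whether a CI crosses a threshold "appreciably" remains a judgment that the code does not capture.

```python
def thresholds_crossed(ci_lower, ci_upper, thresholds):
    """Count judgment thresholds lying strictly inside a confidence interval."""
    return sum(1 for t in thresholds if ci_lower < t < ci_upper)

# Hypothetical thresholds, expressed as false negatives per 1000 people tested.
thresholds = [10, 30, 60]

# Hypothetical CI for false negatives per 1000: 20 to 45.
crossed = thresholds_crossed(20, 45, thresholds)
print(f"Thresholds crossed: {crossed}")
```

A count of zero would lead to the second step described above, comparing the combined sample size with the OIS/RIS.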
This paper extends previous GRADE guidance for rating imprecision in single test accuracy systematic reviews and guidelines, with a focus on the circumstances in which one should consider rating down one or more levels for imprecision.
Embase is a biomedical and pharmacological bibliographic database of published literature, produced by Elsevier. In 2011, Embase introduced the Emtree term “diagnostic test accuracy study,” after ...discussion with the diagnostic test accuracy (DTA) community of Cochrane. The aim of this study is to investigate the performance of this Emtree term when used to retrieve diagnostic accuracy studies.
We first piloted a random selection of 1,000 titles from Embase and then repeated the process with 1,223 studies specifically limited to humans. Two researchers independently screened those for eligibility. From titles that were indicated as being relevant or potentially relevant by at least one assessor, the full texts were retrieved and screened. A third researcher retrieved the Emtree terms for each title and checked whether “diagnostic test accuracy study” was one of the attached Emtree terms. The results of both exercises were then cross-classified, and sensitivity and specificity of the Emtree term were estimated.
Our pilot set consisted of 1,000 studies, of which 20 (2.0%) were studies from which DTA data could be extracted. Thirteen studies had the label DTA study, of which five were indeed DTA studies. The final set consisted of 1,223 studies, of which 33 (2.7%) were DTA studies. Twenty studies were labeled as DTA study, of which fourteen indeed were DTA studies. This resulted in a sensitivity of 42.4% (95% CI: 25.5% to 60.8%) and a specificity of 99.5% (95% CI: 98.9% to 99.8%).
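The sensitivity and specificity for the final set can be recomputed from the reported counts by rebuilding the 2x2 table. A minimal sketch follows. The variable and function names are introduced here for illustration, and confidence intervals are omitted because the abstract does not state the interval method.

```python
def two_by_two(total, condition_positive, test_positive, true_positive):
    """Rebuild a 2x2 table from marginal counts and the true-positive cell."""
    false_positive = test_positive - true_positive
    false_negative = condition_positive - true_positive
    true_negative = total - condition_positive - false_positive
    return true_positive, false_positive, false_negative, true_negative

# Final set: 1,223 studies, 33 DTA studies, 20 labeled, 14 labeled and truly DTA.
tp, fp, fn, tn = two_by_two(1223, 33, 20, 14)

sensitivity = tp / (tp + fn)       # share of DTA studies carrying the label
specificity = tn / (tn + fp)       # share of non-DTA studies without the label
mislabeled = fp / (tp + fp)        # share of labeled studies that were not DTA

print(f"sensitivity = {100 * sensitivity:.1f}%")
print(f"specificity = {100 * specificity:.1f}%")
print(f"labeled but not DTA = {100 * mislabeled:.0f}%")
```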
Although we planned to include a more focused set of studies in our second attempt, the percentage of DTA studies was similar in both attempts. The DTA label failed to retrieve most of the DTA studies, and 30% of the studies labeled as DTA studies were in fact not DTA studies. The Emtree term "diagnostic test accuracy study" does not meet the requirements to be useful for retrieving DTA studies accurately.
The diagnosis of infection by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) presents major challenges. Reverse transcriptase polymerase chain reaction (RT-PCR) testing is used to ...diagnose a current infection, but its utility as a reference standard is constrained by sampling errors, limited sensitivity (71% to 98%), and dependence on the timing of specimen collection. Chest imaging tests are being used in the diagnosis of COVID-19 disease, or when RT-PCR testing is unavailable.
To determine the diagnostic accuracy of chest imaging (computed tomography (CT), X-ray and ultrasound) in people with suspected or confirmed COVID-19.
We searched the COVID-19 Living Evidence Database from the University of Bern, the Cochrane COVID-19 Study Register, and The Stephen B. Thacker CDC Library. In addition, we checked repositories of COVID-19 publications. We did not apply any language restrictions. We conducted searches for this review iteration up to 5 May 2020.
We included studies of all designs that produce estimates of test accuracy or provide data from which estimates can be computed. We included two types of cross-sectional designs: a) where all patients suspected of the target condition enter the study through the same route and b) where it is not clear up front who has and who does not have the target condition, or where the patients with the target condition are recruited in a different way or from a different population from the patients without the target condition. When studies used a variety of reference standards, we included all of them.
We screened studies and extracted data independently, in duplicate. We also assessed the risk of bias and applicability concerns independently, in duplicate, using the QUADAS-2 checklist. We presented estimated sensitivity and specificity in paired forest plots and summarised them in tables. We used a hierarchical meta-analysis model where appropriate. We presented uncertainty of the accuracy estimates using 95% confidence intervals (CIs).
We included 84 studies, falling into two categories: studies with participants with confirmed diagnoses of COVID-19 at the time of recruitment (71 studies with 6331 participants) and studies with participants suspected of COVID-19 (13 studies with 1948 participants, including three case-control studies with 549 cases and controls). Chest CT was evaluated in 78 studies (8105 participants), chest X-ray in nine studies (682 COVID-19 cases), and chest ultrasound in two studies (32 COVID-19 cases). All evaluations of chest X-ray and ultrasound were conducted in studies with confirmed diagnoses only. Twenty-five per cent (21/84) of all studies were available only as preprints: 15/71 studies in the confirmed cases group and 6/13 studies in the suspected group. Among 71 studies that included confirmed cases, 41 studies had included symptomatic cases only, 25 studies had included cases regardless of their symptoms, and five studies had included asymptomatic cases only, three of which included a combination of confirmed and suspected cases. Seventy studies were conducted in Asia, two in Europe, two in North America and one in South America. Fifty-one studies included inpatients while the remaining 24 studies were conducted in mixed or unclear settings. Risk of bias was high in most studies, mainly due to concerns about selection of participants and applicability. Among the 13 studies that included suspected cases, nine studies were conducted in Asia, and one in Europe. Seven studies included inpatients while the remaining three studies were conducted in mixed or unclear settings. In studies that included confirmed cases, the pooled sensitivity of chest CT was 93.1% (95% CI: 90.2 to 95.0; 65 studies, 5759 cases) and that of X-ray was 82.1% (95% CI: 62.5 to 92.7; 9 studies, 682 cases). Heterogeneity judged by visual assessment of the ROC plots was considerable.
Two studies evaluated the diagnostic accuracy of point-of-care ultrasound and both reported zero false negatives (with 10 and 22 participants having undergone ultrasound, respectively). These studies reported only true positive and false negative data; therefore, it was not possible to pool and derive estimates of specificity. In studies that included suspected cases, the pooled sensitivity of CT was 86.2% (95% CI: 71.9 to 93.8; 13 studies, 2346 participants) and specificity was 18.1% (95% CI: 3.71 to 55.8). Heterogeneity judged by visual assessment of the forest plots was high. Chest CT may give approximately the same proportion of positive results for patients with and without a SARS-CoV-2 infection: the chances of getting a positive CT result are 86% (95% CI: 72 to 94) in patients with a SARS-CoV-2 infection and 82% (95% CI: 44 to 96) in patients without.
The uncertainty resulting from the poor study quality and the heterogeneity of included studies limits our ability to confidently draw conclusions based on our results. Our findings indicate that chest CT is sensitive but not specific for the diagnosis of COVID-19 in suspected patients, meaning that CT may not be capable of differentiating SARS-CoV-2 infection from other causes of respiratory illness. This low specificity could also be the result of the poor sensitivity of the reference standard (RT-PCR), as CT could potentially be more sensitive than RT-PCR in some cases. Because of limited data, accuracy estimates of chest X-ray and ultrasound of the lungs for the diagnosis of COVID-19 should be carefully interpreted. Future diagnostic accuracy studies should avoid cases-only studies and pre-define positive imaging findings. Planned updates of this review will aim to: increase precision around the accuracy estimates for CT (ideally with low risk of bias studies); obtain further data to inform accuracy of chest X-rays and ultrasound; and continue to search for studies that fulfil secondary objectives to inform the utility of imaging along different diagnostic pathways.
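The 82% chance of a positive CT result in patients without infection is the complement of the pooled specificity, and its interval is the complement of the specificity interval with the bounds swapped. The sketch below reproduces that step and also computes a positive likelihood ratio. The likelihood ratio is a standard derived quantity that is not reported in the abstract, and the function name is an assumption made here.

```python
def positive_result_rates(sensitivity, specificity):
    """Return P(test+ | infected), P(test+ | not infected) and their ratio."""
    false_positive_rate = 1 - specificity
    positive_lr = sensitivity / false_positive_rate
    return sensitivity, false_positive_rate, positive_lr

# Pooled estimates for chest CT in suspected cases.
sens, fpr, lr_pos = positive_result_rates(0.862, 0.181)
print(f"positive in infected:     {100 * sens:.0f}%")
print(f"positive in not infected: {100 * fpr:.0f}%")
print(f"positive likelihood ratio: {lr_pos:.2f}")

# Complement of the specificity CI (3.71% to 55.8%), bounds swapped.
fpr_lower, fpr_upper = 1 - 0.558, 1 - 0.0371
print(f"95% CI for positive in not infected: "
      f"{100 * fpr_lower:.0f}% to {100 * fpr_upper:.0f}%")
```

A positive likelihood ratio close to 1 means a positive result barely shifts the probability of infection, which is the point made in the text.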