Rapid diagnosis of respiratory virus infections contributes to patient care. This systematic review evaluates the diagnostic accuracy of rapid tests for the detection of respiratory viruses. We searched Medline and EMBASE for studies evaluating these tests against polymerase chain reaction as the reference standard. Of 179 studies included, 134 evaluated rapid tests for influenza viruses, 32 for respiratory syncytial virus (RSV), and 13 for other respiratory viruses. We used the bivariate random effects model for quantitative meta-analysis of the results. Most tests detected only influenza viruses or RSV. Summary sensitivity and specificity estimates of tests for influenza were 61.1% and 98.9%, respectively. For RSV, summary sensitivity was 75.3% and specificity 98.7%. We assessed the quality of studies using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) checklist. Because of incomplete reporting, the risk of bias was often unclear. Despite their intended use at the point of care, 26.3% of tests were evaluated in a laboratory setting. Although newly developed tests seem more sensitive, high-quality evaluations of these tests are lacking.
Background Clinical and laboratory diagnosis of cutaneous leishmaniasis (CL) is hampered by the under-ascertainment of direct microscopy. Methods This study compared the diagnostic accuracy of qPCR on DNA extracted from filter paper with that of direct smear slide microscopy in participants presenting with a cutaneous lesion suspected of leishmaniasis at 16 rural healthcare centers in the Ecuadorian Amazon and Pacific regions, from January 2019 to June 2021. We used Bayesian latent class analysis to estimate test sensitivity, specificity, likelihood ratios (LR), and predictive values (PV) with their 95% credible intervals (95%CrI). The impact of sociodemographic and clinical characteristics on predictive values was assessed as a secondary objective. Results Of 320 initially included participants, paired valid test results were available and included in the diagnostic accuracy analysis for 129 from the Amazon and 185 from the Pacific region. We estimated sensitivity of 68% (95%CrI 49% to 82%) and 73% (95%CrI 73% to 83%) for qPCR, and 51% (95%CrI 36% to 66%) and 76% (95%CrI 65% to 86%) for microscopy in the Amazon and Pacific region, respectively. In the Amazon, with an estimated disease prevalence among participants of 73%, the negative PV was 54% (95%CrI 5% to 77%) for qPCR and 44% (95%CrI 4% to 65%) for microscopy. In the Pacific (prevalence 88%), the negative PV was 34% (95%CrI 3% to 58%) and 37% (95%CrI 3% to 63%). The addition of qPCR in parallel with microscopy in the Amazon increased the observed prevalence from 38% to 64% (+26 (95%CrI 19 to 34) percentage points). Conclusion The accuracy of either qPCR on DNA extracted from filter paper or microscopy as a stand-alone test for CL diagnosis seems to be unsatisfactory and region-dependent. We recommend further studies to confirm the clinically relevant increment found in the diagnostic yield due to the addition of qPCR.
Laboratory animal studies are used in a wide range of human health related research areas, such as basic biomedical research, drug research, experimental surgery and environmental health. The results of these studies can be used to inform decisions regarding clinical research in humans, for example the decision to proceed to clinical trials. If the research question relates to potential harms with no expectation of benefit (e.g., toxicology), studies in experimental animals may provide the only relevant or controlled data and directly inform clinical management decisions. Systematic reviews and meta-analyses are important tools to provide robust and informative evidence summaries of these animal studies. Rating how certain we are about the evidence could provide important information about the translational probability of findings in experimental animal studies to clinical practice, and may help improve it. Evidence summaries and certainty in the evidence ratings could also be used (1) to support selection of interventions with the best therapeutic potential to be tested in clinical trials, (2) to justify a regulatory decision limiting human exposure (to drug or toxin), or (3) to support decisions on the utility of further animal experiments. The Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) approach is the most widely used framework to rate the certainty in the evidence and strength of health care recommendations. Here we present how the GRADE approach could be used to rate the certainty in the evidence of preclinical animal studies in the context of therapeutic interventions. We also discuss the methodological challenges that we identified, and for which further work is needed. Examples are defining the importance of consistency within and across animal species and using GRADE's indirectness domain as a tool to predict translation from animal models to humans.
Diagnosis of leptospirosis by the microscopic agglutination test (MAT) or by culture is confined to specialized laboratories. Although ELISA techniques are more common, they still require laboratory facilities. Rapid Diagnostic Tests (RDTs) can be used for easy point-of-care diagnosis. This study aims to prospectively evaluate the diagnostic performance of the RDTs LeptoTek Dri Dot, LeptoTek Lateral Flow, and Leptocheck-WB.
From 2001 to 2012, one or two of the RDTs were applied simultaneously, prior to routine diagnostics (MAT, ELISA, and culture), to serum specimens from participants sent in for leptospirosis diagnosis. The case definition was based on MAT, ELISA, and culture results. Participants not fulfilling the case definition were considered not to have leptospirosis. The diagnostic accuracy was determined based on the first submitted sample and on paired samples, either in an overall analysis or stratified by days post onset of illness.
The overall sensitivity and specificity were 75% and 96% for the LeptoTek Dri Dot, 78% and 95% for the LeptoTek Lateral Flow, and 78% and 98% for the Leptocheck-WB, respectively. Based on the first submitted sample, sensitivity was low (51% for LeptoTek Dri Dot, 69% for LeptoTek Lateral Flow, and 55% for Leptocheck-WB) but increased substantially when the results of paired samples were combined, although at the cost of lower specificity (sensitivity and specificity of 82% and 91% for LeptoTek Dri Dot, 86% and 84% for LeptoTek Lateral Flow, and 80% and 93% for Leptocheck-WB, respectively).
All three RDTs are antibody tests that can contribute to the diagnosis of leptospirosis, supporting clinical suspicion and raising awareness. Since the overall sensitivity of the tested RDTs did not exceed 80%, one should be cautious about relying on an RDT result alone, and confirmation by reference tests is strongly recommended.
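The gain in sensitivity and loss in specificity seen when paired samples are combined can be illustrated with an either-positive decision rule. This is an idealized sketch, assuming the two sample results are conditionally independent given disease status (real paired sera do better on the second sample as antibody titres rise), and the input values are hypothetical:

```python
def either_positive(sens1, spec1, sens2, spec2):
    """Accuracy of calling a case positive if either of two samples tests
    positive, assuming the two results are conditionally independent."""
    sens = 1 - (1 - sens1) * (1 - sens2)  # missed only if both samples miss
    spec = spec1 * spec2                  # a non-case must be negative on both
    return sens, spec

# Hypothetical single-sample accuracy: 55% sensitivity, 98% specificity.
sens, spec = either_positive(0.55, 0.98, 0.55, 0.98)
# Combined sensitivity rises to about 80%, specificity drops to about 96%.
```

The direction of the trade-off matches the pattern reported above: combining samples raises sensitivity while specificity falls, because each extra chance to test positive applies to non-cases as well as cases.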
Objective To identify and validate PubMed search filters for retrieving studies including children and to develop a new pediatric search filter for PubMed. Study design We developed 2 different datasets of studies to evaluate the performance of the identified pediatric search filters, expressed in terms of sensitivity, precision, specificity, accuracy, and number needed to read (NNR). An optimal search filter will have a high sensitivity and high precision with a low NNR. Results In addition to the PubMed Limits: All Child: 0-18 years filter (in May 2012 renamed to PubMed Filter Child: 0-18 years), 6 search filters for identifying studies including children were identified: 3 developed by Kastner et al, 1 by BestBets, 1 by the Child Health Field, and 1 by the Cochrane Childhood Cancer Group. Three search filters (Cochrane Childhood Cancer Group, Child Health Field, and BestBets) had the highest sensitivity (99.3%, 99.5%, and 99.3%, respectively) but a lower precision (64.5%, 68.4%, and 66.6%, respectively) compared with the other search filters. Two Kastner search filters had a high precision (93.0% and 93.7%, respectively) but a low sensitivity (58.5% and 44.8%, respectively); they failed to identify many pediatric studies in our datasets. The search terms responsible for false-positive results in the reference dataset were determined. With these data, we developed a new search filter for identifying studies with children in PubMed with an optimal sensitivity (99.5%) and precision (69.0%). Conclusion Existing search filters to identify studies including children have either a low sensitivity or a low precision with a high NNR. A new pediatric search filter with a high sensitivity and a low NNR has been developed.
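The performance measures used here are related: precision is the fraction of retrieved records that are relevant, and the number needed to read is its reciprocal. A minimal sketch with hypothetical 2×2 retrieval counts (not the study's data):

```python
def filter_performance(tp, fp, fn, tn):
    """Search-filter metrics from 2x2 retrieval counts:
    tp = relevant and retrieved, fp = irrelevant but retrieved,
    fn = relevant but missed,   tn = irrelevant and excluded."""
    sensitivity = tp / (tp + fn)   # share of relevant records retrieved
    precision   = tp / (tp + fp)   # share of retrieved records that are relevant
    specificity = tn / (tn + fp)
    nnr = 1 / precision            # records to screen per relevant record found
    return sensitivity, precision, specificity, nnr

# Hypothetical counts: 199 of 200 relevant records retrieved, 90 false hits.
sens, prec, spec, nnr = filter_performance(tp=199, fp=90, fn=1, tn=710)
```

The sketch makes the trade-off explicit: a filter tuned for near-perfect sensitivity tends to pull in more irrelevant records, lowering precision and raising the NNR.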
Embase is a biomedical and pharmacological bibliographic database of published literature, produced by Elsevier. In 2011, Embase introduced the Emtree term “diagnostic test accuracy study,” after discussion with the diagnostic test accuracy (DTA) community of Cochrane. The aim of this study is to investigate the performance of this Emtree term when used to retrieve diagnostic accuracy studies.
We first piloted a random selection of 1,000 titles from Embase and then repeated the process with 1,223 studies specifically limited to humans. Two researchers independently screened those for eligibility. From titles that were indicated as being relevant or potentially relevant by at least one assessor, the full texts were retrieved and screened. A third researcher retrieved the Emtree terms for each title and checked whether “diagnostic test accuracy study” was one of the attached Emtree terms. The results of both exercises were then cross-classified, and sensitivity and specificity of the Emtree term were estimated.
Our pilot set consisted of 1,000 studies, of which 20 (2.0%) were studies from which DTA data could be extracted. Thirteen studies had the label DTA study, of which five were indeed DTA studies. The final set consisted of 1,223 studies, of which 33 (2.7%) were DTA studies. Twenty studies were labeled as DTA study, of which fourteen indeed were DTA studies. This resulted in a sensitivity of 42.4% (95% CI: 25.5% to 60.8%) and a specificity of 99.5% (95% CI: 98.9% to 99.8%).
Although we planned to include a more focused set of studies in our second attempt, the percentage of DTA studies was similar in both attempts. The DTA label failed to retrieve most of the DTA studies, and 30% of the studies labeled as DTA studies were in fact not. The Emtree term "diagnostic test accuracy study" does not meet the requirements for retrieving DTA studies accurately.
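The reported estimates can be reproduced from the counts given for the final set; this is a quick arithmetic check only, and does not reproduce the authors' confidence interval method:

```python
# 2x2 counts reconstructed from the final set: 1,223 records, 33 true DTA
# studies, 20 records carrying the Emtree label, 14 of those truly DTA.
tp = 14                  # labeled and truly a DTA study
fp = 20 - tp             # labeled but not a DTA study (the 30% mislabeled)
fn = 33 - tp             # DTA studies the label missed
tn = 1223 - 33 - fp      # neither labeled nor a DTA study

sensitivity = tp / (tp + fn)   # fraction of DTA studies the label retrieves
specificity = tn / (tn + fp)   # fraction of non-DTA studies correctly unlabeled
```

Rounded to one decimal, these come out at the reported 42.4% sensitivity and 99.5% specificity: the label is highly specific but misses well over half of the DTA studies.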
Abstract Objective Meta-analysis of predictive values is usually discouraged because these values are directly affected by disease prevalence, but sensitivity and specificity sometimes show substantial heterogeneity as well. We propose a bivariate random-effects logitnormal model for the meta-analysis of the positive predictive value (PPV) and negative predictive value (NPV) of diagnostic tests. Study Design and Setting Twenty-three meta-analyses of diagnostic accuracy were reanalyzed. With separate models, we calculated summary estimates of the PPV and NPV and summary estimates of sensitivity and specificity. We compared these summary estimates, the goodness of fit of the two models, and the amount of heterogeneity of both approaches. Results There were no substantial differences in the goodness of fit or amount of heterogeneity between the models. The median absolute difference between the PPV and NPV projected from the summary estimates of sensitivity and specificity and the summary estimates of PPV and NPV was 1 percentage point (interquartile range, 0–2 percentage points). Conclusion A model for the meta-analysis of predictive values fitted the data from a range of systematic reviews as well as a meta-analysis of sensitivity and specificity. The choice for either model could be guided by considerations of the design used in the primary studies and sources of heterogeneity.
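The projection of predictive values from summary sensitivity and specificity, against which the direct meta-analysis of PPV and NPV is compared above, follows Bayes' theorem. A minimal sketch with hypothetical inputs:

```python
def predictive_values(sens, spec, prev):
    """Project PPV and NPV from sensitivity, specificity, and prevalence."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

# Hypothetical summary estimates: 90% sensitive, 80% specific, 10% prevalence.
ppv, npv = predictive_values(0.90, 0.80, 0.10)
```

Re-running the same call with a different `prev` shows why predictive values are prevalence-dependent while sensitivity and specificity, in this projection, are not.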
We wished to assess the frequency of overinterpretation in systematic reviews of diagnostic accuracy studies.
MEDLINE was searched through PubMed from December 2015 to January 2016. Systematic reviews of diagnostic accuracy studies in English were included if they reported one or more meta-analyses of accuracy estimates. We built and piloted a list of 10 items that represent actual overinterpretation in the abstract and/or full-text conclusion, and a list of 9 items that represent potential overinterpretation. Two investigators independently used the items to score each included systematic review, with disagreements resolved by consensus.
We included 112 systematic reviews. The majority had a positive conclusion regarding the accuracy or clinical usefulness of the investigated test in the abstract (n = 83; 74%) and full-text (n = 83; 74%). Of the 112 reviews, 81 (72%) contained at least 1 actual form of overinterpretation in the abstract, and 77 (69%) in the full-text. This was most often a "positive conclusion, not reflecting the reported summary accuracy estimates," in 55 (49%) abstracts and 56 (50%) full-texts, and a "positive conclusion, not taking high risk of bias and/or applicability concerns into account," in 47 (42%) abstracts and 26 (23%) full-texts. Of these 112 reviews, 107 (96%) contained a form of potential overinterpretation, most frequently "nonrecommended statistical methods for meta-analysis performed" (n = 57; 51%).
Most recent systematic reviews of diagnostic accuracy studies present positive conclusions and a majority contain a form of overinterpretation. This may lead to unjustified optimism about test performance and erroneous clinical decisions and recommendations.
To provide guidance on rating imprecision in a body of evidence assessing the accuracy of a single test. This guide will clarify when Grading of Recommendations Assessment, Development and Evaluation (GRADE) users should consider rating down the certainty of evidence by one or more levels for imprecision in test accuracy.
A project group within the GRADE working group conducted iterative discussions and presentations at GRADE working group meetings to produce this guidance.
Before rating the certainty of evidence, GRADE users should define the target of their certainty rating. GRADE recommends setting judgment thresholds defining what they consider a very accurate, accurate, inaccurate, and very inaccurate test. These thresholds should be set after considering the consequences of testing and the effects on people-important outcomes. GRADE's primary criterion for judging imprecision in test accuracy evidence is consideration of the confidence intervals (i.e., the CI approach) of absolute test accuracy results (true and false positive and negative results in a cohort of people). Under the CI approach, when a CI appreciably crosses the predefined judgment threshold(s), one should consider rating down the certainty of evidence by one or more levels, depending on the number of thresholds crossed. When the CI does not cross the judgment threshold(s), GRADE suggests considering whether the review is adequately powered, using the optimal information size (OIS) or review information size (RIS), when rating imprecision. If the combined sample size of the studies included in the review is smaller than the required OIS/RIS, one should consider rating down by one or more levels for imprecision.
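The CI approach described above reduces to counting how many predefined judgment thresholds the interval crosses. A minimal sketch; the threshold values and the strict-crossing rule are illustrative assumptions, not GRADE-prescribed numbers:

```python
def thresholds_crossed(ci_low, ci_high, thresholds):
    """Count the judgment thresholds that fall inside a confidence interval;
    per the CI approach, each crossing is a reason to consider rating down
    the certainty of evidence by one level."""
    return sum(ci_low < t < ci_high for t in thresholds)

# Illustrative sensitivity thresholds: 0.70 ("accurate"), 0.90 ("very accurate").
crossed = thresholds_crossed(0.65, 0.95, thresholds=(0.70, 0.90))
```

Here a CI from 0.65 to 0.95 spans both illustrative thresholds, so the evidence is compatible with an inaccurate, an accurate, and a very accurate test, and rating down by more than one level would be considered; a narrow CI sitting between thresholds crosses none, and the OIS/RIS check applies instead.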
This paper extends previous GRADE guidance for rating imprecision in single test accuracy systematic reviews and guidelines, with a focus on the circumstances in which one should consider rating down one or more levels for imprecision.