Systematic screening in high-burden settings is recommended as a strategy for early detection of pulmonary tuberculosis disease, reducing mortality, morbidity and transmission, and improving equity in access to care. Questioning for symptoms and chest radiography (CXR) have historically been the most widely available tools to screen for tuberculosis disease. Their accuracy is important for the design of tuberculosis screening programmes and determines, in combination with the accuracy of confirmatory diagnostic tests, the yield of a screening programme and the burden on individuals and the health service.
To assess the sensitivity and specificity of questioning for the presence of one or more tuberculosis symptoms or symptom combinations, CXR, and combinations of these as screening tools for detecting bacteriologically confirmed pulmonary tuberculosis disease in HIV-negative adults and adults with unknown HIV status who are considered eligible for systematic screening for tuberculosis disease. Second, to investigate sources of heterogeneity, especially in relation to regional, epidemiological, and demographic characteristics of the study populations.
We searched the MEDLINE, Embase, LILACS, and HTA (Health Technology Assessment) databases using pre-specified search terms and consulted experts for unpublished reports, for the period 1992 to 2018. The search date was 10 December 2018. This search was repeated on 2 July 2021.
Studies were eligible if participants were screened for tuberculosis disease using symptom questions, or abnormalities on CXR, or both, and were offered confirmatory testing with a reference standard. We included studies if diagnostic two-by-two tables could be generated for one or more index tests, even if not all participants were subjected to a microbacteriological reference standard. We excluded studies evaluating self-reporting of symptoms.
We categorized symptom and CXR index tests according to commonly used definitions. We assessed the methodological quality of included studies using the QUADAS-2 instrument. We examined the forest plots and receiver operating characteristic plots visually for heterogeneity. We estimated summary sensitivities and specificities (and 95% confidence intervals (CI)) for each index test using bivariate random-effects methods. We analyzed potential sources of heterogeneity in a hierarchical mixed-model.
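As a minimal sketch of the per-study step that precedes pooling, the snippet below derives sensitivity and specificity from one two-by-two table and attaches a 95% Wilson score interval. The counts and the helper names (`wilson_ci`, `accuracy_from_2x2`) are hypothetical, and the Wilson interval is an assumption chosen for illustration; this is not the bivariate random-effects model used for the summary estimates.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def accuracy_from_2x2(tp, fp, fn, tn):
    """Return (estimate, lower, upper) for sensitivity and specificity."""
    sens = tp / (tp + fn)  # proportion of reference-positive people who screen positive
    spec = tn / (tn + fp)  # proportion of reference-negative people who screen negative
    return {
        "sensitivity": (sens, *wilson_ci(tp, tp + fn)),
        "specificity": (spec, *wilson_ci(tn, tn + fp)),
    }


# Hypothetical counts for a single study, for illustration only.
result = accuracy_from_2x2(tp=42, fp=56, fn=58, tn=944)
for name, (estimate, lower, upper) in result.items():
    print(f"{name}: {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Per-study estimates of this kind are what a forest plot displays; the pooled figures additionally account for between-study variation and the correlation between sensitivity and specificity.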
The electronic database search identified 9473 titles and abstracts. Through expert consultation, we identified 31 reports on national tuberculosis prevalence surveys as eligible (of which eight were already captured in the search of the electronic databases), and we identified 957 potentially relevant articles through reference checking. After removal of duplicates, we assessed 10,415 titles and abstracts, of which we identified 430 (4%) for full text review, whereafter we excluded 364 articles. In total, 66 articles provided data on 59 studies. We assessed the 2 July 2021 search results; seven studies were potentially eligible but would make no material difference to the review findings or grading of the evidence, and were not added in this edition of the review. We judged most studies at high risk of bias in one or more domains, most commonly because of incorporation bias and verification bias. We judged applicability concerns low in more than 80% of studies in all three domains. The three most common symptom index tests, cough for two or more weeks (41 studies), any cough (21 studies), and any tuberculosis symptom (29 studies), showed a summary sensitivity of 42.1% (95% CI 36.6% to 47.7%), 51.3% (95% CI 42.8% to 59.7%), and 70.6% (95% CI 61.7% to 78.2%, all very low-certainty evidence), and a specificity of 94.4% (95% CI 92.6% to 95.8%, high-certainty evidence), 87.6% (95% CI 81.6% to 91.8%, low-certainty evidence), and 65.1% (95% CI 53.3% to 75.4%, low-certainty evidence), respectively. The data on symptom index tests were more heterogenous than those for CXR. The studies on any tuberculosis symptom were the most heterogeneous, but had the lowest number of variables explaining this variation. Symptom index tests also showed regional variation. 
The summary sensitivity of any CXR abnormality (23 studies) was 94.7% (95% CI 92.2% to 96.4%, very low-certainty evidence) and 84.8% (95% CI 76.7% to 90.4%, low-certainty evidence) for CXR abnormalities suggestive of tuberculosis (19 studies), and specificity was 89.1% (95% CI 85.6% to 91.8%, low-certainty evidence) and 95.6% (95% CI 92.6% to 97.4%, high-certainty evidence), respectively. Sensitivity was more heterogenous than specificity, and could be explained by regional variation. The addition of cough for two or more weeks, whether to any (pulmonary) CXR abnormality or to CXR abnormalities suggestive of tuberculosis, resulted in a summary sensitivity and specificity of 99.2% (95% CI 96.8% to 99.8%) and 84.9% (95% CI 81.2% to 88.1%) (15 studies; certainty of evidence not assessed).
The summary estimates of the symptom and CXR index tests may inform the choice of screening and diagnostic algorithms in any given setting or country where screening for tuberculosis is being implemented. The high sensitivity of CXR index tests, with or without symptom questions in parallel, suggests a high yield of persons with tuberculosis disease. However, additional considerations will determine the design of screening and diagnostic algorithms, such as the availability and accessibility of CXR facilities or the resources to fund them, and the need for more or fewer diagnostic tests to confirm the diagnosis (depending on screening test specificity), which also has resource implications. These review findings should be interpreted with caution due to methodological limitations in the included studies and regional variation in sensitivity and specificity. The sensitivity and specificity of an index test in a specific setting cannot be predicted with great precision due to heterogeneity. This should be borne in mind when planning for and implementing tuberculosis screening programmes.
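To illustrate how screening accuracy translates into yield and confirmatory-testing burden, the sketch below applies two of the summary estimates reported above to a hypothetical population. The population size, the 0.5% prevalence, and the function name `screening_yield` are assumptions for illustration; given the heterogeneity noted above, results in any real setting may differ considerably.

```python
def screening_yield(n_screened, prevalence, sensitivity, specificity):
    """Expected counts after one round of screening with a single index test."""
    diseased = n_screened * prevalence
    healthy = n_screened - diseased
    true_pos = diseased * sensitivity
    false_pos = healthy * (1 - specificity)
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "missed_cases": diseased - true_pos,
        # Everyone who screens positive needs a confirmatory diagnostic test.
        "referred_for_confirmation": true_pos + false_pos,
    }


# Assumed setting: 100,000 people screened, 0.5% prevalence (hypothetical).
# Accuracy inputs are the summary estimates quoted in the review.
cough = screening_yield(100_000, 0.005, sensitivity=0.421, specificity=0.944)
cxr = screening_yield(100_000, 0.005, sensitivity=0.848, specificity=0.956)

for label, counts in (("cough >= 2 weeks", cough), ("CXR suggestive of TB", cxr)):
    print(label, {key: round(value) for key, value in counts.items()})
```

Under these assumptions, lower specificity mainly raises the number of people referred for confirmatory testing, while lower sensitivity mainly raises the number of missed cases.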
Background & Aims Primary sclerosing cholangitis (PSC) is a chronic cholestatic liver disease. At present, there is no appropriate histologic scoring system available for PSC that evaluates both the degree of necroinflammatory activity (grade) and fibrosis (stage). The aim of this study was to assess whether three scoring systems commonly used in different liver diseases could be applied for grading and/or staging of PSC. Methods Sixty-four PSC patients from a Dutch cohort, who underwent diagnostic liver biopsy, were included. Staging was scored using the Ishak, Nakanuma, and Ludwig systems. Grading was scored using the Ishak and Nakanuma systems. Three measures of outcome were defined: transplant-free survival, time to liver transplantation (LTx), and occurrence of cirrhosis-related symptoms (CRS). Association of grade and stage with outcome was estimated using the Kaplan–Meier log-rank test and Cox regression analysis. Correlation with biochemistry was assessed by Spearman’s rank test. Results Disease stage measured by the Ishak, Nakanuma, and Ludwig staging systems was strongly associated with both transplant-free survival (hazard ratio (HR) 2.56; 95% CI 1.11–5.89, HR 6.53; 95% CI 2.01–21.22, HR 1.94; 95% CI 1.00–3.79, respectively) and time to LTx (HR 4.18; 95% CI 1.51–11.56, HR 7.05; 95% CI 1.77–28.11, HR 3.13; 95% CI 1.42–6.87, respectively). Ishak and Nakanuma grading systems were not associated with CRS. Correlations between histopathology and liver biochemistry were weak. Conclusion Applying the Nakanuma, Ishak, and Ludwig histopathological staging systems is feasible and clinically relevant given their association with transplant-free survival and time to LTx. This suggests that these staging systems are likely candidates for surrogate endpoints and stratification purposes in clinical trials in PSC.
Specific diagnostic tests to detect severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and resulting COVID-19 disease are not always available and take time to obtain results. Routine laboratory markers such as white blood cell count, measures of anticoagulation, C-reactive protein (CRP) and procalcitonin, are used to assess the clinical status of a patient. These laboratory tests may be useful for the triage of people with potential COVID-19 to prioritize them for different levels of treatment, especially in situations where time and resources are limited.
To assess the diagnostic accuracy of routine laboratory testing as a triage test to determine if a person has COVID-19.
On 4 May 2020 we undertook electronic searches in the Cochrane COVID-19 Study Register and the COVID-19 Living Evidence Database from the University of Bern, which is updated daily with published articles from PubMed and Embase and with preprints from medRxiv and bioRxiv. In addition, we checked repositories of COVID-19 publications. We did not apply any language restrictions.
We included both case-control designs and consecutive series of patients that assessed the diagnostic accuracy of routine laboratory testing as a triage test to determine if a person has COVID-19. The reference standard could be reverse transcriptase polymerase chain reaction (RT-PCR) alone; RT-PCR plus clinical expertise and/or imaging; repeated RT-PCR several days apart or from different samples; WHO and other case definitions; and any other reference standard used by the study authors.
Two review authors independently extracted data from each included study. They also assessed the methodological quality of the studies, using QUADAS-2. We used the 'NLMIXED' procedure in SAS 9.4 for the hierarchical summary receiver operating characteristic (HSROC) meta-analyses of tests for which we included four or more studies. To facilitate interpretation of results, for each meta-analysis we estimated summary sensitivity at the points on the SROC curve that corresponded to the median and interquartile range boundaries of specificities in the included studies.
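The idea of reading a summary sensitivity off a summary curve at the median and quartile specificities can be sketched as follows. This is a deliberate simplification: it assumes a symmetric SROC curve with a constant diagnostic odds ratio (DOR), not the HSROC model fitted with NLMIXED in the review. The study specificities, the DOR value, and the function name are all hypothetical.

```python
import statistics


def sensitivity_on_symmetric_sroc(dor, specificity):
    """Sensitivity at a given specificity on a constant-DOR (symmetric) SROC curve.

    Uses logit(sens) = log(DOR) + logit(1 - spec), i.e.
    odds(sens) = DOR * (1 - spec) / spec.
    """
    if not 0 < specificity < 1:
        raise ValueError("specificity must lie strictly between 0 and 1")
    sens_odds = dor * (1 - specificity) / specificity
    return sens_odds / (1 + sens_odds)


# Hypothetical specificities reported by seven included studies.
study_specificities = [0.62, 0.71, 0.80, 0.85, 0.88, 0.93, 0.95]

# Quartile boundaries; the interpolation method ("inclusive") is an assumption.
q1, median, q3 = statistics.quantiles(study_specificities, n=4, method="inclusive")

assumed_dor = 4.0  # hypothetical summary diagnostic odds ratio
summary = {
    spec: sensitivity_on_symmetric_sroc(assumed_dor, spec)
    for spec in (q1, median, q3)
}
for spec, sens in summary.items():
    print(f"specificity {spec:.3f} -> sensitivity {sens:.3f}")
```

Reporting sensitivity at fixed specificities in this way makes tests evaluated at different thresholds easier to compare, at the cost of depending on the shape assumed for the curve.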
We included 21 studies in this review, including 14,126 COVID-19 patients and 56,585 non-COVID-19 patients in total. Studies evaluated a total of 67 different laboratory tests. Although we were interested in the diagnostic accuracy of routine tests for COVID-19, the included studies used detection of SARS-CoV-2 infection through RT-PCR as reference standard. There was considerable heterogeneity between tests, threshold values and the settings in which they were applied. For some tests a positive result was defined as a decrease compared to normal values, for other tests a positive result was defined as an increase, and for some tests both increase and decrease may have indicated test positivity. None of the studies had either low risk of bias on all domains or low concerns for applicability for all domains. Only three of the tests evaluated had a summary sensitivity and specificity over 50%. These were: increase in interleukin-6, increase in C-reactive protein and lymphocyte count decrease. Blood count Eleven studies evaluated a decrease in white blood cell count, with a median specificity of 93% and a summary sensitivity of 25% (95% CI 8.0% to 27%; very low-certainty evidence). The 15 studies that evaluated an increase in white blood cell count had a lower median specificity and a lower corresponding sensitivity. Four studies evaluated a decrease in neutrophil count. Their median specificity was 93%, corresponding to a summary sensitivity of 10% (95% CI 1.0% to 56%; low-certainty evidence). The 11 studies that evaluated an increase in neutrophil count had a lower median specificity and a lower corresponding sensitivity. The summary sensitivity of an increase in neutrophil percentage (4 studies) was 59% (95% CI 1.0% to 100%) at median specificity (38%; very low-certainty evidence). The summary sensitivity of an increase in monocyte count (4 studies) was 13% (95% CI 6.0% to 26%) at median specificity (73%; very low-certainty evidence).
The summary sensitivity of a decrease in lymphocyte count (13 studies) was 64% (95% CI 28% to 89%) at median specificity (53%; low-certainty evidence). Four studies that evaluated a decrease in lymphocyte percentage showed a lower median specificity and lower corresponding sensitivity. The summary sensitivity of a decrease in platelets (4 studies) was 19% (95% CI 10% to 32%) at median specificity (88%; low-certainty evidence). Liver function tests The summary sensitivity of an increase in alanine aminotransferase (9 studies) was 12% (95% CI 3% to 34%) at median specificity (92%; low-certainty evidence). The summary sensitivity of an increase in aspartate aminotransferase (7 studies) was 29% (95% CI 17% to 45%) at median specificity (81%) (low-certainty evidence). The summary sensitivity of a decrease in albumin (4 studies) was 21% (95% CI 3% to 67%) at median specificity (66%; low-certainty evidence). The summary sensitivity of an increase in total bilirubin (4 studies) was 12% (95% CI 3.0% to 34%) at median specificity (92%; very low-certainty evidence). Markers of inflammation The summary sensitivity of an increase in CRP (14 studies) was 66% (95% CI 55% to 75%) at median specificity (44%; very low-certainty evidence). The summary sensitivity of an increase in procalcitonin (6 studies) was 3% (95% CI 1% to 19%) at median specificity (86%; very low-certainty evidence). The summary sensitivity of an increase in IL-6 (four studies) was 73% (95% CI 36% to 93%) at median specificity (58%) (very low-certainty evidence). Other biomarkers The summary sensitivity of an increase in creatine kinase (5 studies) was 11% (95% CI 6% to 19%) at median specificity (94%) (low-certainty evidence). The summary sensitivity of an increase in serum creatinine (four studies) was 7% (95% CI 1% to 37%) at median specificity (91%; low-certainty evidence). 
The summary sensitivity of an increase in lactate dehydrogenase (4 studies) was 25% (95% CI 15% to 38%) at median specificity (72%; very low-certainty evidence).
Although these tests give an indication about the general health status of patients and some tests may be specific indicators for inflammatory processes, none of the tests we investigated are useful for accurately ruling in or ruling out COVID-19 on their own. Studies were done in specific hospitalized populations, and future studies should consider non-hospital settings to evaluate how these tests would perform in people with milder symptoms.
Accurate rapid diagnostic tests for SARS-CoV-2 infection could contribute to clinical and public health strategies to manage the COVID-19 pandemic. Point-of-care antigen and molecular tests to detect current infection could increase access to testing and early confirmation of cases, and expedite clinical and public health management decisions that may reduce transmission.
To assess the diagnostic accuracy of point-of-care antigen and molecular-based tests for diagnosis of SARS-CoV-2 infection. We consider accuracy separately in symptomatic and asymptomatic population groups.
Electronic searches of the Cochrane COVID-19 Study Register and the COVID-19 Living Evidence Database from the University of Bern (which includes daily updates from PubMed and Embase and preprints from medRxiv and bioRxiv) were undertaken on 30 Sept 2020. We checked repositories of COVID-19 publications and included independent evaluations from national reference laboratories, the Foundation for Innovative New Diagnostics and the Diagnostics Global Health website to 16 Nov 2020. We did not apply language restrictions.
We included studies of people with either suspected SARS-CoV-2 infection, known SARS-CoV-2 infection or known absence of infection, or those who were being screened for infection. We included test accuracy studies of any design that evaluated commercially produced, rapid antigen or molecular tests suitable for a point-of-care setting (minimal equipment, sample preparation, and biosafety requirements, with results within two hours of sample collection). We included all reference standards that define the presence or absence of SARS-CoV-2 (including reverse transcription polymerase chain reaction (RT-PCR) tests and established diagnostic criteria).
Studies were screened independently in duplicate with disagreements resolved by discussion with a third author. Study characteristics were extracted by one author and checked by a second; extraction of study results and assessments of risk of bias and applicability (made using the QUADAS-2 tool) were undertaken independently in duplicate. We present sensitivity and specificity with 95% confidence intervals (CIs) for each test and pooled data using the bivariate model separately for antigen and molecular-based tests. We tabulated results by test manufacturer and compliance with manufacturer instructions for use and according to symptom status.
Seventy-eight study cohorts were included (described in 64 study reports, including 20 pre-prints), reporting results for 24,087 samples (7,415 with confirmed SARS-CoV-2). Studies were mainly from Europe (n = 39) or North America (n = 20), and evaluated 16 antigen and five molecular assays. We considered risk of bias to be high in 29 (50%) studies because of participant selection; in 66 (85%) because of weaknesses in the reference standard for absence of infection; and in 29 (45%) for participant flow and timing. Studies of antigen tests were of a higher methodological quality compared to studies of molecular tests, particularly regarding the risk of bias for participant selection and the index test. Characteristics of participants in 35 (45%) studies differed from those in whom the test was intended to be used and the delivery of the index test in 39 (50%) studies differed from the way in which the test was intended to be used. Nearly all studies (97%) defined the presence or absence of SARS-CoV-2 based on a single RT-PCR result, and none included participants meeting case definitions for probable COVID-19. Antigen tests Forty-eight studies reported 58 evaluations of antigen tests. Estimates of sensitivity varied considerably between studies. There were differences between symptomatic (72.0%, 95% CI 63.7% to 79.0%; 37 evaluations; 15530 samples, 4410 cases) and asymptomatic participants (58.1%, 95% CI 40.2% to 74.1%; 12 evaluations; 1581 samples, 295 cases). Average sensitivity was higher in the first week after symptom onset (78.3%, 95% CI 71.1% to 84.1%; 26 evaluations; 5769 samples, 2320 cases) than in the second week of symptoms (51.0%, 95% CI 40.8% to 61.0%; 22 evaluations; 935 samples, 692 cases). Sensitivity was high in those with cycle threshold (Ct) values on PCR ≤25 (94.5%, 95% CI 91.0% to 96.7%; 36 evaluations; 2613 cases) compared to those with Ct values >25 (40.7%, 95% CI 31.8% to 50.3%; 36 evaluations; 2632 cases). Sensitivity varied between brands. 
Using data from instructions for use (IFU) compliant evaluations in symptomatic participants, summary sensitivities ranged from 34.1% (95% CI 29.7% to 38.8%; Coris Bioconcept) to 88.1% (95% CI 84.2% to 91.1%; SD Biosensor STANDARD Q). Average specificities were high in symptomatic and asymptomatic participants, and for most brands (overall summary specificity 99.6%, 95% CI 99.0% to 99.8%). At 5% prevalence using data for the most sensitive assays in symptomatic people (SD Biosensor STANDARD Q and Abbott Panbio), positive predictive values (PPVs) of 84% to 90% mean that between 1 in 10 and 1 in 6 positive results will be a false positive, and between 1 in 4 and 1 in 8 cases will be missed. At 0.5% prevalence applying the same tests in asymptomatic people would result in PPVs of 11% to 28% meaning that between 7 in 10 and 9 in 10 positive results will be false positives, and between 1 in 2 and 1 in 3 cases will be missed. No studies assessed the accuracy of repeated lateral flow testing or self-testing. Rapid molecular assays Thirty studies reported 33 evaluations of five different rapid molecular tests. Sensitivities varied according to test brand. Most of the data relate to the ID NOW and Xpert Xpress assays. Using data from evaluations following the manufacturer's instructions for use, the average sensitivity of ID NOW was 73.0% (95% CI 66.8% to 78.4%) and average specificity 99.7% (95% CI 98.7% to 99.9%; 4 evaluations; 812 samples, 222 cases). For Xpert Xpress, the average sensitivity was 100% (95% CI 88.1% to 100%) and average specificity 97.2% (95% CI 89.4% to 99.3%; 2 evaluations; 100 samples, 29 cases). Insufficient data were available to investigate the effect of symptom status or time after symptom onset.
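The dependence of predictive values on prevalence follows from Bayes' rule, and a minimal sketch is given below. The inputs are illustrative assumptions (sensitivity 80% and specificity 97%, matching the WHO 'acceptable' thresholds), not the brand-specific estimates behind the PPV ranges quoted above, so the output is not expected to reproduce those ranges. The function name `predictive_values` is hypothetical.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    false_neg = prevalence * (1 - sensitivity)
    true_neg = (1 - prevalence) * specificity
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative inputs only: sensitivity 80%, specificity 97%.
ppv_5, npv_5 = predictive_values(0.05, sensitivity=0.80, specificity=0.97)
ppv_05, npv_05 = predictive_values(0.005, sensitivity=0.80, specificity=0.97)

print(f"5% prevalence:   PPV {ppv_5:.1%}, NPV {npv_5:.1%}")
print(f"0.5% prevalence: PPV {ppv_05:.1%}, NPV {npv_05:.1%}")
```

With accuracy held fixed, the PPV falls sharply as prevalence drops, which is why confirmatory testing of positive results matters most in low-prevalence settings.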
Antigen tests vary in sensitivity. In people with signs and symptoms of COVID-19, sensitivities are highest in the first week of illness when viral loads are higher. The assays shown to meet appropriate criteria, such as WHO's priority target product profiles for COVID-19 diagnostics ('acceptable' sensitivity ≥ 80% and specificity ≥ 97%), can be considered as a replacement for laboratory-based RT-PCR when immediate decisions about patient care must be made, or where RT-PCR cannot be delivered in a timely manner. Positive predictive values suggest that confirmatory testing of those with positive results may be considered in low prevalence settings. Due to the variable sensitivity of antigen tests, people who test negative may still be infected. Evidence for testing in asymptomatic cohorts was limited. Test accuracy studies cannot adequately assess the ability of antigen tests to differentiate those who are infectious and require isolation from those who pose no risk, as there is no reference standard for infectiousness. A small number of molecular tests showed high accuracy and may be suitable alternatives to RT-PCR. However, further evaluations of the tests in settings as they are intended to be used are required to fully establish performance in practice. Several important studies in asymptomatic individuals have been reported since the close of our search and will be incorporated at the next update of this review. Comparative studies of antigen tests in their intended use settings and according to test operator (including self-testing) are required.
Objective Currently, there is no consensus on the definition of hyperemesis gravidarum (HG; protracted vomiting in pregnancy) and no single widely used set of diagnostic criteria for HG. The various definitions rely on symptoms, sometimes in combination with laboratory tests. Through a systematic review, we aimed to summarize available evidence on the diagnostic value of biomarkers for HG. This could assist diagnosis and may shed light on the, as yet, not understood cause of the disorder. Study Design We searched Medline and Embase for articles about diagnostic biomarkers for either the presence or severity of HG or nausea and vomiting of pregnancy. We defined HG as any combination of nausea, vomiting, dehydration, weight loss, or hospitalization for nausea and/or vomiting in pregnancy, in the absence of any other obvious cause for these complaints. Results We found 81 articles on 9 biomarkers. Although 65% of all studies included only HG cases with ketonuria, we did not find an association between ketonuria and presence or severity of HG in 5 studies reporting on this association. Meta-analysis, with the use of the hierarchical summary receiver operating characteristics model, yielded an odds ratio of 3.2 (95% confidence interval, 2.0–5.1) of Helicobacter pylori for HG, as compared with asymptomatic control subjects (sensitivity, 73%; specificity, 55%). Studies on human chorionic gonadotropin and thyroid hormones, leptin, estradiol, progesterone, and white blood count showed inconsistent associations with HG; lymphocytes tended to be higher in women with HG. Conclusion We did not find support for the use of ketonuria in the diagnosis of HG. H pylori serology might be useful in specific patients.
The objective of the study was to clarify how the Grading of Recommendations Assessment, Development and Evaluation (GRADE) concept of certainty of evidence applies to certainty ratings of test accuracy.
After initial brainstorming with GRADE Working Group members, we iteratively refined and clarified the approaches for defining ranges when assessing the certainty of evidence for test accuracy within a systematic review, health technology assessment, or guideline.
Ranges can be defined both for single test accuracy and for comparative accuracy of multiple tests. For systematic reviews and health technology assessments, approaches for defining ranges include some that do not require value judgments regarding downstream health outcomes. Key challenges arise in the context of a guideline that requires ranges for sensitivity and specificity that are set considering possible effects on all critical outcomes. We illustrate possible approaches and provide an example from a systematic review of a direct comparison between two test strategies.
This GRADE concept paper provides a framework for assessing, presenting, and making decisions based on the certainty of evidence for test accuracy. More empirical research is needed to support future GRADE guidance on how to best operationalize the candidate approaches.
Comparative diagnostic test accuracy systematic reviews (DTA reviews) assess the accuracy of two or more tests and compare their diagnostic performance. We investigated how comparative DTA reviews assessed the risk of bias (RoB) in primary studies that compared multiple index tests.
This is an overview of comparative DTA reviews indexed in MEDLINE from January 1st to December 31st, 2017. Two assessors independently identified DTA reviews including at least two index tests and containing at least one statement in which the accuracy of the index tests was compared. Two assessors independently extracted data on the methods used to assess RoB in studies that directly compared the accuracy of multiple index tests.
We included 238 comparative DTA reviews. Only two reviews (0.8%, 95% confidence interval 0.1 to 3.0%) conducted RoB assessment of test comparisons undertaken in primary studies; neither used an RoB tool specifically designed to assess bias in test comparisons.
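A confidence interval for a small proportion such as 2 out of 238 can be computed with an exact binomial method. The sketch below uses the Clopper-Pearson interval, solved by bisection on the binomial distribution function. The abstract does not state which interval method was used, so the choice of the exact method is an assumption, and the function names are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided confidence interval for x successes in n."""

    def solve(cdf_at, target):
        # cdf_at(p) decreases as p increases, so bisect for cdf_at(p) == target.
        low, high = 0.0, 1.0
        for _ in range(100):
            mid = (low + high) / 2
            if cdf_at(mid) > target:
                low = mid
            else:
                high = mid
        return (low + high) / 2

    # Lower limit: P(X >= x | p) = alpha/2, i.e. P(X <= x-1 | p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper limit: P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


lower, upper = clopper_pearson(2, 238)
print(f"2/238 = {2 / 238:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

Exact intervals are often preferred over normal-approximation intervals when the observed proportion is close to 0 or 1, as it is here.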
Assessment of RoB in test comparisons undertaken in primary studies was uncommon in comparative DTA reviews, possibly due to lack of existing guidance on and awareness of potential sources of bias. Based on our findings, guidance on how to assess and incorporate RoB in comparative DTA reviews is needed.
The respiratory illness caused by SARS-CoV-2 infection continues to present diagnostic challenges. Our 2020 edition of this review showed thoracic (chest) imaging to be sensitive and moderately specific in the diagnosis of coronavirus disease 2019 (COVID-19). In this update, we include new relevant studies, and have removed studies with case-control designs, and those not intended to be diagnostic test accuracy studies.
To evaluate the diagnostic accuracy of thoracic imaging (computed tomography (CT), X-ray and ultrasound) in people with suspected COVID-19.
We searched the COVID-19 Living Evidence Database from the University of Bern, the Cochrane COVID-19 Study Register, The Stephen B. Thacker CDC Library, and repositories of COVID-19 publications through to 30 September 2020. We did not apply any language restrictions.
We included studies of all designs, except for case-control, that recruited participants of any age group suspected to have COVID-19 and that reported estimates of test accuracy or provided data from which we could compute estimates.
The review authors independently and in duplicate screened articles, extracted data and assessed risk of bias and applicability concerns using the QUADAS-2 domain-list. We presented the results of estimated sensitivity and specificity using paired forest plots, and we summarised pooled estimates in tables. We used a bivariate meta-analysis model where appropriate. We presented the uncertainty of accuracy estimates using 95% confidence intervals (CIs).
We included 51 studies with 19,775 participants suspected of having COVID-19, of whom 10,155 (51%) had a final diagnosis of COVID-19. Forty-seven studies evaluated one imaging modality each, and four studies evaluated two imaging modalities each. All studies used RT-PCR as the reference standard for the diagnosis of COVID-19, with 47 studies using only RT-PCR and four studies using a combination of RT-PCR and other criteria (such as clinical signs, imaging tests, positive contacts, and follow-up phone calls) as the reference standard. Studies were conducted in Europe (33), Asia (13), North America (3) and South America (2); including only adults (26), all ages (21), children only (1), adults over 70 years (1), and unclear (2); in inpatients (2), outpatients (32), and setting unclear (17). Risk of bias was high or unclear in thirty-two (63%) studies with respect to participant selection, 40 (78%) studies with respect to reference standard, 30 (59%) studies with respect to index test, and 24 (47%) studies with respect to participant flow. For chest CT (41 studies, 16,133 participants, 8110 (50%) cases), the sensitivity ranged from 56.3% to 100%, and specificity ranged from 25.4% to 97.4%. The pooled sensitivity of chest CT was 87.9% (95% CI 84.6 to 90.6) and the pooled specificity was 80.0% (95% CI 74.9 to 84.3). There was no statistical evidence indicating that reference standard conduct and definition for index test positivity were sources of heterogeneity for CT studies. Nine chest CT studies (2807 participants, 1139 (41%) cases) used the COVID-19 Reporting and Data System (CO-RADS) scoring system, which has five thresholds to define index test positivity. At a CO-RADS threshold of 5 (7 studies), the sensitivity ranged from 41.5% to 77.9% and the pooled sensitivity was 67.0% (95% CI 56.4 to 76.2); the specificity ranged from 83.5% to 96.2%; and the pooled specificity was 91.3% (95% CI 87.6 to 94.0). 
At a CO-RADS threshold of 4 (7 studies), the sensitivity ranged from 56.3% to 92.9% and the pooled sensitivity was 83.5% (95% CI 74.4 to 89.7); the specificity ranged from 77.2% to 90.4% and the pooled specificity was 83.6% (95% CI 80.5 to 86.4). For chest X-ray (9 studies, 3694 participants, 2111 (57%) cases) the sensitivity ranged from 51.9% to 94.4% and specificity ranged from 40.4% to 88.9%. The pooled sensitivity of chest X-ray was 80.6% (95% CI 69.1 to 88.6) and the pooled specificity was 71.5% (95% CI 59.8 to 80.8). For ultrasound of the lungs (5 studies, 446 participants, 211 (47%) cases) the sensitivity ranged from 68.2% to 96.8% and specificity ranged from 21.3% to 78.9%. The pooled sensitivity of ultrasound was 86.4% (95% CI 72.7 to 93.9) and the pooled specificity was 54.6% (95% CI 35.3 to 72.6). Based on an indirect comparison using all included studies, chest CT had a higher specificity than ultrasound. For indirect comparisons of chest CT and chest X-ray, or chest X-ray and ultrasound, the data did not show differences in specificity or sensitivity.
Our findings indicate that chest CT is sensitive and moderately specific for the diagnosis of COVID-19. Chest X-ray is moderately sensitive and moderately specific for the diagnosis of COVID-19. Ultrasound is sensitive but not specific for the diagnosis of COVID-19. Thus, chest CT and ultrasound may have more utility for excluding COVID-19 than for differentiating SARS-CoV-2 infection from other causes of respiratory illness. Future diagnostic accuracy studies should pre-define positive imaging findings, include direct comparisons of the various modalities of interest in the same participant population, and implement improved reporting practices.
Objectives To collect reasons for selecting the methods for meta-analysis of diagnostic accuracy from authors of systematic reviews and improve guidance on recommended methods. Study Design and Setting Online survey among authors of recently published meta-analyses of diagnostic accuracy. Results We identified 100 eligible reviews, of which 40 had used more advanced methods of meta-analysis (hierarchical random-effects approach), 52 more traditional methods (summary receiver operating characteristic curve based on linear regression or a univariate approach), and 8 combined both. Fifty-nine authors responded to the survey; 29 (49%) authors had used advanced methods, 25 (42%) authors traditional methods, and 5 (9%) authors combined traditional and advanced methods. Most authors who had used advanced methods reported doing so because they believed that these methods are currently recommended (n = 27; 93%). Most authors who had used traditional methods also reported doing so because they believed that these methods are currently recommended (n = 18; 75%) or easy to understand (n = 18; 75%). Conclusion Although more advanced methods for meta-analysis are recommended by The Cochrane Collaboration, both authors using these methods and those using more traditional methods responded that the methods they used were currently recommended. Clearer and more widespread dissemination of guidelines on recommended methods for meta-analysis of test accuracy data is needed.