Invasive aspergillosis is the most common life-threatening opportunistic invasive mycosis in immunocompromised patients. A test for invasive aspergillosis should be neither too invasive nor too great a burden for the already weakened patient. The serum galactomannan enzyme-linked immunosorbent assay (ELISA) seems to have the potential to meet both requirements.
To obtain summary estimates of the diagnostic accuracy of galactomannan detection in serum for the diagnosis of invasive aspergillosis.
We searched MEDLINE, EMBASE and Web of Science with both MeSH terms and text words for both aspergillosis and the sandwich ELISA. We checked the reference lists of included studies and review articles for additional studies. We conducted the searches in February 2014.
We included cross-sectional studies, case-control designs and consecutive series of patients assessing the diagnostic accuracy of galactomannan detection for the diagnosis of invasive aspergillosis in patients with neutropenia or patients whose neutrophils are functionally compromised. The reference standard was composed of the criteria given by the European Organization for Research and Treatment of Cancer (EORTC) and the Mycoses Study Group (MSG).
Two review authors independently assessed quality and extracted data. We carried out meta-analysis using the bivariate method. We investigated sources of heterogeneity by adding potential sources of heterogeneity to the model as covariates.
We included 54 studies in the review (50 in the meta-analyses), containing 5660 patients, of whom 586 had proven or probable invasive aspergillosis. When using an optical density index (ODI) of 0.5 as a cut-off value, the sensitivity of the test was 82% (73% to 90%) and the specificity was 81% (72% to 90%). At a cut-off value of 1.0 ODI, the sensitivity was 72% (65% to 80%) and the specificity was 88% (84% to 92%). At a cut-off value of 1.5 ODI, the sensitivity was 61% (47% to 75%) and the specificity was 93% (89% to 97%). None of the potential sources of heterogeneity had a statistically significant effect on either sensitivity or specificity.
If we used the test at a cut-off value of 0.5 ODI in a population of 100 patients with a disease prevalence of 9% (overall median prevalence), two patients who have invasive aspergillosis would be missed (sensitivity 82%, 18% false negatives), and 17 patients would be treated unnecessarily or referred unnecessarily for further testing (specificity 81%, 19% false positives). If we used the test at a cut-off value of 1.5 ODI in the same population, four invasive aspergillosis patients would be missed (sensitivity 61%, 39% false negatives), and six patients would be treated or referred for further testing unnecessarily (specificity 93%, 7% false positives). These numbers should, however, be interpreted with caution because the results were very heterogeneous.
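The natural-frequency arithmetic in this paragraph can be checked with a short sketch. The function name `expected_errors` and the rounding to whole patients are illustrative assumptions, not part of the review's methods.

```python
def expected_errors(n, prevalence, sensitivity, specificity):
    """Return (missed cases, false positives) expected in a cohort of n people."""
    diseased = n * prevalence
    non_diseased = n - diseased
    missed = diseased * (1 - sensitivity)               # false negatives
    false_positives = non_diseased * (1 - specificity)  # treated or referred unnecessarily
    return round(missed), round(false_positives)


# Cut-off 0.5 ODI: sensitivity 82%, specificity 81%, prevalence 9%
print(expected_errors(100, 0.09, 0.82, 0.81))  # (2, 17)
# Cut-off 1.5 ODI: sensitivity 61%, specificity 93%, prevalence 9%
print(expected_errors(100, 0.09, 0.61, 0.93))  # (4, 6)
```

Raising the cut-off trades false positives for missed cases, which is the pattern the two scenarios describe.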
Abstract. Background: Several studies and systematic reviews have reported results that indicate that sensitivity and specificity may vary with prevalence. Study design and setting: We identify and explore mechanisms that may be responsible for sensitivity and specificity varying with prevalence and illustrate them with examples from the literature. Results: Clinical and artefactual variability may be responsible for changes in prevalence and accompanying changes in sensitivity and specificity. Clinical variability refers to differences in the clinical situation that may cause sensitivity and specificity to vary with prevalence. For example, a patient population with a higher disease prevalence may include more severely diseased patients; therefore, the test performs better in this population. Artefactual variability refers to effects on prevalence and accuracy associated with study design, for example, the verification of index test results by a reference standard. Changes in prevalence influence the extent of overestimation due to imperfect reference standard classification. Conclusions: Sensitivity and specificity may vary in different clinical populations, and prevalence is a marker for such differences. Clinicians are advised to base their decisions on studies that most closely match their own clinical situation, using prevalence to guide the detection of differences in study population or study design.
Abstract. Objectives: The Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group developed an approach to assess the quality of evidence of diagnostic tests. Its use in Cochrane diagnostic test accuracy reviews is new. We applied this approach to three Cochrane reviews with the aim of better understanding the application of the GRADE criteria to such reviews. Study design and setting: We selected reviews to achieve clinical and methodological diversity. At least three assessors independently assessed each review according to the GRADE criteria of risk of bias, indirectness, imprecision, inconsistency, and publication bias. Two teleconferences were held to share experiences. Results: For the interpretation of the GRADE criteria, it made a difference whether assessors looked at the evidence from a patient-important outcome perspective or from a test accuracy standpoint. GRADE criteria such as inconsistency, imprecision, and publication bias were challenging to apply, as was the assessment of comparative test accuracy reviews. Conclusion: The perspective from which evidence is graded can influence judgments about quality. Guidance on application of GRADE to comparative test reviews and on the GRADE criteria of inconsistency, imprecision, and publication bias will facilitate the operationalization of GRADE for diagnostics.
Comparative diagnostic test accuracy studies assess and compare the accuracy of 2 or more tests in the same study. Although these studies have the potential to yield reliable evidence regarding comparative accuracy, shortcomings in the design, conduct, and analysis may bias their results. The currently recommended quality assessment tool for diagnostic test accuracy studies, QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies-2), is not designed for the assessment of test comparisons. The QUADAS-C (Quality Assessment of Diagnostic Accuracy Studies-Comparative) tool was developed as an extension of QUADAS-2 to assess the risk of bias in comparative diagnostic test accuracy studies. Through a 4-round Delphi study involving 24 international experts in test evaluation and a face-to-face consensus meeting, an initial version of the tool was developed that was revised and finalized following a pilot study among potential users. The QUADAS-C tool retains the 4-domain structure of QUADAS-2 (Patient Selection, Index Test, Reference Standard, and Flow and Timing) and adds questions to each QUADAS-2 domain. A risk-of-bias judgment for comparative accuracy requires a risk-of-bias judgment for the accuracy of each test (resulting from QUADAS-2) and additional criteria specific to test comparisons. Examples of such additional criteria include whether participants either received all index tests or were randomly assigned to index tests, and whether index tests were interpreted with blinding to the results of other index tests. The QUADAS-C tool will be useful for systematic reviews of diagnostic test accuracy addressing comparative questions. Furthermore, researchers may use this tool to identify and avoid risk of bias when designing a comparative diagnostic test accuracy study.
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus and resulting COVID-19 pandemic present important diagnostic challenges. Several diagnostic strategies are available to identify current infection, rule out infection, identify people in need of care escalation, or to test for past infection and immune response. Serology tests to detect the presence of antibodies to SARS-CoV-2 aim to identify previous SARS-CoV-2 infection, and may help to confirm the presence of current infection.
To assess the diagnostic accuracy of antibody tests to determine if a person presenting in the community or in primary or secondary care has SARS-CoV-2 infection, or has previously had SARS-CoV-2 infection, and the accuracy of antibody tests for use in seroprevalence surveys.
We undertook electronic searches in the Cochrane COVID-19 Study Register and the COVID-19 Living Evidence Database from the University of Bern, which is updated daily with published articles from PubMed and Embase and with preprints from medRxiv and bioRxiv. In addition, we checked repositories of COVID-19 publications. We did not apply any language restrictions. We conducted searches for this review iteration up to 27 April 2020.
We included test accuracy studies of any design that evaluated antibody tests (including enzyme-linked immunosorbent assays, chemiluminescence immunoassays, and lateral flow assays) in people suspected of current or previous SARS-CoV-2 infection, or where tests were used to screen for infection. We also included studies of people either known to have, or not to have SARS-CoV-2 infection. We included all reference standards to define the presence or absence of SARS-CoV-2 (including reverse transcription polymerase chain reaction tests (RT-PCR) and clinical diagnostic criteria).
We assessed possible bias and applicability of the studies using the QUADAS-2 tool. We extracted 2x2 contingency table data and present sensitivity and specificity for each antibody (or combination of antibodies) using paired forest plots. We pooled data using random-effects logistic regression where appropriate, stratifying by time since post-symptom onset. We tabulated available data by test manufacturer. We have presented uncertainty in estimates of sensitivity and specificity using 95% confidence intervals (CIs).
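As a minimal sketch of how sensitivity and specificity with 95% CIs follow from a single 2x2 table, the example below uses the Wilson score interval. The choice of the Wilson method is an assumption (the text does not say which interval the review used), and the function names and counts are hypothetical.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% when z = 1.96)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity and specificity, each as (estimate, lower, upper)."""
    sens = tp / (tp + fn)  # proportion of reference-positive samples detected
    spec = tn / (tn + fp)  # proportion of reference-negative samples cleared
    return {
        "sensitivity": (sens, *wilson_ci(tp, tp + fn)),
        "specificity": (spec, *wilson_ci(tn, tn + fp)),
    }


# Hypothetical counts, not taken from any included study
result = accuracy_from_2x2(tp=90, fp=5, fn=10, tn=95)
for name, (estimate, lower, upper) in result.items():
    print(f"{name}: {estimate:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

Per-study estimates of this kind are what a paired forest plot displays; pooling across studies needs a separate random-effects model.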
We included 57 publications reporting on a total of 54 study cohorts with 15,976 samples, of which 8526 were from cases of SARS-CoV-2 infection. Studies were conducted in Asia (n = 38), Europe (n = 15), and the USA and China (n = 1). We identified data from 25 commercial tests and numerous in-house assays, a small fraction of the 279 antibody assays listed by the Foundation for Innovative Diagnostics. More than half (n = 28) of the studies included were only available as preprints. We had concerns about risk of bias and applicability. Common issues were use of multi-group designs (n = 29), inclusion of only COVID-19 cases (n = 19), lack of blinding of the index test (n = 49) and reference standard (n = 29), differential verification (n = 22), and the lack of clarity about participant numbers, characteristics and study exclusions (n = 47). Most studies (n = 44) only included people hospitalised due to suspected or confirmed COVID-19 infection. There were no studies exclusively in asymptomatic participants. Two-thirds of the studies (n = 33) defined COVID-19 cases based on RT-PCR results alone, ignoring the potential for false-negative RT-PCR results. We observed evidence of selective publication of study findings through omission of the identity of tests (n = 5). We observed substantial heterogeneity in sensitivities of IgA, IgM and IgG antibodies, or combinations thereof, for results aggregated across different time periods post-symptom onset (range 0% to 100% for all target antibodies). We thus based the main results of the review on the 38 studies that stratified results by time since symptom onset. The numbers of individuals contributing data within each study each week are small and are usually not based on tracking the same groups of patients over time. 
Pooled results for IgG, IgM, IgA, total antibodies and IgG/IgM all showed low sensitivity during the first week since onset of symptoms (all less than 30.1%), rising in the second week and reaching their highest values in the third week. The combination of IgG/IgM had a sensitivity of 30.1% (95% CI 21.4 to 40.7) for 1 to 7 days, 72.2% (95% CI 63.5 to 79.5) for 8 to 14 days, 91.4% (95% CI 87.0 to 94.4) for 15 to 21 days. Estimates of accuracy beyond three weeks are based on smaller sample sizes and fewer studies. For 21 to 35 days, pooled sensitivities for IgG/IgM were 96.0% (95% CI 90.6 to 98.3). There are insufficient studies to estimate sensitivity of tests beyond 35 days post-symptom onset. Summary specificities (provided in 35 studies) exceeded 98% for all target antibodies with confidence intervals no more than 2 percentage points wide. False-positive results were more common where COVID-19 had been suspected and ruled out, but numbers were small and the difference was within the range expected by chance. Assuming a prevalence of 50%, a value considered possible in healthcare workers who have suffered respiratory symptoms, we would anticipate that 43 (28 to 65) would be missed and 7 (3 to 14) would be falsely positive in 1000 people undergoing IgG/IgM testing at days 15 to 21 post-symptom onset. At a prevalence of 20%, a likely value in surveys in high-risk settings, 17 (11 to 26) would be missed per 1000 people tested and 10 (5 to 22) would be falsely positive. At a lower prevalence of 5%, a likely value in national surveys, 4 (3 to 7) would be missed per 1000 tested, and 12 (6 to 27) would be falsely positive. Analyses showed small differences in sensitivity between assay type, but methodological concerns and sparse data prevent comparisons between test brands.
The sensitivity of antibody tests is too low in the first week since symptom onset to have a primary role for the diagnosis of COVID-19, but they may still have a role complementing other testing in individuals presenting later, when RT-PCR tests are negative, or are not done. Antibody tests are likely to have a useful role for detecting previous SARS-CoV-2 infection if used 15 or more days after the onset of symptoms. However, the duration of antibody rises is currently unknown, and we found very little data beyond 35 days post-symptom onset. We are therefore uncertain about the utility of these tests for seroprevalence surveys for public health management purposes. Concerns about high risk of bias and applicability make it likely that the accuracy of tests when used in clinical care will be lower than reported in the included studies. Sensitivity has mainly been evaluated in hospitalised patients, so it is unclear whether the tests are able to detect lower antibody levels likely seen with milder and asymptomatic COVID-19 disease. The design, execution and reporting of studies of the accuracy of COVID-19 tests require considerable improvement. Studies must report data on sensitivity disaggregated by time since onset of symptoms. COVID-19-positive cases who are RT-PCR-negative should be included as well as those confirmed by RT-PCR, in accordance with the World Health Organization (WHO) and China National Health Commission of the People's Republic of China (CDC) case definitions. We were only able to obtain data from a small proportion of available tests, and action is needed to ensure that all results of test evaluations are available in the public domain to prevent selective reporting. This is a fast-moving field and we plan ongoing updates of this living systematic review.
Accurate rapid diagnostic tests for SARS-CoV-2 infection would be a useful tool to help manage the COVID-19 pandemic. Testing strategies that use rapid antigen tests to detect current infection have the potential to increase access to testing, speed detection of infection, and inform clinical and public health management decisions to reduce transmission. This is the second update of this review, which was first published in 2020.
To assess the diagnostic accuracy of rapid, point-of-care antigen tests for diagnosis of SARS-CoV-2 infection. We consider accuracy separately in symptomatic and asymptomatic population groups. Sources of heterogeneity investigated included setting and indication for testing, assay format, sample site, viral load, age, timing of test, and study design.
We searched the COVID-19 Open Access Project living evidence database from the University of Bern (which includes daily updates from PubMed and Embase and preprints from medRxiv and bioRxiv) on 08 March 2021. We included independent evaluations from national reference laboratories, FIND and the Diagnostics Global Health website. We did not apply language restrictions.
We included studies of people with either suspected SARS-CoV-2 infection, known SARS-CoV-2 infection or known absence of infection, or those who were being screened for infection. We included test accuracy studies of any design that evaluated commercially produced, rapid antigen tests. We included evaluations of single applications of a test (one test result reported per person) and evaluations of serial testing (repeated antigen testing over time). Reference standards for presence or absence of infection were any laboratory-based molecular test (primarily reverse transcription polymerase chain reaction (RT-PCR)) or pre-pandemic respiratory sample.
We used standard screening procedures with three people. Two people independently carried out quality assessment (using the QUADAS-2 tool) and extracted study results. Other study characteristics were extracted by one review author and checked by a second. We present sensitivity and specificity with 95% confidence intervals (CIs) for each test, and pooled data using the bivariate model. We investigated heterogeneity by including indicator variables in the random-effects logistic regression models. We tabulated results by test manufacturer and compliance with manufacturer instructions for use and according to symptom status.
We included 155 study cohorts (described in 166 study reports, with 24 as preprints). The main results relate to 152 evaluations of single test applications including 100,462 unique samples (16,822 with confirmed SARS-CoV-2). Studies were mainly conducted in Europe (101/152, 66%), and evaluated 49 different commercial antigen assays. Only 23 studies compared two or more brands of test. Risk of bias was high because of participant selection (40, 26%); interpretation of the index test (6, 4%); weaknesses in the reference standard for absence of infection (119, 78%); and participant flow and timing (41, 27%). Characteristics of participants (45, 30%) and index test delivery (47, 31%) differed from the way in which and in whom the test was intended to be used. Nearly all studies (91%) used a single RT-PCR result to define presence or absence of infection. The 152 studies of single test applications reported 228 evaluations of antigen tests. Estimates of sensitivity varied considerably between studies, with consistently high specificities. Average sensitivity was higher in symptomatic (73.0%, 95% CI 69.3% to 76.4%; 109 evaluations; 50,574 samples, 11,662 cases) compared to asymptomatic participants (54.7%, 95% CI 47.7% to 61.6%; 50 evaluations; 40,956 samples, 2641 cases). Average sensitivity was higher in the first week after symptom onset (80.9%, 95% CI 76.9% to 84.4%; 30 evaluations, 2408 cases) than in the second week of symptoms (53.8%, 95% CI 48.0% to 59.6%; 40 evaluations, 1119 cases). For those who were asymptomatic at the time of testing, sensitivity was higher when an epidemiological exposure to SARS-CoV-2 was suspected (64.3%, 95% CI 54.6% to 73.0%; 16 evaluations; 7677 samples, 703 cases) compared to where COVID-19 testing was reported to be widely available to anyone on presentation for testing (49.6%, 95% CI 42.1% to 57.1%; 26 evaluations; 31,904 samples, 1758 cases).
Average specificity was similarly high for symptomatic (99.1%) or asymptomatic (99.7%) participants. We observed a steady decline in summary sensitivities as measures of sample viral load decreased. Sensitivity varied between brands. When tests were used according to manufacturer instructions, average sensitivities by brand ranged from 34.3% to 91.3% in symptomatic participants (20 assays with eligible data) and from 28.6% to 77.8% for asymptomatic participants (12 assays). For symptomatic participants, summary sensitivities for seven assays were 80% or more (meeting acceptable criteria set by the World Health Organization (WHO)). The WHO acceptable performance criterion of 97% specificity was met by 17 of 20 assays when tests were used according to manufacturer instructions, 12 of which demonstrated specificities above 99%. For asymptomatic participants the sensitivities of only two assays approached but did not meet WHO acceptable performance standards in one study each; specificities for asymptomatic participants were in a similar range to those observed for symptomatic people. At 5% prevalence using summary data in symptomatic people during the first week after symptom onset, the positive predictive value (PPV) of 89% means that 1 in 10 positive results will be a false positive, and around 1 in 5 cases will be missed. At 0.5% prevalence using summary data for asymptomatic people, where testing was widely available and where epidemiological exposure to COVID-19 was suspected, resulting PPVs would be 38% to 52%, meaning that between 2 in 5 and 1 in 2 positive results will be false positives, and between 1 in 2 and 1 in 3 cases will be missed.
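The dependence of predictive values on prevalence can be illustrated with a short sketch based on Bayes' rule. The function name is hypothetical. The inputs combine the sensitivity reported for asymptomatic people with a suspected exposure (64.3%) with the average asymptomatic specificity (99.7%) as a stand-in, because the subgroup specificities are not given in this text; the result therefore only approximates the upper end of the reported PPV range.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp = prevalence * sensitivity              # true positives per person tested
    fn = prevalence * (1 - sensitivity)        # missed cases
    fp = (1 - prevalence) * (1 - specificity)  # false alarms
    tn = (1 - prevalence) * specificity        # correctly cleared
    return tp / (tp + fp), tn / (tn + fn)


# 0.5% prevalence, sensitivity 64.3%, assumed specificity 99.7%
ppv, npv = predictive_values(0.005, 0.643, 0.997)
print(f"PPV {ppv:.0%}, NPV {npv:.1%}")  # PPV 52%, NPV 99.8%
```

Even with specificity above 99%, roughly half of positive results are false at this prevalence, because uninfected people outnumber infected people about 200 to 1.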
Antigen tests vary in sensitivity. In people with signs and symptoms of COVID-19, sensitivities are highest in the first week of illness when viral loads are higher. Assays that meet appropriate performance standards, such as those set by WHO, could replace laboratory-based RT-PCR when immediate decisions about patient care must be made, or where RT-PCR cannot be delivered in a timely manner. However, they are more suitable for use as triage to RT-PCR testing. The variable sensitivity of antigen tests means that people who test negative may still be infected. Many commercially available rapid antigen tests have not been evaluated in independent validation studies. Evidence for testing in asymptomatic cohorts has increased; however, sensitivity is lower and there is a paucity of evidence for testing in different settings. Questions remain about the use of antigen test-based repeat testing strategies. Further research is needed to evaluate the effectiveness of screening programmes at reducing transmission of infection, whether mass screening or targeted approaches including schools, healthcare settings and traveller screening.
Point-of-care (POC) tests for diagnosing schistosomiasis include tests based on circulating antigen detection and urine reagent strip tests. If they had sufficient diagnostic accuracy they could replace conventional microscopy as they provide a quicker answer and are easier to use.
To summarise the diagnostic accuracy of: a) urine reagent strip tests in detecting active Schistosoma haematobium infection, with microscopy as the reference standard; and b) circulating antigen tests for detecting active Schistosoma infection in geographical regions endemic for Schistosoma mansoni or S. haematobium or both, with microscopy as the reference standard.
We searched the electronic databases MEDLINE, EMBASE, BIOSIS, MEDION, and Health Technology Assessment (HTA) without language restriction up to 30 June 2014.
We included studies that used microscopy as the reference standard: for S. haematobium, microscopy of urine prepared by filtration, centrifugation, or sedimentation methods; and for S. mansoni, microscopy of stool by Kato-Katz thick smear. We included studies on participants residing in endemic areas only.
Two review authors independently extracted data, assessed quality of the data using QUADAS-2, and performed meta-analysis where appropriate. Because test thresholds varied, we used the hierarchical summary receiver operating characteristic (HSROC) model for all eligible tests (except the circulating cathodic antigen (CCA) POC for S. mansoni, where the bivariate random-effects model was more appropriate). We investigated heterogeneity, and carried out indirect comparisons where data were sufficient. Results for sensitivity and specificity are presented as percentages with 95% confidence intervals (CI).
We included 90 studies; 88 from field settings in Africa. The median S. haematobium infection prevalence was 41% (range 1% to 89%) and 36% for S. mansoni (range 8% to 95%). Study design and conduct were poorly reported against current standards. Tests for S. haematobium. Urine reagent test strips versus microscopy: Compared to microscopy, the detection of microhaematuria on test strips had the highest sensitivity and specificity (sensitivity 75%, 95% CI 71% to 79%; specificity 87%, 95% CI 84% to 90%; 74 studies, 102,447 participants). For proteinuria, sensitivity was 61% and specificity was 82% (82,113 participants); and for leukocyturia, sensitivity was 58% and specificity 61% (1532 participants). However, overall test accuracy did not differ significantly between the urine reagent strips for microhaematuria and proteinuria when we compared separate populations (P = 0.25), or when direct comparisons within the same individuals were performed (paired studies; P = 0.21). When tests were evaluated against the higher quality reference standard (when multiple samples were analysed), sensitivity was marginally lower for microhaematuria (71% vs 75%) and for proteinuria (49% vs 61%). The specificity of these tests was comparable. Antigen assay: Compared to microscopy, the CCA test showed considerable heterogeneity; meta-analytic sensitivity estimate was 39%, 95% CI 6% to 73%; specificity 78%, 95% CI 55% to 100% (four studies, 901 participants). Tests for S. mansoni. Compared to microscopy, the CCA test meta-analytic estimates for detecting S. mansoni at a single threshold of trace positive were: sensitivity 89% (95% CI 86% to 92%); and specificity 55% (95% CI 46% to 65%; 15 studies, 6091 participants). Against a higher quality reference standard, the sensitivity results were comparable (89% vs 88%) but specificity was higher (66% vs 55%). For the CAA test, sensitivity ranged from 47% to 94%, and specificity from 8% to 100% (4 studies, 1583 participants).
Among the evaluated tests for S. haematobium infection, microhaematuria correctly detected the largest proportions of infections and non-infections identified by microscopy. The CCA POC test for S. mansoni detects a very large proportion of infections identified by microscopy, but it misclassifies a large proportion of microscopy negatives as positives in endemic areas with a moderate to high prevalence of infection, possibly because the test is potentially more sensitive than microscopy.
Anecdotal evidence suggests that the sensitivity and specificity of a diagnostic test may vary with disease prevalence. Our objective was to investigate the associations between disease prevalence and test sensitivity and specificity using studies of diagnostic accuracy.
We used data from 23 meta-analyses, each of which included 10-39 studies (416 total). The median prevalence per review ranged from 1% to 77%. We evaluated the effects of prevalence on sensitivity and specificity using a bivariate random-effects model for each meta-analysis, with prevalence as a covariate. We estimated the overall effect of prevalence by pooling the effects using the inverse variance method.
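The pooling step can be sketched as follows. This is a minimal fixed-effect version of inverse variance weighting; the text does not state whether fixed-effect or random-effects weights were used, and the function name and input values are hypothetical.

```python
import math


def inverse_variance_pool(estimates, standard_errors):
    """Pool effect estimates, weighting each by 1 / SE^2 (fixed-effect)."""
    weights = [1 / se**2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical per-review effects of prevalence (e.g. regression coefficients
# on the logit scale) and their standard errors
effects = [0.2, -0.1, 0.4]
ses = [0.1, 0.2, 0.1]
pooled, pooled_se = inverse_variance_pool(effects, ses)
print(f"pooled effect {pooled:.3f} (SE {pooled_se:.3f})")
```

More precise estimates (smaller standard errors) receive more weight, so the pooled value here sits closer to the two effects with SE 0.1 than to the one with SE 0.2.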
Within a given review, a change in prevalence from the lowest to highest value resulted in a corresponding change in sensitivity or specificity from 0 to 40 percentage points. This effect was statistically significant (p < 0.05) for either sensitivity or specificity in 8 meta-analyses (35%). Overall, specificity tended to be lower with higher disease prevalence; there was no such systematic effect for sensitivity.
The sensitivity and specificity of a test often vary with disease prevalence; this effect is likely to be the result of mechanisms, such as patient spectrum, that affect prevalence, sensitivity and specificity. Because it may be difficult to identify such mechanisms, clinicians should use prevalence as a guide when selecting studies that most closely match their situation.
Cochrane diagnostic test accuracy reviews. Leeflang, Mariska M G; Deeks, Jonathan J; Takwoingi, Yemisi ... Systematic Reviews, 10/2013, Volume 2, Issue 1.
In 1996, shortly after the founding of The Cochrane Collaboration, leading figures in test evaluation research established a Methods Group to focus on the relatively new and rapidly evolving methods for the systematic review of studies of diagnostic tests. Seven years later, the Collaboration decided it was time to develop a publication format and methodology for Diagnostic Test Accuracy (DTA) reviews, as well as the software needed to implement these reviews in The Cochrane Library. A meeting hosted by the German Cochrane Centre in 2004 brought together key methodologists in the area, many of whom became closely involved in the subsequent development of the methodological framework for DTA reviews. DTA reviews first appeared in The Cochrane Library in 2008 and are now an integral part of the work of the Collaboration.
Summary. Background: Novel endoscopic technologies could allow optical diagnosis and resection of colonic polyps without histopathological testing. Our aim was to establish the sensitivity, specificity, and real-time negative predictive value of three types of narrowed spectrum endoscopy (narrow-band imaging [NBI], image-enhanced endoscopy [i-scan], and Fujinon intelligent chromoendoscopy [FICE]), confocal laser endomicroscopy (CLE), and autofluorescence imaging for differentiation between neoplastic and non-neoplastic colonic lesions. Methods: We identified relevant studies through a search of Medline, Embase, PubMed, and the Cochrane Library. Clinical trials and observational studies were eligible for inclusion when the diagnostic performance of NBI, i-scan, FICE, autofluorescence imaging, or CLE had been assessed for differentiation, with histopathology as the reference standard, and for which a 2 × 2 contingency table of lesion diagnosis could be constructed. We did a random-effects bivariate meta-analysis using a non-linear mixed model approach to calculate summary estimates of sensitivity and specificity, and plotted estimates in a summary receiver-operating characteristic curve. Findings: We included 91 studies in our analysis: 56 were of NBI, ten of i-scan, 14 of FICE, 11 of CLE, and 11 of autofluorescence imaging (more than one of the investigated modalities assessed in eight studies). For NBI, overall sensitivity was 91·0% (95% CI 88·6–93·0), specificity 85·6% (81·3–89·0), and real-time negative predictive value 82·5% (75·4–87·9). For i-scan, overall sensitivity was 89·3% (83·3–93·3), specificity 88·2% (80·3–93·2), and real-time negative predictive value 86·5% (78·0–92·1). For FICE, overall sensitivity was 91·8% (87·1–94·9), specificity 83·5% (77·2–88·3), and real-time negative predictive value 83·7% (77·5–88·4).
For autofluorescence imaging, overall sensitivity was 86·7% (79·5–91·6), specificity 65·9% (50·9–78·2), and real-time negative predictive value 81·5% (54·0–94·3). For CLE, overall sensitivity was 93·3% (88·4–96·2), specificity 89·9% (81·8–94·6), and real-time negative predictive value 94·8% (86·6–98·1). Interpretation: All endoscopic imaging techniques other than autofluorescence imaging could be used by appropriately trained endoscopists to make a reliable optical diagnosis for colonic lesions in daily practice. Further research should be focused on whether training could help to improve negative predictive values. Funding: None.