The evaluation of the credibility of results from a meta-analysis has become an important part of the evidence synthesis process. We present a methodological framework to evaluate confidence in the results from network meta-analyses, Confidence in Network Meta-Analysis (CINeMA), when multiple interventions are compared.
CINeMA considers 6 domains: (i) within-study bias, (ii) reporting bias, (iii) indirectness, (iv) imprecision, (v) heterogeneity, and (vi) incoherence. Key to judgments about within-study bias and indirectness is the percentage contribution matrix, which shows how much information each study contributes to the results from network meta-analysis. The contribution matrix can easily be computed using a freely available web application. In evaluating imprecision, heterogeneity, and incoherence, we consider the impact of these components of variability in forming clinical decisions.
Via 3 examples, we show that CINeMA improves transparency and avoids the selective use of evidence when forming judgments, thus limiting subjectivity in the process. CINeMA is easy to apply even in large and complicated networks.
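As a rough illustration of what a percentage contribution expresses, the sketch below splits the information behind one mixed estimate (A vs. B) in a single closed loop A-B-C into a direct and an indirect share using inverse-variance weights. It is a simplification under stated assumptions, not the CINeMA algorithm: the web application reports contributions for every study and every comparison in the network, which this example does not attempt. The function name and all variances are hypothetical.

```python
def percentage_contributions(var_direct, var_indirect_legs):
    """Split the information behind a mixed A-vs-B estimate in a loop A-B-C.

    var_direct        -- variance of the direct A-vs-B estimate (hypothetical input)
    var_indirect_legs -- variances of the A-vs-C and B-vs-C estimates
    """
    # The indirect A-vs-B estimate goes through C, so the leg variances add up.
    var_indirect = sum(var_indirect_legs)
    # Inverse-variance weights: more precise evidence contributes more.
    w_direct = 1.0 / var_direct
    w_indirect = 1.0 / var_indirect
    total = w_direct + w_indirect
    return {
        "direct": 100.0 * w_direct / total,
        "indirect": 100.0 * w_indirect / total,
    }


# Hypothetical variances on the log scale.
shares = percentage_contributions(0.04, (0.03, 0.05))
for source, pct in shares.items():
    print(f"{source}: {pct:.1f}%")
```

If the direct studies are at high risk of bias, a large direct share would weigh more heavily in the within-study bias judgment for that comparison than a small one.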
Abstract
Aims
Owing to new evidence from randomized controlled trials (RCTs) in low-risk patients with severe aortic stenosis, we compared the collective safety and efficacy of transcatheter aortic valve implantation (TAVI) vs. surgical aortic valve replacement (SAVR) across the entire spectrum of surgical risk patients.
Methods and results
The meta-analysis is registered with PROSPERO (CRD42016037273). We identified RCTs comparing TAVI with SAVR in patients with severe aortic stenosis reporting at different follow-up periods. We extracted trial, patient, intervention, and outcome characteristics following predefined criteria. The primary outcome was all-cause mortality up to 2 years for the main analysis. Seven trials that randomly assigned 8020 participants to TAVI (4014 patients) and SAVR (4006 patients) were included. The combined mean STS score in the TAVI arm was 9.4%, 5.1%, and 2.0% for high-, intermediate-, and low surgical risk trials, respectively. Transcatheter aortic valve implantation was associated with a significant reduction of all-cause mortality compared to SAVR (hazard ratio [HR] 0.88, 95% confidence interval [CI] 0.78–0.99, P = 0.030), an effect that was consistent across the entire spectrum of surgical risk (P-for-interaction = 0.410) and irrespective of type of transcatheter heart valve (THV) system (P-for-interaction = 0.674). Transcatheter aortic valve implantation resulted in lower risk of strokes (HR 0.81, 95% CI 0.68–0.98, P = 0.028). Surgical aortic valve replacement was associated with a lower risk of major vascular complications (HR 1.99, 95% CI 1.34–2.93, P = 0.001) and permanent pacemaker implantations (HR 2.27, 95% CI 1.47–3.64, P < 0.001) compared to TAVI.
Conclusion
Compared with SAVR, TAVI is associated with reduction in all-cause mortality and stroke up to 2 years irrespective of baseline surgical risk and type of THV system.
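The hazard ratios above are reported with 95% confidence intervals, from which a standard error and an approximate P value can be recovered on the log scale. The sketch below shows that arithmetic for the all-cause mortality estimate, assuming a symmetric normal-approximation interval on the log hazard ratio; the function name is hypothetical. Because the published figures are rounded, the recovered P value need not match the reported one exactly.

```python
from math import log
from statistics import NormalDist


def z_and_p_from_ci(estimate, lower, upper, level=0.95):
    """Recover the log-scale standard error, z-score and two-sided P value
    from a ratio estimate and its confidence interval."""
    norm = NormalDist()
    # Critical value of the interval, about 1.96 for a 95% level.
    z_crit = norm.inv_cdf(0.5 + level / 2)
    # On the log scale the interval spans 2 * z_crit standard errors.
    se = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(estimate) / se
    p = 2 * (1 - norm.cdf(abs(z)))
    return se, z, p


# All-cause mortality, TAVI vs. SAVR: HR 0.88 (95% CI 0.78 to 0.99).
se, z, p = z_and_p_from_ci(0.88, 0.78, 0.99)
print(f"SE(log HR) = {se:.4f}, z = {z:.2f}, P = {p:.3f}")
```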
Network meta-analysis compares different interventions for the same condition, by combining direct and indirect evidence derived from all eligible studies. Network meta-analysis has been increasingly used by applied scientists and it is a major research topic for methodologists. This article describes the R package netmeta, which adopts frequentist methods to fit network meta-analysis models. We provide a roadmap to perform network meta-analysis, along with an overview of the main functions of the package. We present three worked examples considering different types of outcomes and different data formats to facilitate researchers aiming to conduct network meta-analysis with netmeta.
Network meta‐analysis (NMA) compares several interventions that are linked in a network of comparative studies and estimates the relative treatment effects between all treatments, using both direct and indirect evidence. NMA is increasingly used for decision making in health care; however, a user‐friendly system to evaluate the confidence that can be placed in the results of NMA is currently lacking. This paper is a tutorial describing the Confidence In Network Meta‐Analysis (CINeMA) web application, which is based on the framework developed by Salanti et al (2014, PLOS One, 9, e99682) and refined by Nikolakopoulou et al (2019, bioRxiv). Six domains that affect the level of confidence in the NMA results are considered: (a) within‐study bias, (b) reporting bias, (c) indirectness, (d) imprecision, (e) heterogeneity, and (f) incoherence. CINeMA is freely available and open‐source and no login is required. In the configuration step, users upload their data, produce network plots, and define the analysis and effect measure. The dataset should include assessments of study‐level risk of bias and judgments on indirectness. CINeMA calls the netmeta routine in R to estimate relative effects and heterogeneity. Users are then guided through a systematic evaluation of the six domains. In this way, reviewers assess the level of concerns for each relative treatment effect from NMA as giving rise to “no concerns,” “some concerns,” or “major concerns” in each of the six domains, which are graphically summarized on the report page for all effect estimates. Finally, judgments across the domains are summarized into a single confidence rating (“high,” “moderate,” “low,” or “very low”). In conclusion, the user‐friendly web‐based CINeMA platform provides a transparent framework to evaluate evidence from systematic reviews with multiple interventions.
Pairwise and network meta-analysis (NMA) are traditionally used retrospectively to assess existing evidence. However, the current evidence often undergoes several updates as new studies become available. In each update, recommendations about the conclusiveness of the evidence and the need for future studies need to be made. In the context of prospective meta-analysis, future studies are planned as part of the accumulation of the evidence. In this setting, multiple testing issues need to be taken into account when the meta-analysis results are interpreted. We extend ideas of sequential monitoring of meta-analysis to provide a methodological framework for updating NMAs. Based on the z-score for each network estimate (the ratio of effect size to its standard error) and the respective information gained after each study enters NMA, we construct efficacy and futility stopping boundaries. An NMA treatment effect is considered conclusive when it crosses an appended stopping boundary. The methods are illustrated using a recently published NMA where we show that evidence about a particular comparison can become conclusive via indirect evidence even if no further trials address this comparison.
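The monitoring idea can be sketched as follows: after each update, the z-score of a network estimate is compared with a boundary that depends on the fraction of the planned information accrued so far. The abstract does not name the boundary family, so the O'Brien-Fleming-shaped efficacy boundary c / sqrt(t) used here is an assumption, as are the function name, the constant c, the maximum information, and all the numbers. A futility boundary is not sketched, and in practice c would be chosen so that the overall type I error is controlled across updates.

```python
from math import sqrt


def monitor(effect, se, max_information, c):
    """Compare the z-score of one network estimate with an efficacy boundary.

    The boundary shape c / sqrt(t) is an illustrative assumption, where t is
    the fraction of the planned (maximum) information accrued so far.
    """
    z = effect / se                    # z-score: effect size over its standard error
    information = 1.0 / se ** 2        # statistical information of the estimate
    fraction = min(information / max_information, 1.0)
    boundary = c / sqrt(fraction)      # stricter when little information is available
    return z, fraction, boundary, abs(z) >= boundary


# Hypothetical updates (effect, standard error) as studies enter the network.
updates = [(-0.30, 0.20), (-0.28, 0.14), (-0.27, 0.10)]
results = [monitor(e, s, max_information=100.0, c=1.96) for e, s in updates]
for z, t, b, conclusive in results:
    print(f"z = {z:.2f}, information fraction = {t:.2f}, "
          f"boundary = {b:.2f}, conclusive = {conclusive}")
```

Because the standard error of a network estimate can shrink through indirect evidence alone, the z-score can cross the boundary even when no new trial addresses that comparison directly.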
Systematic reviews that employ network meta-analysis are undertaken and published with increasing frequency while related statistical methodology is evolving. Future statistical developments and evaluation of the existing methodologies could be motivated by the characteristics of the networks of interventions published so far in order to tackle real rather than theoretical problems. Based on the recently formed network meta-analysis literature we aim to provide an insight into the characteristics of networks in healthcare research. We searched PubMed until end of 2012 for meta-analyses that used any form of indirect comparison. We collected data from networks that compared at least four treatments regarding their structural characteristics as well as characteristics of their analysis. We then conducted a descriptive analysis of the various network characteristics. We included 186 networks of which 35 (19%) were star-shaped (treatments were compared to a common comparator but not between themselves). The median number of studies per network was 21 and the median number of treatments compared was 6. The majority (85%) of the non-star shaped networks included at least one multi-arm study. Synthesis of data was primarily done via network meta-analysis fitted within a Bayesian framework (113 (61%) networks). We were unable to identify the exact method used to perform indirect comparison in a sizeable number of networks (18 (9%)). In 32% of the networks the investigators employed appropriate statistical methods to evaluate the consistency assumption; this percentage is larger among recently published articles. Our descriptive analysis provides useful information about the characteristics of networks of interventions published in the last 16 years and the methods for their analysis. Although the validity of network meta-analysis results highly depends on some basic assumptions, most authors did not report and evaluate them adequately.
Reviewers and editors need to be aware of these assumptions and insist on their reporting and accuracy.
Selective outcome reporting and publication bias threaten the validity of systematic reviews and meta-analyses and can affect clinical decision-making. A rigorous method to evaluate the impact of this bias on the results of network meta-analyses of interventions is lacking. We present a tool to assess the Risk Of Bias due to Missing Evidence in Network meta-analysis (ROB-MEN).
ROB-MEN first evaluates the risk of bias due to missing evidence for each of the possible pairwise comparisons that can be made between the interventions in the network. This step considers possible bias due to the presence of studies with unavailable results (within-study assessment of bias) and the potential for unpublished studies (across-study assessment of bias). The second step combines the judgements about the risk of bias due to missing evidence in pairwise comparisons with (i) the contribution of direct comparisons to the network meta-analysis estimates, (ii) possible small-study effects evaluated by network meta-regression, and (iii) any bias from unobserved comparisons. Then, a level of "low risk", "some concerns", or "high risk" for the bias due to missing evidence is assigned to each estimate, which is our tool's final output.
We describe the methodology of ROB-MEN step-by-step using an illustrative example from a published NMA of non-invasive diagnostic modalities for the detection of coronary artery disease in patients with low risk acute coronary syndrome. We also report a full application of the tool on a larger and more complex published network of 18 drugs from head-to-head studies for the acute treatment of adults with major depressive disorder.
ROB-MEN is the first tool for evaluating the risk of bias due to missing evidence in network meta-analysis and applies to networks of all sizes and geometry. The use of ROB-MEN is facilitated by an R Shiny web application that produces the Pairwise Comparisons and ROB-MEN Table and is incorporated in the reporting bias domain of the CINeMA framework and software.
Objective
To empirically explore the level of agreement of the treatment hierarchies from different ranking metrics in network meta-analysis (NMA) and to investigate how network characteristics influence the agreement.
Design
Empirical evaluation from re-analysis of NMA.
Data
232 networks of four or more interventions from randomised controlled trials, published between 1999 and 2015.
Methods
We calculated treatment hierarchies from several ranking metrics: relative treatment effects, probability of producing the best value (pBV) and the surface under the cumulative ranking curve (SUCRA). We estimated the level of agreement between the treatment hierarchies using different measures: Kendall’s τ and Spearman’s ρ correlation; and the Yilmaz τAP and Average Overlap, to give more weight to the top of the rankings. Finally, we assessed how the amount of information present in a network affects the agreement between treatment hierarchies, using the average variance, the relative range of variance and the total sample size over the number of interventions of a network.
Results
Overall, the pairwise agreement was high for all treatment hierarchies obtained by the different ranking metrics. The highest agreement was observed between SUCRA and the relative treatment effect for both correlation and top-weighted measures, whose medians were all equal to 1. The agreement between rankings decreased for networks with less precise estimates, and the hierarchies obtained from pBV appeared to be the most sensitive to large differences in the variance estimates. However, such large differences were rare.
Conclusions
Different ranking metrics address different treatment hierarchy problems; however, they produced similar rankings in the published networks. Researchers reporting NMA results can use the ranking metric they prefer, unless there are imprecise estimates or large imbalances in the variance estimates. In this case, treatment hierarchies based on both probabilistic and non-probabilistic ranking metrics should be presented.
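To make the comparison of ranking metrics concrete, the sketch below computes SUCRA and pBV from a matrix of rank probabilities and measures their agreement with Kendall's τ. The rank probabilities are hypothetical and chosen so that the two metrics disagree about the top treatment. The τ implementation is the simple tau-a variant, with tied pairs counting as neither concordant nor discordant, which may differ from the variant used in the study.

```python
def sucra(rank_probs):
    """rank_probs[i][r] is the probability that treatment i takes rank r
    (r = 0 is the best rank). Returns one SUCRA value per treatment."""
    a = len(rank_probs)
    values = []
    for probs in rank_probs:
        cumulative = 0.0
        total = 0.0
        # Average the cumulative ranking probabilities over the first a - 1 ranks.
        for r in range(a - 1):
            cumulative += probs[r]
            total += cumulative
        values.append(total / (a - 1))
    return values


def p_best(rank_probs):
    """Probability of producing the best value: the first-rank probability."""
    return [probs[0] for probs in rank_probs]


def kendall_tau(x, y):
    """Kendall's tau-a between two score vectors over the same treatments."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical rank probabilities for treatments A, B and C (rows sum to 1).
rank_probs = [
    [0.4, 0.5, 0.1],  # A: rarely last, so a high SUCRA
    [0.5, 0.0, 0.5],  # B: often best but also often worst (imprecise estimate)
    [0.1, 0.5, 0.4],  # C
]
sucra_values = sucra(rank_probs)
pbv_values = p_best(rank_probs)
tau = kendall_tau(sucra_values, pbv_values)
print(sucra_values, pbv_values, round(tau, 3))
```

In this constructed case SUCRA orders the treatments A, B, C while pBV orders them B, A, C, the kind of disagreement the abstract associates with imprecise estimates.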
Network meta-analysis estimates all relative effects between competing treatments and can produce a treatment hierarchy from the most to the least desirable option according to a health outcome. While about half of the published network meta-analyses present such a hierarchy, it is rarely the case that it is related to a clinically relevant decision question.
We first define treatment hierarchy and treatment ranking in a network meta-analysis and suggest a simulation method to estimate the probability that each possible hierarchy occurs. We then propose a stepwise approach to express clinically relevant decision questions as hierarchy questions and quantify the uncertainty of the criteria that constitute them. The steps of the approach are summarized as follows: a) a question of clinical relevance is defined, b) the hierarchies that satisfy the defined question are collected and c) the frequencies of the respective hierarchies are added; the resulting sum expresses the certainty that the defined set of criteria holds. We then show how the frequencies of all possible hierarchies relate to common ranking metrics.
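Steps a) to c) can be sketched in a few lines. The example below simulates treatment effects, records the hierarchy implied by each draw, and sums the frequencies of the hierarchies that satisfy a decision question. It is a minimal sketch under loud assumptions: effects are drawn as independent normals (a real application would need to respect the correlation between network estimates), larger values are taken to be better, and the treatment names, means, standard deviations and function names are all hypothetical.

```python
import random
from collections import Counter


def hierarchy_frequencies(means, sds, n_sims=5000, seed=1):
    """Estimate how often each complete hierarchy (best to worst) occurs."""
    rng = random.Random(seed)
    names = sorted(means)
    counts = Counter()
    for _ in range(n_sims):
        # Assumption: independent normal draws, larger effect = better.
        draws = {t: rng.gauss(means[t], sds[t]) for t in names}
        hierarchy = tuple(sorted(names, key=lambda t: draws[t], reverse=True))
        counts[hierarchy] += 1
    return {h: c / n_sims for h, c in counts.items()}


def certainty(frequencies, criterion):
    """Steps b) and c): collect the hierarchies that satisfy the question
    and add up their frequencies."""
    return sum(f for h, f in frequencies.items() if criterion(h))


# Hypothetical effects versus a common reference for four treatments.
means = {"A": 0.5, "B": 0.3, "C": 0.0, "D": -0.2}
sds = {"A": 0.2, "B": 0.2, "C": 0.2, "D": 0.2}
freqs = hierarchy_frequencies(means, sds)

# Step a): the decision question, here "A ranks above C".
prob = certainty(freqs, lambda h: h.index("A") < h.index("C"))
most_probable = max(freqs, key=freqs.get)
print(most_probable, round(freqs[most_probable], 3), round(prob, 3))
```

Any question that can be written as a predicate on a hierarchy, such as "A and B occupy the first two ranks in any order", can be passed as the criterion in the same way.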
We exemplify the method and its implementation using two networks. The first is a network of four treatments for chronic obstructive pulmonary disease where the most probable hierarchy has a frequency of 28%. The second is a network of 18 antidepressants, among which Vortioxetine, Bupropion and Escitalopram occupy the first three ranks with frequency 19%.
The developed method offers a generalised approach to producing treatment hierarchies in network meta-analysis, which moves towards attaching treatment ranking to a clear decision question, relevant to all or a subset of competing treatments.
Antipsychotic medication can cause tardive dyskinesia (TD) - late-onset, involuntary, repetitive movements, often involving the face and tongue. TD occurs in > 20% of adults taking antipsychotic medication (first-generation antipsychotics for > 3 months), with this proportion increasing by 5% per year among those who continue to use these drugs. The incidence of TD among those taking newer antipsychotics is not different from the rate in people who have used older-generation drugs in moderate doses. Studies of TD have previously been found to be limited, with no treatment approach shown to be effective.
To summarise the clinical effectiveness and safety of treatments for TD by updating past Cochrane reviews with new evidence and improved methods; to undertake public consultation to gauge the importance of the topic for people living with TD/the risk of TD; and to make available all data from relevant trials.
All relevant randomised controlled trials (RCTs) and observational studies.
Cochrane review methods, network meta-analysis (NMA).
Systematic reviews, patient and public involvement consultation and NMA.
Any setting, inpatient or outpatient.
For systematic reviews, adults with TD who have been taking a stable antipsychotic drug dose for > 3 months.
Any, with emphasis on those relevant to UK NHS practice.
Any measure of TD, global assessments and adverse effects/events.
We included 112 studies (nine Cochrane reviews). Overall, risk of bias showed little sign of improvement over two decades. Taking the outcome of 'TD symptoms improved to a clinically important extent', we identified two trials investigating reduction of antipsychotic dose (n = 17, risk ratio (RR) 0.42, 95% confidence interval (CI) 0.17 to 1.04; very low quality). Switching was investigated twice in trials that could not be combined (switching to risperidone vs. antipsychotic withdrawal: one RCT, n = 42, RR 0.45, 95% CI 0.23 to 0.89; low quality; switching to quetiapine vs. haloperidol: one RCT, n = 45, RR 0.80, 95% CI 0.52 to 1.22; low quality). In addition to RCTs, six observational studies compared antipsychotic discontinuation with decreased or increased dosage, and there was no clear evidence that any of these strategies had a beneficial effect on TD symptoms (very low-quality evidence). We evaluated the addition to standard antipsychotic care of several treatments, but not anticholinergic treatments, for which we identified no trials. We found no clear effect of the addition of either benzodiazepines (two RCTs, n = 32, RR 1.12, 95% CI 0.6 to 2.09; very low quality) or vitamin E (six RCTs, n = 264, RR 0.95, 95% CI 0.89 to 1.01; low quality). Buspirone as an adjunctive treatment did have some effect in one small study (n = 42, RR 0.53, 95% CI 0.33 to 0.84; low quality), as did hypnosis and relaxation (one RCT, n = 15, RR 0.45, 95% CI 0.21 to 0.94; very low quality). We identified no studies focusing on TD in people with dementia. The NMA model found indirect estimates to be imprecise and failed to produce useful summaries on relative effects of interventions or interpretable results for decision-making. Consultation with people with/at risk of TD highlighted that management of TD remains a concern, and found that people are deeply disappointed at the length of time it has taken researchers to address the issue.
Most studies remain small and poorly reported.
Clinicians, policy-makers and people with/at risk of TD are little better informed than they were decades ago. Underpowered trials of limited quality repeatedly fail to provide answers.
TD reviews have data from current trials extracted, tabulated and traceable to source. The NMA highlights one context in which support for this technique is ill advised. All relevant trials, even if not primarily addressing the issue of TD, should report appropriate binary outcomes on groups of people with this problem. Randomised trials of treatments for people with established TD are indicated. These should be large (> 800 participants), necessitating accrual through accurate local/national registers, including an intervention with acceptable treatments and recording outcomes used in clinical practice.
This study is registered as PROSPERO CRD4201502045.
The National Institute for Health Research Health Technology Assessment programme.