Meta-analyses frequently include studies with small sample sizes. Researchers usually fail to account for the sampling error in the reported within-study variances: they model the observed study-specific effect sizes with the within-study variances and treat these sample variances as if they were the true variances. However, this sampling error may be influential when sample sizes are small. This article illustrates that the sampling error may lead to substantial bias in meta-analysis results.
We conducted extensive simulation studies to assess the bias caused by sampling error. Meta-analyses with continuous and binary outcomes were simulated with various ranges of sample size and extents of heterogeneity. We evaluated the bias and the confidence interval coverage for five commonly used effect sizes (i.e., the mean difference, standardized mean difference, odds ratio, risk ratio, and risk difference).
Sampling error did not cause noticeable bias when the effect size was the mean difference, but the standardized mean difference, odds ratio, risk ratio, and risk difference suffered from this bias to different extents. The bias in the estimated overall odds ratio and risk ratio was noticeable under some settings even when each individual study had a sample size greater than 50. Also, Hedges' g, which is a bias-corrected estimate of the standardized mean difference within studies, might lead to larger bias than Cohen's d in meta-analysis results.
Caution is needed when performing meta-analyses with small sample sizes. The reported within-study variances should not simply be treated as the true variances, and their sampling error should be fully considered in such meta-analyses.
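The small-sample correction behind Hedges' g has a simple closed form, sketched below. This is an illustration rather than the article's simulation code; the function names are ours, and the factor 1 − 3/(4(n1 + n2) − 9) is the standard approximation to the exact correction.

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Cohen's d with the usual approximate small-sample bias correction."""
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # correction factor, < 1
    return j * cohens_d(m1, s1, n1, m2, s2, n2)
```

With n1 = n2 = 10 the correction shrinks d by roughly 4%, which is exactly the small-sample regime the article studies.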
Publication bias is a serious problem in systematic reviews and meta-analyses, which can affect the validity and generalizability of conclusions. Currently, approaches to dealing with publication bias can be distinguished into two classes: selection models and funnel-plot-based methods. Selection models use weight functions to adjust the overall effect size estimate and are usually employed as sensitivity analyses to assess the potential impact of publication bias. Funnel-plot-based methods include visual examination of a funnel plot, regression and rank tests, and the nonparametric trim-and-fill method. Although these approaches have been widely used in applications, measures for quantifying publication bias are seldom studied in the literature. Such measures can be used as a characteristic of a meta-analysis; also, they permit comparisons of publication bias between different meta-analyses. Egger's regression intercept may be considered a candidate measure, but it lacks an intuitive interpretation. This article introduces a new measure, the skewness of the standardized deviates, to quantify publication bias. This measure describes the asymmetry of the collected studies' distribution. In addition, a new test for publication bias is derived based on the skewness. Large-sample properties of the new measure are studied, and its performance is illustrated using simulations and three case studies.
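The idea of the skewness measure can be sketched as follows. This is a simplified illustration that standardizes deviates around a fixed-effect estimate; the article's exact standardization and its large-sample test may differ.

```python
def skewness_of_deviates(effects, std_errors):
    """Skewness of standardized deviates around the fixed-effect estimate.

    Under no publication bias the deviates should be roughly symmetric,
    so the skewness should be near zero; suppression of unfavorable
    studies shows up as asymmetry.
    """
    weights = [1 / se**2 for se in std_errors]
    mu = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    dev = [(y - mu) / se for y, se in zip(effects, std_errors)]
    k = len(dev)
    mean_d = sum(dev) / k
    m2 = sum((d - mean_d) ** 2 for d in dev) / k  # second central moment
    m3 = sum((d - mean_d) ** 3 for d in dev) / k  # third central moment
    return m3 / m2**1.5
```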
Publication bias frequently appears in meta-analyses when the included studies' results (e.g., p-values) influence the studies' publication processes. Some unfavorable studies may be suppressed from publication, so the meta-analytic results may be biased toward an artificially favorable direction. Many statistical tests have been proposed to detect publication bias over the recent two decades. However, they often make dramatically different assumptions about the cause of publication bias; therefore, they are usually powerful only in certain cases that support their particular assumptions, while their powers may be fairly low in many other cases. Although several simulation studies have been carried out to compare different tests' powers under various situations, it is typically infeasible to identify the exact mechanism of publication bias in a real-world meta-analysis and thus select the corresponding optimal publication bias test. We introduce a hybrid test for publication bias that synthesizes various tests and incorporates their benefits, so that it maintains relatively high power across various mechanisms of publication bias. The superior performance of the proposed hybrid test is illustrated using simulation studies and three real-world meta-analyses with different effect sizes. It is compared with many existing methods, including the commonly used regression and rank tests, and the trim-and-fill method.
Epidemiologic research often involves meta-analyses of proportions. Conventional two-step methods first transform each study's proportion and subsequently perform a meta-analysis on the transformed scale. They suffer from several important limitations: the log and logit transformations impractically treat within-study variances as fixed, known values and require ad hoc corrections for zero counts, while the results from arcsine-based transformations may lack interpretability. Generalized linear mixed models (GLMMs) have been recommended in meta-analyses as a one-step approach that fully accounts for within-study uncertainties. However, they are seldom used in current practice to synthesize proportions. This article summarizes various methods for meta-analyses of proportions, illustrates their implementations, and explores their performance using real and simulated datasets. In general, GLMMs led to smaller biases and mean squared errors and higher coverage probabilities than two-step methods. Many software programs are readily available to implement these methods.
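For concreteness, the criticized two-step logit approach can be sketched as below. This is a minimal fixed-effect version with an illustrative function name; note how the within-study variance is treated as a known constant and how zero or full counts need an ad hoc 0.5 continuity correction, the two limitations the one-step GLMM approach avoids.

```python
import math

def logit_two_step(counts):
    """Two-step fixed-effect pooling of proportions on the logit scale.

    counts: list of (events, sample_size) pairs. Returns the pooled
    proportion after back-transformation from the logit scale.
    """
    ys, ws = [], []
    for x, n in counts:
        if x == 0 or x == n:
            x, n = x + 0.5, n + 1.0  # ad hoc continuity correction
        y = math.log(x / (n - x))    # logit of the study's proportion
        v = 1 / x + 1 / (n - x)      # approximate variance, treated as known
        ys.append(y)
        ws.append(1 / v)
    pooled_logit = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    return 1 / (1 + math.exp(-pooled_logit))  # inverse logit
```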
Meta‐analyses have been increasingly used to synthesize proportions (eg, disease prevalence) from multiple studies in recent years. Arcsine‐based transformations, especially the Freeman–Tukey double‐arcsine transformation, are popular tools for stabilizing the variance of each study's proportion in two‐step meta‐analysis methods. Although they offer some benefits over the conventional logit transformation, they also suffer from several important limitations (eg, lack of interpretability) and may lead to misleading conclusions. Generalized linear mixed models and Bayesian models are intuitive one‐step alternative approaches, and can be readily implemented via many software programs. This article explains various pros and cons of the arcsine‐based transformations, and discusses the alternatives that may be generally superior to the currently popular practice.
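For reference, the Freeman–Tukey double-arcsine transformation discussed above has a simple closed form (the function name is ours):

```python
import math

def freeman_tukey(x, n):
    """Freeman-Tukey double-arcsine transform of x events out of n.

    Roughly stabilizes the variance at 1/(4n + 2), but the transformed
    scale has no direct interpretation -- one of the limitations noted
    above -- and back-transformation to a proportion needs extra care.
    """
    return 0.5 * (math.asin(math.sqrt(x / (n + 1)))
                  + math.asin(math.sqrt((x + 1) / (n + 1))))
```

A useful sanity check is the symmetry freeman_tukey(x, n) + freeman_tukey(n − x, n) = π/2, which follows from arcsin(√(1 − u)) = π/2 − arcsin(√u).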
It is common to measure continuous outcomes using different scales (eg, quality of life, severity of anxiety or depression); therefore, these outcomes need to be standardized before pooling in a meta-analysis. Common methods of standardization include using the standardized mean difference, the odds ratio derived from continuous data, the minimally important difference, and the ratio of means. Other ways of making data more meaningful to end users include transforming standardized effects back to original scales and transforming odds ratios to absolute effects using an assumed baseline risk. For these methods to be valid, the scales or instruments being combined across studies need to have assessed the same or a similar construct.
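Two of the conversions mentioned above have well-known closed forms: a standardized mean difference can be mapped to an odds ratio via Chinn's logistic approximation (ln OR ≈ πd/√3), and an odds ratio can be re-expressed as an absolute risk under an assumed baseline risk. A sketch, with illustrative function names:

```python
import math

def smd_to_odds_ratio(d):
    """Chinn's approximation: ln(OR) ~= pi * d / sqrt(3)."""
    return math.exp(math.pi * d / math.sqrt(3))

def absolute_risk(odds_ratio, baseline_risk):
    """Risk in the treated group implied by an OR and an assumed baseline risk."""
    odds = odds_ratio * baseline_risk / (1 - baseline_risk)
    return odds / (1 + odds)
```

For example, d = 0.5 corresponds to an odds ratio of about 2.5, which against a 20% assumed baseline risk implies an absolute risk of roughly 38%.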
With the growing concerns about research reproducibility and replicability, the assessment of scientific results' fragility (or robustness) has been of increasing interest. The fragility index was proposed to quantify the robustness of statistical significance of clinical studies with binary outcomes. It is defined as the minimal number of event status modifications that can alter statistical significance. It helps clinicians evaluate the reliability of the conclusions. Many factors may affect the fragility index, including the treatment groups in which event status is modified, the statistical methods used for testing for the association between treatments and outcomes, and the pre-specified significance level. In addition to assessing the fragility of individual studies, the fragility index was recently extended to both conventional pairwise meta-analyses and network meta-analyses of multiple treatment comparisons. It is not straightforward for clinicians to calculate these measures and visualize the results. We have developed an R package called "fragility" to offer user-friendly functions for such purposes. This article provides an overview of methods for assessing and visualizing the fragility of individual studies as well as pairwise and network meta-analyses, introduces the usage of the "fragility" package, and illustrates the implementations with several worked examples.
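The definition above can be made concrete with a small sketch. This is a simplified version, not the "fragility" package's implementation: it uses Fisher's exact test only and modifies event statuses in the first group only, whereas the article notes that the choice of group, test, and significance level all affect the index.

```python
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided p-value of Fisher's exact test for a 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    probs = {k: comb(r1, k) * comb(r2, c1 - k) / total for k in range(lo, hi + 1)}
    cutoff = probs[a] * (1 + 1e-9)  # tolerance for floating-point ties
    return sum(p for p in probs.values() if p <= cutoff)

def fragility_index(e1, n1, e2, n2, alpha=0.05):
    """Minimal number of event-status modifications in group 1 that flips
    the significance of Fisher's exact test (simplified: group 1 only)."""
    sig = fisher_exact_p(e1, n1 - e1, e2, n2 - e2) < alpha
    for fi in range(1, n1 + 1):
        for e in (e1 + fi, e1 - fi):  # turn non-events into events, or vice versa
            if 0 <= e <= n1:
                if (fisher_exact_p(e, n1 - e, e2, n2 - e2) < alpha) != sig:
                    return fi
    return None  # significance cannot be flipped within group 1
```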
Network meta‐analysis (NMA) has become an increasingly used tool to compare multiple treatments simultaneously by synthesizing direct and indirect evidence in clinical research. However, many existing studies did not properly report the evidence of treatment comparisons or show the comparison structure to the audience. In addition, nearly all treatment networks presented only direct evidence, not the overall evidence that can reflect the benefit of performing NMAs. This article classifies treatment networks into three types under different assumptions: networks with each treatment comparison's edge width proportional to the corresponding number of studies, sample size, and precision. In addition, three new measures (ie, the effective number of studies, the effective sample size, and the effective precision) are proposed to preliminarily quantify the overall evidence gained in NMAs. They permit the audience to intuitively evaluate the benefit of performing NMAs, compared with pairwise meta‐analyses based on only direct evidence. We use four case studies, including one illustrative example, to demonstrate their derivations and interpretations. Treatment networks may look fairly different when different measures are used to present the evidence. The proposed measures provide clear information about the overall evidence of all treatment comparisons, and they also imply the additional number of studies, sample size, and precision obtained from indirect evidence. Some comparisons may benefit little from NMAs. Researchers are encouraged to present the overall evidence of all treatment comparisons, so that the audience can preliminarily evaluate the quality of NMAs.
Rationale, Aims and Objectives
The fragility index (FI) and fragility quotient (FQ) are increasingly used measures for assessing the robustness of clinical studies with binary outcomes in terms of statistical significance. The FI is the minimum number of event status modifications that can alter a study result's statistical significance (or nonsignificance), and the FQ is calculated as the FI divided by the study's total sample size. The literature has no widely recognized criteria for interpreting the fragility measures' magnitudes. This article aims to provide an empirical assessment of the FI and FQ based on a large database of clinical studies in the Cochrane Library.
Methods
We explored the overall empirical distributions of the FI and FQ based on five common methods (Fisher's exact test, χ2 test, risk difference, odds ratio, and relative risk) for determining statistical significance of binary outcomes in clinical research. We also considered three different scenarios for the FI calculation and evaluated the relationship between p values and FIs or FQs using Spearman's ρ. Finally, we summarized empirical thresholds based on the overall distributions of the FI and FQ to facilitate their interpretations in future research.
Results
For about 20% of studies with significant results, the statistical significance was changed after modifying the event status of only one participant. Studies with significant results were considered slightly fragile if the significance hinged on the statuses of about five events. Studies were extremely fragile if FI ≤ 1 or FQ ≤ 0.01. The FIs were strongly correlated with p values for significant studies, while Spearman's ρ varied according to the total sample sizes of studies.
Conclusions
The statistical significance of clinical studies could be changed after modifying a few events' statuses. Many studies' findings are fairly fragile. The distributions of the FI and FQ provide insights for appraising the robustness of evidence in clinical decision‐making.
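As a rough illustration only, the empirical thresholds reported above can be encoded in a small helper. The cutoffs paraphrase the thresholds stated in the results, and the label strings are ours; the article's full criteria are more nuanced.

```python
def fragility_label(fi, n):
    """Classify a study by the empirical thresholds above:
    FI <= 1 or FQ <= 0.01 -> extremely fragile; significance hinging
    on about five events -> slightly fragile. Labels are illustrative."""
    fq = fi / n  # fragility quotient: FI over total sample size
    if fi <= 1 or fq <= 0.01:
        return "extremely fragile"
    if fi <= 5:
        return "slightly fragile"
    return "not fragile by these thresholds"
```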
Network meta-analysis is a powerful approach for synthesizing direct and indirect evidence about multiple treatment comparisons from a collection of independent studies. At present, the most widely used method in network meta-analysis is contrast-based, in which a baseline treatment needs to be specified in each study, and the analysis focuses on modeling relative treatment effects (typically log odds ratios). However, population-averaged treatment-specific parameters, such as absolute risks, cannot be estimated by this method without an external data source or a separate model for a reference treatment. Recently, an arm-based network meta-analysis method has been proposed, and the accompanying R package provides user-friendly functions for its implementation. This package estimates both absolute and relative effects, and can handle binary, continuous, and count outcomes.