Abstract
Pleiotropy, the phenomenon of a single genetic variant influencing multiple traits, is likely widespread in the human genome. If pleiotropy arises because the single nucleotide polymorphism (SNP) influences one trait, which in turn influences another ('vertical pleiotropy'), then Mendelian randomization (MR) can be used to estimate the causal influence between the traits. Of prime focus among the many limitations to MR is the unprovable assumption that apparent pleiotropic associations are mediated by the exposure (i.e. reflect vertical pleiotropy), and do not arise due to SNPs influencing the two traits through independent pathways ('horizontal pleiotropy'). The burgeoning treasure trove of genetic associations yielded through genome-wide association studies makes for a tantalizing prospect of phenome-wide causal inference. Recent years have seen substantial attention devoted to the problem of horizontal pleiotropy, and in this review we outline how newly developed methods can be used together to improve the reliability of MR.
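As a rough illustration of how MR turns SNP associations into a causal estimate under vertical pleiotropy, the sketch below computes per-SNP Wald ratios and pools them with inverse-variance weights. The function names and all numbers are hypothetical, made up for illustration, and the standard errors use a simple first-order approximation; this is not the specific set of methods the review describes.

```python
import math

def wald_ratios(beta_exposure, beta_outcome, se_outcome):
    """Per-SNP causal estimates (outcome effect / exposure effect)
    with first-order standard errors. Hypothetical helper."""
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    ses = [sy / abs(bx) for bx, sy in zip(beta_exposure, se_outcome)]
    return ratios, ses

def ivw_estimate(ratios, ses):
    """Inverse-variance weighted average of the per-SNP ratios."""
    weights = [1.0 / s ** 2 for s in ses]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, math.sqrt(1.0 / total)

# Made-up summary statistics: every SNP's outcome effect is exactly
# 0.5 times its exposure effect, i.e. purely vertical pleiotropy.
beta_x = [0.10, 0.20, 0.15, 0.05]
beta_y = [0.05, 0.10, 0.075, 0.025]
se_y = [0.01, 0.01, 0.01, 0.01]

ratios, ses = wald_ratios(beta_x, beta_y, se_y)
estimate, se = ivw_estimate(ratios, ses)
print(f"IVW estimate: {estimate:.3f} (SE {se:.3f})")
```

If some SNPs also affected the outcome through an independent pathway (horizontal pleiotropy), their ratios would drift away from the shared value and the pooled estimate could be biased.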
Inference about the causal structure that induces correlations between two traits can be achieved by combining genetic associations with a mediation-based approach, as is done in the causal inference test (CIT). However, we show that measurement error in the phenotypes can lead to the CIT inferring the wrong causal direction, and that increasing sample sizes has the adverse effect of increasing confidence in the wrong answer. This problem is likely to be general to other mediation-based approaches. Here we introduce an extension to Mendelian randomisation, a method that uses genetic associations in an instrumentation framework, that enables inference of the causal direction between traits, with some advantages. First, it can be performed using only summary level data from genome-wide association studies; second, it is less susceptible to bias in the presence of measurement error or unmeasured confounding. We apply the method to infer the causal direction between DNA methylation and gene expression levels. Our results demonstrate that, in general, DNA methylation is more likely to be the causal factor, but this result is highly susceptible to bias induced by systematic differences in measurement error between the platforms, and by horizontal pleiotropy. We emphasise that, where possible, implementing MR and appropriate sensitivity analyses alongside other approaches such as CIT is important to triangulate reliable conclusions about causality.
Abstract
Mendelian randomization (MR) is gaining in recognition and popularity as a method for strengthening causal inference in epidemiology by utilizing genetic variants as instrumental variables. Concurrently with the explosion in empirical MR studies, there has been the steady production of new approaches for MR analysis. The recently proposed “global and individual tests for direct effects” (GLIDE) approach fits into a family of methods that aim to detect horizontal pleiotropy, at the individual single nucleotide polymorphism level and at the global level, and to adjust the analysis by removing outlying single nucleotide polymorphisms. In this commentary, we explain how existing methods can be (and indeed are being) used to detect pleiotropy at the individual and global levels, although not explicitly using this terminology. By doing so, we show that the true comparator for GLIDE is not MR-Egger regression (as Dai et al., the authors of the accompanying article (Am J Epidemiol. 2018;187(12):2672–2680), claim) but rather the humble heterogeneity statistic.
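To make the "global" and "individual" readings of a heterogeneity statistic concrete, the following minimal sketch computes Cochran's Q over per-SNP ratio estimates and reports each SNP's contribution to it. The function name and the input values are hypothetical, and this is only one common form of the statistic, not necessarily the exact version discussed in the commentary.

```python
def cochran_q(ratios, ses):
    """Return the pooled estimate, Cochran's Q, and per-SNP contributions.

    Q is the global heterogeneity measure; each contribution
    w_j * (ratio_j - pooled)^2 gives an individual-level view.
    """
    weights = [1.0 / s ** 2 for s in ses]
    pooled = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    contributions = [w * (r - pooled) ** 2 for w, r in zip(weights, ratios)]
    return pooled, sum(contributions), contributions

# Made-up per-SNP ratio estimates; the last SNP is a deliberate outlier.
ratios = [0.50, 0.52, 0.48, 0.50, 1.50]
ses = [0.1, 0.1, 0.1, 0.1, 0.1]

pooled, q, contributions = cochran_q(ratios, ses)
df = len(ratios) - 1
print(f"pooled = {pooled:.2f}, Q = {q:.2f} on {df} degrees of freedom")
for j, c in enumerate(contributions):
    print(f"SNP {j}: contribution {c:.2f}")
```

Under homogeneity Q is approximately chi-squared with (number of SNPs minus 1) degrees of freedom, so a value far above that reference suggests pleiotropy at the global level, while an unusually large single contribution flags a candidate outlying SNP. Note that an outlier also pulls the pooled estimate towards itself, which inflates the contributions of the other SNPs.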
Abstract
Background
Mendelian randomization (MR) is a powerful tool in epidemiology that can be used to estimate the causal effect of an exposure on an outcome in the presence of unobserved confounding, by utilizing genetic variants that are instrumental variables (IVs) for the exposure. This has been extended to multivariable MR (MVMR) to estimate the effect of two or more exposures on an outcome.
Methods and results
We use simulations and theory to clarify the interpretation of estimated effects in an MVMR analysis under a range of underlying scenarios, where a secondary exposure acts variously as a confounder, a mediator, a pleiotropic pathway and a collider. We then describe how instrument strength and validity can be assessed for an MVMR analysis in the single-sample setting, and develop tests to assess these assumptions in the popular two-sample summary data setting. We illustrate our methods using data from UK Biobank to estimate the effect of education and cognitive ability on body mass index.
Conclusion
MVMR analysis consistently estimates the direct causal effect of an exposure, or exposures, of interest and provides a powerful tool for determining causal effects in a wide range of scenarios with either individual- or summary-level data.
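The difference between a total and a direct effect can be sketched with summary data. The example below assumes a common summary-data formulation of MVMR, namely a weighted regression (no intercept) of SNP-outcome associations on the SNP-exposure associations for two exposures; the function name and all numbers are hypothetical, and this is an illustration rather than the estimator or tests developed in the paper.

```python
def mvmr_two_exposures(bx1, bx2, by, se_y):
    """Weighted least squares of by on (bx1, bx2) through the origin,
    with weights 1 / se_y^2, solved via the 2x2 normal equations."""
    w = [1.0 / s ** 2 for s in se_y]
    s11 = sum(wi * a * a for wi, a in zip(w, bx1))
    s12 = sum(wi * a * b for wi, a, b in zip(w, bx1, bx2))
    s22 = sum(wi * b * b for wi, b in zip(w, bx2))
    t1 = sum(wi * a * y for wi, a, y in zip(w, bx1, by))
    t2 = sum(wi * b * y for wi, b, y in zip(w, bx2, by))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("exposure associations are collinear")
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det

# Made-up SNP associations with two exposures.
bx1 = [0.10, 0.20, 0.05, 0.15, 0.00, 0.02]
bx2 = [0.05, 0.10, 0.10, 0.02, 0.20, 0.15]
# Outcome associations generated so that only exposure 2 acts directly.
by = [0.4 * b for b in bx2]
se_y = [0.01] * len(by)

direct1, direct2 = mvmr_two_exposures(bx1, bx2, by, se_y)

# Univariable estimate for exposure 1 alone (same weights, one regressor).
w = [1.0 / s ** 2 for s in se_y]
total1 = (sum(wi * a * y for wi, a, y in zip(w, bx1, by))
          / sum(wi * a * a for wi, a in zip(w, bx1)))

print(f"univariable estimate for exposure 1: {total1:.3f}")
print(f"MVMR direct effects: {direct1:.3f}, {direct2:.3f}")
```

In this toy data the univariable estimate for exposure 1 is non-zero only because its SNPs are also associated with exposure 2, whereas the multivariable fit attributes the whole effect to exposure 2.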
We investigate the behavior of the Lasso for selecting invalid instruments in linear instrumental variables models for estimating causal effects of exposures on outcomes, as proposed recently by Kang et al. Invalid instruments are such that they fail the exclusion restriction and enter the model as explanatory variables. We show that for this setup, the Lasso may not consistently select the invalid instruments if these are relatively strong. We propose a median estimator that is consistent when less than 50% of the instruments are invalid, and its consistency does not depend on the relative strength of the instruments, or their correlation structure. We show that this estimator can be used for adaptive Lasso estimation, with the resulting estimator having oracle properties. The methods are applied to a Mendelian randomization study to estimate the causal effect of body mass index (BMI) on diastolic blood pressure, using data on individuals from the UK Biobank, with 96 single nucleotide polymorphisms as potential instruments for BMI. Supplementary materials for this article are available online.
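The intuition behind a median-type estimator can be shown with a small sketch. It assumes the estimator is the simple median of per-instrument ratio estimates (outcome association divided by exposure association); the paper's exact construction may differ, and the variable names and numbers here are hypothetical.

```python
import statistics

true_effect = 0.3

# Made-up instrument-exposure associations for 7 instruments.
gamma = [0.20, 0.25, 0.30, 0.15, 0.20, 0.10, 0.25]
# Direct effects on the outcome: non-zero entries mark invalid instruments
# (3 of 7, i.e. fewer than 50%).
alpha = [0.0, 0.0, 0.0, 0.0, 0.10, 0.08, 0.15]

# Instrument-outcome associations, without sampling noise for clarity.
big_gamma = [true_effect * g + a for g, a in zip(gamma, alpha)]

# Per-instrument ratio estimates of the causal effect.
ratio_estimates = [bg / g for bg, g in zip(big_gamma, gamma)]

median_est = statistics.median(ratio_estimates)
mean_est = statistics.mean(ratio_estimates)
print(f"median of ratios: {median_est:.3f}")
print(f"mean of ratios:   {mean_est:.3f}")
```

Because more than half of the ratios equal the true effect, the median lands on a valid instrument's value, while the mean is pulled upward by the invalid ones.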
Abstract
In the last third of the 20th century, etiological epidemiology within academia in high-income countries shifted its primary concern from attempting to tackle the apparent epidemic of noncommunicable diseases to an increasing focus on developing statistical and causal inference methodologies. This move was mutually constitutive with the failure of applied epidemiology to make major progress, with many of the advances in understanding the causes of noncommunicable diseases coming from outside the discipline, while ironically revealing the infectious origins of several major conditions. Conversely, there were many examples of epidemiologic studies promoting ineffective interventions and little evident attempt to account for such failure. Major advances in concrete understanding of disease etiology have been driven by a willingness to learn about and incorporate into epidemiology developments in biology and cognate data science disciplines. If fundamental epidemiologic principles regarding the rooting of disease risk within populations are retained, recent methodological developments combined with increased biological understanding and data sciences capability should herald a fruitful post–Modern Epidemiology world.
The past decade has been proclaimed as a hugely successful era of gene discovery through the high yields of many genome-wide association studies (GWAS). However, much of the perceived benefit of such discoveries lies in the promise that the identification of genes that influence disease would directly translate into the identification of potential therapeutic targets, but this has yet to be realized at a level reflecting expectation. One reason for this, we suggest, is that GWAS, to date, have generally not focused on phenotypes that directly relate to the progression of disease and thus speak to disease treatment.
Numerous observational studies have attempted to identify risk factors for infection with SARS-CoV-2 and COVID-19 disease outcomes. Studies have used datasets sampled from patients admitted to hospital, people tested for active infection, or people who volunteered to participate. Here, we highlight the challenge of interpreting observational evidence from such non-representative samples. Collider bias can induce associations between two or more variables which affect the likelihood of an individual being sampled, distorting associations between these variables in the sample. Analysing UK Biobank data, we found that participants tested for COVID-19 were highly selected, compared to the wider cohort, for a range of genetic, behavioural, cardiovascular, demographic, and anthropometric traits. We discuss the mechanisms inducing these problems, and approaches that could help mitigate them. While collider bias should be explored in existing studies, the optimal way to mitigate the problem is to use appropriate sampling strategies at the study design stage.
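A small simulation can show how selection alone induces an association. In this sketch two traits are generated independently, and an individual is sampled only if a noisy sum of the two exceeds a threshold; the traits, the selection rule, the threshold and the seed are all hypothetical choices for illustration and are not taken from the UK Biobank analysis.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)
n = 20000

# Two traits with no causal or confounded relationship in the population.
trait_a = [random.gauss(0, 1) for _ in range(n)]
trait_b = [random.gauss(0, 1) for _ in range(n)]

# Both traits raise the chance of being sampled (e.g. of being tested).
selected = [(a, b) for a, b in zip(trait_a, trait_b)
            if a + b + random.gauss(0, 1) > 1.0]

r_full = pearson(trait_a, trait_b)
r_selected = pearson([a for a, _ in selected], [b for _, b in selected])

print(f"correlation in full population: {r_full:.3f}")
print(f"correlation in selected sample: {r_selected:.3f}")
print(f"selected {len(selected)} of {n}")
```

The traits are close to uncorrelated in the full population but negatively correlated among those selected, even though neither influences the other.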
Abstract
Objective
To evaluate whether body size in early life has an independent effect on risk of disease in later life or whether its influence is mediated by body size in adulthood.
Design
Two sample univariable and multivariable mendelian randomisation.
Setting
The UK Biobank prospective cohort study and four large scale genome-wide association studies (GWAS) consortiums.
Participants
453 169 participants enrolled in UK Biobank and a combined total of more than 700 000 people from different GWAS consortiums.
Exposures
Measured body mass index during adulthood (mean age 56.5) and self-reported perceived body size at age 10.
Main outcome measures
Coronary artery disease, type 2 diabetes, breast cancer, and prostate cancer.
Results
Having a larger genetically predicted body size in early life was associated with an increased odds of coronary artery disease (odds ratio 1.49 for each change in body size category unless stated otherwise, 95% confidence interval 1.33 to 1.68) and type 2 diabetes (2.32, 1.76 to 3.05) based on univariable mendelian randomisation analyses. However, little evidence was found of a direct effect (ie, not through adult body size) based on multivariable mendelian randomisation estimates (coronary artery disease: 1.02, 0.86 to 1.22; type 2 diabetes: 1.16, 0.74 to 1.82). In the multivariable mendelian randomisation analysis of breast cancer risk, strong evidence was found of a protective direct effect for larger body size in early life (0.59, 0.50 to 0.71), with less evidence of a direct effect of adult body size on this outcome (1.08, 0.93 to 1.27). Including age at menarche as an additional exposure provided weak evidence of a total causal effect (univariable mendelian randomisation odds ratio 0.98, 95% confidence interval 0.91 to 1.06) but strong evidence of a direct causal effect, independent of early life and adult body size (multivariable mendelian randomisation odds ratio 0.90, 0.85 to 0.95). No strong evidence was found of a causal effect of either early or later life measures on prostate cancer (early life body size odds ratio 1.06, 95% confidence interval 0.81 to 1.40; adult body size 0.87, 0.70 to 1.08).
Conclusions
The findings suggest that the positive association between body size in childhood and risk of coronary artery disease and type 2 diabetes in adulthood can be attributed to individuals remaining large into later life. However, having a smaller body size during childhood might increase the risk of breast cancer regardless of body size in adulthood, with timing of puberty also putatively playing a role.