Abstract
Pleiotropy, the phenomenon of a single genetic variant influencing multiple traits, is likely widespread in the human genome. If pleiotropy arises because the single nucleotide polymorphism ...(SNP) influences one trait, which in turn influences another ('vertical pleiotropy'), then Mendelian randomization (MR) can be used to estimate the causal influence between the traits. Of prime focus among the many limitations to MR is the unprovable assumption that apparent pleiotropic associations are mediated by the exposure (i.e. reflect vertical pleiotropy), and do not arise due to SNPs influencing the two traits through independent pathways ('horizontal pleiotropy'). The burgeoning treasure trove of genetic associations yielded through genome wide association studies makes for a tantalizing prospect of phenome-wide causal inference. Recent years have seen substantial attention devoted to the problem of horizontal pleiotropy, and in this review we outline how newly developed methods can be used together to improve the reliability of MR.
Inference about the causal structure that induces correlations between two traits can be achieved by combining genetic associations with a mediation-based approach, as is done in the causal inference ...test (CIT). However, we show that measurement error in the phenotypes can lead to the CIT inferring the wrong causal direction, and that increasing sample sizes has the adverse effect of increasing confidence in the wrong answer. This problem is likely to be general to other mediation-based approaches. Here we introduce an extension to Mendelian randomisation, a method that uses genetic associations in an instrumentation framework, that enables inference of the causal direction between traits, with some advantages. First, it can be performed using only summary level data from genome-wide association studies; second, it is less susceptible to bias in the presence of measurement error or unmeasured confounding. We apply the method to infer the causal direction between DNA methylation and gene expression levels. Our results demonstrate that, in general, DNA methylation is more likely to be the causal factor, but this result is highly susceptible to bias induced by systematic differences in measurement error between the platforms, and by horizontal pleiotropy. We emphasise that, where possible, implementing MR and appropriate sensitivity analyses alongside other approaches such as CIT is important to triangulate reliable conclusions about causality.
Abstract
Mendelian randomization (MR) is gaining in recognition and popularity as a method for strengthening causal inference in epidemiology by utilizing genetic variants as instrumental variables. ...Concurrently with the explosion in empirical MR studies, there has been the steady production of new approaches for MR analysis. The recently proposed “global and individual tests for direct effects” (GLIDE) approach fits into a family of methods that aim to detect horizontal pleiotropy—at the individual single nucleotide polymorphism level and at the global level—and to adjust the analysis by removing outlying single nucleotide polymorphisms. In this commentary, we explain how existing methods can (and indeed are) being used to detect pleiotropy at the individual and global levels, although not explicitly using this terminology. By doing so, we show that the true comparator for GLIDE is not MR-Egger regression (as Dai et al., the authors of the accompanying article (Am J Epidemiol. 2018;187(12):2672–2680), claim) but rather the humble heterogeneity statistic.
Numerous observational studies have attempted to identify risk factors for infection with SARS-CoV-2 and COVID-19 disease outcomes. Studies have used datasets sampled from patients admitted to ...hospital, people tested for active infection, or people who volunteered to participate. Here, we highlight the challenge of interpreting observational evidence from such non-representative samples. Collider bias can induce associations between two or more variables which affect the likelihood of an individual being sampled, distorting associations between these variables in the sample. Analysing UK Biobank data, compared to the wider cohort the participants tested for COVID-19 were highly selected for a range of genetic, behavioural, cardiovascular, demographic, and anthropometric traits. We discuss the mechanisms inducing these problems, and approaches that could help mitigate them. While collider bias should be explored in existing studies, the optimal way to mitigate the problem is to use appropriate sampling strategies at the study design stage.
Results from genome-wide association studies (GWAS) can be used to infer causal relationships between phenotypes, using a strategy known as 2-sample Mendelian randomization (2SMR) and bypassing the ...need for individual-level data. However, 2SMR methods are evolving rapidly and GWAS results are often insufficiently curated, undermining efficient implementation of the approach. We therefore developed MR-Base (<ext-link ext-link-type="uri" xlink:href="http://www.mrbase.org">http://www.mrbase.org</ext-link>): a platform that integrates a curated database of complete GWAS results (no restrictions according to statistical significance) with an application programming interface, web app and R packages that automate 2SMR. The software includes several sensitivity analyses for assessing the impact of horizontal pleiotropy and other violations of assumptions. The database currently comprises 11 billion single nucleotide polymorphism-trait associations from 1673 GWAS and is updated on a regular basis. Integrating data with software ensures more rigorous application of hypothesis-driven analyses and allows millions of potential causal relationships to be efficiently evaluated in phenome-wide association studies.
The age of large-scale genome-wide association studies (GWAS) has provided us with an unprecedented opportunity to evaluate the genetic liability of complex disease using polygenic risk scores (PRS). ...In this study, we have analysed 162 PRS (p<5×10
) derived from GWAS and 551 heritable traits from the UK Biobank study (N = 334,398). Findings can be investigated using a web application (http://mrcieu.mrsoftware.org/PRS_atlas/), which we envisage will help uncover both known and novel mechanisms which contribute towards disease susceptibility. To demonstrate this, we have investigated the results from a phenome-wide evaluation of schizophrenia genetic liability. Amongst findings were inverse associations with measures of cognitive function which extensive follow-up analyses using Mendelian randomization (MR) provided evidence of a causal relationship. We have also investigated the effect of multiple risk factors on disease using mediation and multivariable MR frameworks. Our atlas provides a resource for future endeavours seeking to unravel the causal determinants of complex disease.
Large studies use genotype data to discover genetic contributions to complex traits and infer relationships between those traits. Co-incident geographical variation in genotypes and health traits can ...bias these analyses. Here we show that single genetic variants and genetic scores composed of multiple variants are associated with birth location within UK Biobank and that geographic structure in genotype data cannot be accounted for using routine adjustment for study centre and principal components derived from genotype data. We find that major health outcomes appear geographically structured and that coincident structure in health outcomes and genotype data can yield biased associations. Understanding and accounting for this phenomenon will be important when making inference from genotype data in large studies.
Genome-wide association studies (GWASs) have become the focus of the statistical analysis of complex traits in humans, successfully shedding light on several aspects of genetic architecture and ...biological aetiology. Single-nucleotide polymorphisms (SNPs) are usually modelled as having additive, cumulative and independent effects on the phenotype. Although evidently a useful approach, it is often argued that this is not a realistic biological model and that epistasis (that is, the statistical interaction between SNPs) should be included. The purpose of this Review is to summarize recent directions in methodology for detecting epistasis and to discuss evidence of the role of epistasis in human complex trait variation. We also discuss the relevance of epistasis in the context of GWASs and potential hazards in the interpretation of statistical interaction terms.
In Mendelian randomization (MR) analysis, variants that exert horizontal pleiotropy are typically treated as a nuisance. However, they could be valuable in identifying alternative pathways to the ...traits under investigation. Here, we develop MR-TRYX, a framework that exploits horizontal pleiotropy to discover putative risk factors for disease. We begin by detecting outliers in a single exposure-outcome MR analysis, hypothesising they are due to horizontal pleiotropy. We search across hundreds of complete GWAS summary datasets to systematically identify other (candidate) traits that associate with the outliers. We develop a multi-trait pleiotropy model of the heterogeneity in the exposure-outcome analysis due to pathways through candidate traits. Through detailed investigation of several causal relationships, many pleiotropic pathways are uncovered with already established causal effects, validating the approach, but also alternative putative causal pathways. Adjustment for pleiotropic pathways reduces the heterogeneity across the analyses.
Smoking prevalence is higher amongst individuals with schizophrenia and depression compared with the general population. Mendelian randomisation (MR) can examine whether this association is causal ...using genetic variants identified in genome-wide association studies (GWAS).
We conducted two-sample MR to explore the bi-directional effects of smoking on schizophrenia and depression. For smoking behaviour, we used (1) smoking initiation GWAS from the GSCAN consortium and (2) we conducted our own GWAS of lifetime smoking behaviour (which captures smoking duration, heaviness and cessation) in a sample of 462690 individuals from the UK Biobank. We validated this instrument using positive control outcomes (e.g. lung cancer). For schizophrenia and depression we used GWAS from the PGC consortium.
There was strong evidence to suggest smoking is a risk factor for both schizophrenia (odds ratio (OR) 2.27, 95% confidence interval (CI) 1.67-3.08, p < 0.001) and depression (OR 1.99, 95% CI 1.71-2.32, p < 0.001). Results were consistent across both lifetime smoking and smoking initiation. We found some evidence that genetic liability to depression increases smoking (β = 0.091, 95% CI 0.027-0.155, p = 0.005) but evidence was mixed for schizophrenia (β = 0.022, 95% CI 0.005-0.038, p = 0.009) with very weak evidence for an effect on smoking initiation.
These findings suggest that the association between smoking, schizophrenia and depression is due, at least in part, to a causal effect of smoking, providing further evidence for the detrimental consequences of smoking on mental health.