Epidemiologists aim to identify modifiable causes of disease, this often being a prerequisite for the application of epidemiological findings in public health programmes, health service planning and clinical medicine. Despite successes in identifying causes, it is often claimed that there are missing additional causes for even reasonably well-understood conditions such as lung cancer and coronary heart disease. Several lines of evidence suggest that largely chance events, from the biographical down to the sub-cellular, contribute an important stochastic element to disease risk that is not epidemiologically tractable at the individual level. Epigenetic influences provide a fashionable contemporary explanation for such seemingly random processes. Chance events (such as a particular lifelong smoker living unharmed to 100 years) are averaged out at the group level. As a consequence, population-level differences (for example, secular trends or differences between administrative areas) can be entirely explicable by causal factors that appear to account for only a small proportion of individual-level risk. In public health terms, a modifiable cause of the large majority of cases of a disease may have been identified, with a wild goose chase continuing in an attempt to discipline the random nature of the world with respect to which particular individuals will succumb. The quest for personalized medicine is a contemporary manifestation of this dream. An evolutionary explanation of why randomness exists in the development of organisms has long been articulated, in terms of offering a survival advantage in changing environments. Further, the basic notion that what is near-random at one level may be almost entirely predictable at a higher level is an emergent property of many systems, from particle physics to the social sciences. These considerations suggest that epidemiological approaches will remain fruitful as we enter the decade of the epigenome.
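The contrast between population-level and individual-level explanation can be made concrete with a small calculation. The sketch below is illustrative only: the prevalence and risks are hypothetical numbers chosen for the example, not figures from any study, and the function names are assumptions introduced here. It compares the population attributable fraction of a binary exposure with the share of individual-level outcome variance that the same exposure explains.

```python
def overall_risk(prevalence, risk_exposed, risk_unexposed):
    """Risk in the whole population, mixing exposed and unexposed groups."""
    return prevalence * risk_exposed + (1 - prevalence) * risk_unexposed


def attributable_fraction(prevalence, risk_exposed, risk_unexposed):
    """Share of all cases that would not occur if nobody were exposed."""
    total = overall_risk(prevalence, risk_exposed, risk_unexposed)
    return (total - risk_unexposed) / total


def variance_explained(prevalence, risk_exposed, risk_unexposed):
    """Squared correlation between a binary exposure and a binary outcome."""
    total = overall_risk(prevalence, risk_exposed, risk_unexposed)
    covariance = prevalence * (1 - prevalence) * (risk_exposed - risk_unexposed)
    var_exposure = prevalence * (1 - prevalence)
    var_outcome = total * (1 - total)
    return covariance ** 2 / (var_exposure * var_outcome)


# Hypothetical inputs: half the population exposed, 10% risk if exposed,
# 0.5% risk if unexposed (a twenty-fold relative risk).
paf = attributable_fraction(0.5, 0.10, 0.005)
r_squared = variance_explained(0.5, 0.10, 0.005)
print(f"attributable fraction: {paf:.3f}")
print(f"individual-level variance explained: {r_squared:.3f}")
```

Under these assumed inputs about 90% of cases are attributable to the exposure, yet it explains under 5% of the variation in which individuals become cases.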
Mendelian randomization (MR) is an approach that uses genetic variants associated with a modifiable exposure or biological intermediate to estimate the causal relationship between these variables and a medically relevant outcome. Although it was initially developed to examine the relationship between modifiable exposures or biomarkers and disease, its use has expanded to encompass applications in molecular epidemiology, systems biology, pharmacogenomics, and many other areas. The purpose of this review is to introduce MR, the principles behind the approach, and its limitations. We consider some of the new applications of the methodology, including informing drug development, and comment on some promising extensions, including two-step, two-sample, and bidirectional MR. We show how these new methods can be combined to efficiently examine causality in complex biological networks and provide a new framework to data mine high-dimensional studies as we transition into the age of hypothesis-free causality.
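For a single genetic variant, the simplest MR estimate is the Wald ratio: the variant-outcome association divided by the variant-exposure association. The sketch below is a minimal illustration, assuming a first-order standard error that ignores uncertainty in the variant-exposure estimate; the function name and the input values are hypothetical and do not come from the review.

```python
def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Causal estimate from one variant, with a first-order standard error.

    Assumes the variant-exposure association is estimated precisely enough
    that its own uncertainty can be ignored.
    """
    if beta_exposure == 0:
        raise ValueError("variant must be associated with the exposure")
    estimate = beta_outcome / beta_exposure
    standard_error = se_outcome / abs(beta_exposure)
    return estimate, standard_error


# Hypothetical summary statistics for one variant.
estimate, standard_error = wald_ratio(
    beta_exposure=0.2, beta_outcome=0.05, se_outcome=0.01
)
lower = estimate - 1.96 * standard_error
upper = estimate + 1.96 * standard_error
print(f"Wald ratio {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

The estimate is only interpretable as causal if the variant satisfies the instrumental variable assumptions, which the code cannot check.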
Results from genome-wide association studies (GWAS) can be used to infer causal relationships between phenotypes, using a strategy known as 2-sample Mendelian randomization (2SMR) and bypassing the need for individual-level data. However, 2SMR methods are evolving rapidly and GWAS results are often insufficiently curated, undermining efficient implementation of the approach. We therefore developed MR-Base (http://www.mrbase.org): a platform that integrates a curated database of complete GWAS results (no restrictions according to statistical significance) with an application programming interface, web app and R packages that automate 2SMR. The software includes several sensitivity analyses for assessing the impact of horizontal pleiotropy and other violations of assumptions. The database currently comprises 11 billion single nucleotide polymorphism-trait associations from 1673 GWAS and is updated on a regular basis. Integrating data with software ensures more rigorous application of hypothesis-driven analyses and allows millions of potential causal relationships to be efficiently evaluated in phenome-wide association studies.
The age of large-scale genome-wide association studies (GWAS) has provided us with an unprecedented opportunity to evaluate the genetic liability of complex disease using polygenic risk scores (PRS). In this study, we have analysed 162 PRS (p<5×10
) derived from GWAS and 551 heritable traits from the UK Biobank study (N = 334,398). Findings can be investigated using a web application (http://mrcieu.mrsoftware.org/PRS_atlas/), which we envisage will help uncover both known and novel mechanisms which contribute towards disease susceptibility. To demonstrate this, we have investigated the results from a phenome-wide evaluation of schizophrenia genetic liability. Amongst the findings were inverse associations with measures of cognitive function, for which extensive follow-up analyses using Mendelian randomization (MR) provided evidence of a causal relationship. We have also investigated the effect of multiple risk factors on disease using mediation and multivariable MR frameworks. Our atlas provides a resource for future endeavours seeking to unravel the causal determinants of complex disease.
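A polygenic risk score of the kind described above is, at its simplest, a weighted sum of risk-allele counts over variants that pass a p-value threshold. The sketch below is a minimal illustration under stated assumptions: the variant identifiers, effect sizes, p-values and the threshold of 0.01 are all hypothetical, the function name is introduced here for the example, and treating a missing genotype as zero copies is a simplification.

```python
def polygenic_risk_score(dosages, variants, p_threshold):
    """Sum of effect size times allele dosage over variants passing the threshold.

    dosages: mapping from variant id to the count of effect alleles (0, 1 or 2).
    variants: list of dicts with keys "id", "beta" and "p" from a GWAS.
    """
    score = 0.0
    for variant in variants:
        if variant["p"] < p_threshold:
            # Simplifying assumption: a missing genotype counts as 0 copies.
            score += variant["beta"] * dosages.get(variant["id"], 0)
    return score


# Hypothetical GWAS summary statistics and one person's genotypes.
variants = [
    {"id": "snp_a", "beta": 0.5, "p": 0.001},
    {"id": "snp_b", "beta": -0.25, "p": 0.001},
    {"id": "snp_c", "beta": 1.0, "p": 0.2},  # fails the threshold, so ignored
]
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 2}

score = polygenic_risk_score(dosages, variants, p_threshold=0.01)
print(score)
```

In practice scores are usually built from approximately independent variants and standardized across the sample before being related to traits; neither step is shown here.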
DNA methylation data have become a valuable source of information for biomarker development, because, unlike static genetic risk estimates, DNA methylation varies dynamically in relation to diverse exogenous and endogenous factors, including environmental risk factors and complex disease pathology. Reliable methods for genome-wide measurement at scale have led to the proliferation of epigenome-wide association studies and subsequently to the development of DNA methylation-based predictors across a wide range of health-related applications, from the identification of risk factors or exposures, such as age and smoking, to early detection of disease or progression in cancer, cardiovascular and neurological disease. This Review evaluates the progress of existing DNA methylation-based predictors, including the contribution of machine learning techniques, and assesses the uptake of key statistical best practices needed to ensure their reliable performance, such as data-driven feature selection, elimination of data leakage in performance estimates and use of generalizable, adequately powered training samples.
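One common source of data leakage is selecting features on the full dataset before cross-validation, so that held-out samples influence the model. The sketch below illustrates the leak-free alternative, in which feature selection is repeated inside each training fold. It is a toy under stated assumptions: the selection rule (largest difference in class means), the nearest-centroid classifier, the tiny dataset and all function names are hypothetical choices made for the example, and each training fold is assumed to contain both classes.

```python
def select_top_features(X, y, m):
    """Rank features by absolute difference in class means (labels 0 and 1)."""
    def class_mean(j, label):
        values = [row[j] for row, target in zip(X, y) if target == label]
        return sum(values) / len(values)

    n_features = len(X[0])
    gaps = [abs(class_mean(j, 1) - class_mean(j, 0)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: -gaps[j])[:m]


def fit_centroids(X, y):
    """Nearest-centroid model: the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [row for row, target in zip(X, y) if target == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, row):
    """Return the label whose centroid is closest in squared distance."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def cross_validate(X, y, k, m):
    """k-fold accuracy with feature selection refit on each training fold."""
    n = len(y)
    scores = []
    for fold in range(k):
        test_idx = list(range(fold, n, k))
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Selection sees training rows only, so held-out rows cannot leak in.
        keep = select_top_features(X_train, y_train, m)
        model = fit_centroids([[row[j] for j in keep] for row in X_train], y_train)
        correct = sum(
            predict(model, [X[i][j] for j in keep]) == y[i] for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return scores


# Hypothetical data: feature 0 tracks the label, feature 1 is noise.
X = [[0.0, 3.0], [1.0, 1.0], [0.0, 2.0], [1.0, 2.0], [0.0, 1.0], [1.0, 3.0]]
y = [0, 1, 0, 1, 0, 1]
fold_scores = cross_validate(X, y, k=3, m=1)
print(fold_scores)
```

The design point is the position of the selection call: moving it outside the loop would reuse held-out labels and tend to inflate the performance estimate.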
Abstract
Background
Summary data furnishing a two-sample Mendelian randomization (MR) study are often visualized with the aid of a scatter plot, in which single-nucleotide polymorphism (SNP)–outcome associations are plotted against the SNP–exposure associations to provide an immediate picture of the causal-effect estimate for each individual variant. It is also convenient to overlay the standard inverse-variance weighted (IVW) estimate of causal effect as a fitted slope, to see whether an individual SNP provides evidence that supports, or conflicts with, the overall consensus. Unfortunately, the traditional scatter plot is not the most appropriate means to achieve this aim whenever SNP–outcome associations are estimated with varying degrees of precision and this is reflected in the analysis.
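The IVW slope mentioned above can be computed directly from summary statistics as a weighted regression of SNP-outcome on SNP-exposure associations through the origin. The sketch below is a minimal fixed-effect version, assuming weights equal to the inverse variance of the SNP-outcome estimates; the function name and the three-variant dataset are hypothetical and are not the data analysed in the paper.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted slope through the origin."""
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    slope = numerator / denominator
    standard_error = math.sqrt(1 / denominator)
    return slope, standard_error


# Hypothetical summary data in which every variant implies a ratio of 0.5.
beta_exposure = [0.1, 0.2, 0.3]
beta_outcome = [0.05, 0.10, 0.15]
se_outcome = [0.01, 0.01, 0.01]

ivw_slope, ivw_se = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
print(f"IVW estimate {ivw_slope:.3f} (SE {ivw_se:.4f})")
```

Because every variant here gives the same ratio, the pooled slope equals that ratio; with real data the slope is a precision-weighted average of the per-variant ratios.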
Methods
We propose instead to use a small modification of the scatter plot—the Galbraith Radial plot—for the presentation of data and results from an MR study, which enjoys many advantages over the original method. On a practical level, it removes the need to recode the genetic data and enables a more straightforward detection of outliers and influential data points. Its use extends beyond the purely aesthetic, however, to suggest a more general modelling framework to operate within when conducting an MR study, including a new form of MR-Egger regression.
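The Radial plot coordinates can be sketched from the same summary statistics. In the version below, each variant's ratio estimate is scaled by the square root of its weight and plotted against that square root, so the IVW estimate is again the slope through the origin and each variant's contribution to Cochran's Q flags potential outliers. This is an illustrative sketch, not the software provided with the paper: it assumes first-order weights (squared SNP-exposure association over the SNP-outcome variance), and the function name and data are hypothetical.

```python
import math


def radial_coordinates(beta_exposure, beta_outcome, se_outcome):
    """Return Radial plot points, the IVW slope and per-variant Q contributions."""
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # First-order weights; an assumption of this sketch.
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    x = [math.sqrt(w) for w in weights]
    y = [ratio * root for ratio, root in zip(ratios, x)]
    # Slope of y on x through the origin equals the IVW estimate.
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    q_contributions = [w * (r - slope) ** 2 for w, r in zip(weights, ratios)]
    return x, y, slope, q_contributions


# Hypothetical data: the third variant implies a larger effect than the rest.
beta_exposure = [0.3, 0.2, 0.1]
beta_outcome = [0.15, 0.10, 0.10]
se_outcome = [0.01, 0.01, 0.01]

x, y, slope, q = radial_coordinates(beta_exposure, beta_outcome, se_outcome)
most_outlying = max(range(len(q)), key=lambda j: q[j])
print(f"IVW slope {slope:.3f}; largest Q contribution from variant {most_outlying}")
```

The horizontal coordinate depends only on the magnitude of the SNP-exposure association, which is consistent with the point above that the genetic data need not be recoded.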
Results
We illustrate the methods using data from a two-sample MR study to probe the causal effect of systolic blood pressure on coronary heart disease risk, allowing for the possible effects of pleiotropy. The Radial plot is shown to aid the detection of a single outlying variant that is responsible for large differences between IVW and MR-Egger regression estimates. Several additional plots are also proposed for informative data visualization.
Conclusions
The Radial plot should be considered in place of the scatter plot for visualizing, analysing and interpreting data from a two-sample summary data MR study. Software is provided to help facilitate its use.
IMPORTANCE: Acetaminophen (paracetamol) is used by a large proportion of pregnant women. Research suggests that acetaminophen use in pregnancy is associated with abnormal fetal neurodevelopment. However, it is possible that this association might be confounded by unmeasured behavioral factors linked to acetaminophen use. OBJECTIVE: To examine associations between offspring behavioral problems and (1) maternal prenatal acetaminophen use, (2) maternal postnatal acetaminophen use, and (3) partner’s acetaminophen use. DESIGN, SETTING, AND PARTICIPANTS: From February 2015 to March 2016, we collected and analyzed data from the Avon Longitudinal Study of Parents and Children (ALSPAC), a prospective birth cohort. We studied 7796 mothers enrolled in ALSPAC between 1991 and 1992 along with their children and partners. EXPOSURES: Acetaminophen use was assessed by questionnaire completion at 18 and 32 weeks of pregnancy and when the child was 61 months old. MAIN OUTCOMES AND MEASURES: Maternal reports of behavioral problems using the Strengths and Difficulties Questionnaire (SDQ) when the children were 7 years old. We estimated risk ratios for behavioral problems in children after prenatal, postnatal, and partner’s exposure to acetaminophen and mutually adjusted each association. RESULTS: Maternal prenatal acetaminophen use at 18 (n = 4415; 53%) and 32 weeks of pregnancy (n = 3381; 42%) was associated with higher odds of having conduct problems (risk ratio [RR], 1.42; 95% CI, 1.25-1.62) and hyperactivity symptoms (RR, 1.31; 95% CI, 1.16-1.49), while maternal acetaminophen use at 32 weeks was also associated with higher odds of having emotional symptoms (RR, 1.29; 95% CI, 1.09-1.53) and total difficulties (RR, 1.46; 95% CI, 1.21-1.77). This was not the case for maternal postnatal (n = 6916; 89%) or partner’s (n = 3454; 84%) acetaminophen use.
We found the associations between maternal prenatal acetaminophen use and all the SDQ domains unchanged even after adjusting for maternal postnatal or partner’s acetaminophen use. CONCLUSIONS AND RELEVANCE: Children exposed to acetaminophen prenatally are at increased risk of multiple behavioral difficulties, and the associations do not appear to be explained by unmeasured behavioral or social factors linked to acetaminophen use insofar as they are not observed for postnatal or partner’s acetaminophen use. Although these results could have implications for public health advice, further studies are required to replicate the findings and to understand mechanisms.
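For readers unfamiliar with how a risk ratio and its confidence interval are obtained, the sketch below computes an unadjusted risk ratio from a two-by-two table using the usual log-scale standard error. It is illustrative only: the counts are hypothetical and are not ALSPAC data, the function name is introduced here, and the estimates reported above were adjusted, which this calculation is not.

```python
import math


def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Unadjusted risk ratio with a 95% confidence interval on the log scale."""
    ratio = (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)
    se_log = math.sqrt(
        1 / cases_exposed - 1 / n_exposed + 1 / cases_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(ratio) - 1.96 * se_log)
    upper = math.exp(math.log(ratio) + 1.96 * se_log)
    return ratio, lower, upper


# Hypothetical counts: 30 of 200 exposed and 20 of 200 unexposed children
# with the outcome.
rr, ci_lower, ci_upper = risk_ratio(30, 200, 20, 200)
print(f"RR {rr:.2f} (95% CI {ci_lower:.2f}-{ci_upper:.2f})")
```

With these assumed counts the interval spans 1, so the toy example would not on its own indicate an association.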
Large studies such as UK Biobank are increasingly used for GWAS and Mendelian randomization (MR) studies. However, selection into and dropout from studies may bias genetic and phenotypic associations. We examine genetic factors affecting participation in four optional components in up to 451,306 UK Biobank participants. We used GWAS to identify genetic variants associated with participation, MR to estimate effects of phenotypes on participation, and genetic correlations to compare participation bias across different studies. 32 variants were associated with participation in one of the optional components (P < 6 × 10
), including loci with links to intelligence and Alzheimer's disease. Genetic correlations demonstrated that participation bias was common across studies. MR showed that longer educational duration, older menarche and taller stature increased participation, whilst higher levels of adiposity, dyslipidaemia, neuroticism, Alzheimer's and schizophrenia reduced participation. Our effect estimates can be used for sensitivity analysis to account for selective participation biases in genetic or non-genetic analyses.