The influence of genetic variation on complex diseases is potentially mediated through a range of highly dynamic epigenetic processes exhibiting temporal variation during development and later life. Here we present a catalogue of the genetic influences on DNA methylation (methylation quantitative trait loci (mQTL)) at five different life stages in human blood: children at birth, in childhood and in adolescence, and their mothers during pregnancy and in middle age.
We show that genetic effects on methylation are highly stable across the life course and that developmental change in the genetic contribution to variation in methylation occurs primarily through increases in environmental or stochastic effects. Though we map a large proportion of the cis-acting genetic variation, a much larger component of the genetic effects influencing methylation acts in trans. However, only 7% of discovered mQTL are trans-effects, suggesting that the trans component is highly polygenic. Finally, we estimate the contribution of mQTL to variation in complex traits and infer that methylation may have a causal role consistent with an infinitesimal model in which many methylation sites each have a small influence, amounting to a large overall contribution.
DNA methylation contains a significant heritable component that remains consistent across the lifespan. Our results suggest that the genetic component of methylation may have a causal role in complex traits. The database of mQTL presented here provides a rich resource for those interested in investigating the role of methylation in disease.
Background & Aims: Previous results from observational, interventional studies and in vitro experiments suggest that certain micronutrients possess anti-viral and immunomodulatory activities. In particular, it has been hypothesized that zinc, selenium, copper and vitamin K1 have strong potential for prophylaxis and treatment of COVID-19. We aimed to test whether genetically predicted Zn, Se, Cu or vitamin K1 levels have a causal effect on COVID-19 related outcomes, including risk of infection, hospitalization and critical illness. Methods: We employed a two-sample Mendelian Randomization (MR) analysis. Our genetic variants derived from European-ancestry GWAS reflected circulating levels of Zn, Cu, Se in red blood cells as well as Se and vitamin K1 in serum/plasma. For the COVID-19 outcome GWAS, we used infection, hospitalization or critical illness. Our inverse-variance weighted (IVW) MR analysis was complemented by sensitivity analyses including a more liberal selection of variants at a genome-wide sub-significant threshold, MR-Egger and weighted median/mode tests. Results: Circulating micronutrient levels show limited evidence of association with COVID-19 infection, with the odds ratio (OR) ranging from 0.97 (95% CI: 0.87–1.08, p-value = 0.55) for zinc to 1.07 (95% CI: 1.00–1.14, p-value = 0.06) for copper per 1 SD increase in exposure, i.e., no beneficial effect for copper was observed. Similarly minimal evidence was obtained for the hospitalization and critical illness outcomes with OR from 0.98 (95% CI: 0.87–1.09, p-value = 0.66) for vitamin K1 to 1.07 (95% CI: 0.88–1.29, p-value = 0.49) for copper, and from 0.93 (95% CI: 0.72–1.19, p-value = 0.55) for vitamin K1 to 1.21 (95% CI: 0.79–1.86, p-value = 0.39) for zinc, respectively. Conclusions: This study does not provide evidence that supplementation with zinc, selenium, copper or vitamin K1 can prevent SARS-CoV-2 infection, critical illness or hospitalization for COVID-19.
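The IVW analysis named in the Methods can be sketched as follows. This is a minimal illustration of the standard fixed-effect inverse-variance weighted estimator, assuming per-variant summary statistics that are already harmonised to the same effect allele. The function names are hypothetical and this is not the study's own pipeline.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate: weighted regression of outcome effects on
    exposure effects through the origin, with weights 1 / se_outcome**2."""
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    beta = num / den            # causal estimate on the log-odds scale
    se = math.sqrt(1.0 / den)   # first-order standard error
    return beta, se

def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds estimate to an odds ratio with a 95% interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

An odds ratio whose interval spans 1, as in the results reported above, corresponds to a log-odds estimate whose interval spans 0.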
Spending more time active (and less sedentary) is associated with health benefits such as improved cardiovascular health and lower risk of all-cause mortality. It is unclear whether these associations differ depending on whether time spent sedentary or in moderate-vigorous physical activity (MVPA) is accumulated in long or short bouts. In this study, we used a novel method that accounts for substitution (i.e., more time in MVPA means less time sleeping, in light activity or sedentary) to examine whether length of sedentary and MVPA bouts associates with all-cause mortality.
We used data on 79,503 adult participants from the population-based UK Biobank cohort, which recruited participants between 2006 and 2010 (mean age at accelerometer wear 62.1 years [SD = 7.9], 54.5% women; mean length of follow-up 5.1 years [SD = 0.73]). We derived (1) the total time participants spent in activity categories (sleep, sedentary, light activity, and MVPA) on average per day; (2) time spent in sedentary bouts of short (1 to 15 minutes), medium (16 to 40 minutes), and long (41+ minutes) duration; and (3) MVPA bouts of very short (1 to 9 minutes), short (10 to 15 minutes), medium (16 to 40 minutes), and long (41+ minutes) duration. We used Cox proportional hazards regression to estimate the association of spending 10 minutes more average daily time in one activity or bout length category, coupled with 10 minutes less time in another, with all-cause mortality. Those spending more time in MVPA had lower mortality risk, irrespective of whether this replaced time spent sleeping, sedentary, or in light activity, and these associations were of similar magnitude (e.g., hazard ratio [HR] 0.96 [95% CI: 0.94, 0.97; P < 0.001] per 10 minutes more MVPA, coupled with 10 minutes less light activity per day). Those spending more time sedentary had higher mortality risk if this replaced light activity (HR 1.02 [95% CI: 1.01, 1.02; P < 0.001] per 10 minutes more sedentary time, with 10 minutes less light activity per day) and an even higher risk if this replaced MVPA (HR 1.06 [95% CI: 1.05, 1.08; P < 0.001] per 10 minutes more sedentary time, with 10 minutes less MVPA per day). We found little evidence that mortality risk differed depending on the length of sedentary or MVPA bouts. Key limitations of our study are potential residual confounding, the limited length of follow-up, and use of a select sample of the United Kingdom population.
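As a worked illustration of how a per-10-minute hazard ratio can be read, the sketch below rescales it to other substitution durations. It assumes the effect is linear on the log-hazard scale over the range considered, which is how a linear term in a Cox model behaves but is an extrapolation beyond what the abstract reports. The helper name is hypothetical.

```python
import math

def rescale_hazard_ratio(hr_per_10_min, minutes):
    """Rescale a hazard ratio reported per 10 minutes of substitution to another
    duration, assuming log-linearity of the hazard in minutes substituted."""
    log_hr_per_minute = math.log(hr_per_10_min) / 10.0
    return math.exp(log_hr_per_minute * minutes)

# HR 0.96 is the reported estimate for 10 minutes more MVPA replacing light activity.
hr_30 = rescale_hazard_ratio(0.96, 30)  # implied HR for a 30-minute substitution
```

Under that assumption, a 30-minute substitution corresponds to the per-10-minute hazard ratio raised to the third power.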
We have shown that time spent in MVPA was associated with lower mortality, irrespective of whether it replaced time spent sleeping, sedentary, or in light activity. Time spent sedentary was associated with higher mortality risk, particularly if it replaced MVPA. This emphasises the specific importance of MVPA. Our findings suggest that the impact of MVPA does not differ depending on whether it is obtained from several short bouts or fewer longer bouts, supporting the recent removal from UK and United States policy of the requirement that MVPA should be accumulated in bouts of 10 minutes or more. Further studies are needed to investigate causality and explore health outcomes beyond mortality.
Cardiovascular disease (including coronary artery disease and myocardial infarction) is one of the leading causes of death in Europe, and is influenced by both environmental and genetic factors. With the recent advances in genomic tools and technologies there is potential to predict and diagnose heart disease using molecular data from analysis of blood cells. We analyzed gene expression data from blood samples taken from normal individuals (n = 21) and from patients with non-significant coronary artery disease (n = 93), unstable angina (n = 16), stable coronary artery disease (n = 14) and myocardial infarction (MI; n = 207). We used a feature selection approach to identify a set of gene expression variables which successfully differentiate different cardiovascular diseases. The initial features were discovered by fitting a linear model for each probe set across all arrays of normal individuals and patients with myocardial infarction. Three different feature optimisation algorithms were devised which identified two discriminating sets of genes, one using MI and normal controls (total genes = 6) and another one using MI and unstable angina patients (total genes = 7). In all our classification approaches we used a non-parametric k-nearest neighbour (KNN) classification method (k = 3). The results proved the diagnostic robustness of the final feature sets in discriminating patients with myocardial infarction from healthy controls. Interestingly, it also showed efficacy in discriminating myocardial infarction patients from patients with clinical symptoms of cardiac ischemia but no myocardial necrosis or stable coronary artery disease, despite the influence of batch effects and different microarray gene chips and platforms.
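The k-nearest neighbour classification step (k = 3) can be sketched as below. This is a minimal illustration that assumes Euclidean distance on already-selected expression features and a simple majority vote. The distance metric, the function name and any preprocessing are assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter

def knn_predict(train_features, train_labels, sample, k=3):
    """Classify one sample by majority vote among its k nearest training samples,
    using Euclidean distance on the selected gene-expression features."""
    distances = sorted(
        (math.dist(row, sample), label)
        for row, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

With two classes and an odd k, the vote cannot tie, which is one practical reason to choose k = 3.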
The frequency of a haplotype comprising one allele at each of two loci can be expressed as a cubic equation (the 'Hill equation'), the solution of which gives that frequency. Most haplotype and linkage disequilibrium analysis programs use iteration-based algorithms which substitute an estimate of haplotype frequency into the equation, producing a new estimate which is repeatedly fed back into the equation until the values converge to a maximum likelihood estimate (expectation-maximisation).
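The iterative approach described above can be sketched as a textbook expectation-maximisation loop for two biallelic loci. This illustrates the iteration-based alternative, not the exact cubic solution. It assumes a 3x3 table of genotype counts indexed by the number of copies of allele A and of allele B, and the function name and starting point (linkage equilibrium) are assumptions.

```python
def em_haplotype_freqs(n, tol=1e-10, max_iter=1000):
    """Estimate frequencies of haplotypes (AB, Ab, aB, ab) by EM.
    n[a][b] = number of individuals with a copies of A and b copies of B."""
    total_haps = 2 * sum(sum(row) for row in n)
    # Haplotypes that can be counted directly from unambiguous genotypes.
    k_AB = 2 * n[2][2] + n[2][1] + n[1][2]
    k_Ab = 2 * n[2][0] + n[2][1] + n[1][0]
    k_aB = 2 * n[0][2] + n[1][2] + n[0][1]
    k_ab = 2 * n[0][0] + n[1][0] + n[0][1]
    dh = n[1][1]  # double heterozygotes: phase is AB/ab or Ab/aB
    # Start from linkage equilibrium (product of allele frequencies).
    pA = (k_AB + k_Ab + dh) / total_haps
    pB = (k_AB + k_aB + dh) / total_haps
    f = [pA * pB, pA * (1 - pB), (1 - pA) * pB, (1 - pA) * (1 - pB)]
    for _ in range(max_iter):
        cis, trans = f[0] * f[3], f[1] * f[2]
        # E-step: expected share of double heterozygotes carrying AB/ab.
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: update haplotype frequencies from expected counts.
        new = [(k_AB + dh * w) / total_haps, (k_Ab + dh * (1 - w)) / total_haps,
               (k_aB + dh * (1 - w)) / total_haps, (k_ab + dh * w) / total_haps]
        if max(abs(x - y) for x, y in zip(new, f)) < tol:
            return new
        f = new
    return f
```

Because the loop converges to whichever solution its starting point leads to, a single run reports one estimate even when more than one solution of the cubic is biologically possible.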
We present a program, "CubeX", which calculates the biologically possible exact solution(s) and provides estimated haplotype frequencies, D', r² and χ² values for each. CubeX provides a "complete" analysis of haplotype frequencies and linkage disequilibrium for a pair of biallelic markers under situations where sampling variation and genotyping errors distort sample Hardy-Weinberg equilibrium, potentially causing more than one biologically possible solution. We also present an analysis of simulations and real data using the algebraically exact solution, which indicates that under perfect sample Hardy-Weinberg equilibrium there is only one biologically possible solution, but that under other conditions there may be more.
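Given one set of haplotype frequencies, the linkage disequilibrium statistics mentioned above follow from standard definitions. The sketch below uses those textbook formulas. It is not CubeX's code, the function name is hypothetical, and the chi-square value is taken as r² multiplied by the number of chromosomes, which is an assumption about the convention used.

```python
def ld_stats(f_AB, f_Ab, f_aB, f_ab, n_chromosomes):
    """Return (D', r^2, chi^2) for two biallelic loci from haplotype frequencies."""
    pA = f_AB + f_Ab
    pB = f_AB + f_aB
    D = f_AB - pA * pB  # raw disequilibrium coefficient
    # D' scales D by its maximum attainable magnitude given the allele frequencies.
    if D >= 0:
        d_max = min(pA * (1 - pB), (1 - pA) * pB)
    else:
        d_max = min(pA * pB, (1 - pA) * (1 - pB))
    d_prime = D / d_max if d_max > 0 else 0.0
    denom = pA * (1 - pA) * pB * (1 - pB)
    r2 = D * D / denom if denom > 0 else 0.0
    return d_prime, r2, r2 * n_chromosomes
```

When several solutions are biologically possible, applying the same function to each set of frequencies gives one set of statistics per solution.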
Our analyses demonstrate that lower allele frequencies, lower sample numbers, population stratification and a possible |D'| value of 1 are particularly susceptible to distortion of sample Hardy-Weinberg equilibrium, which has significant implications for calculation of linkage disequilibrium in small sample sizes (e.g., HapMap) and rarer alleles (e.g., paucimorphisms, q < 0.05) that may have particular disease relevance and require improved approaches for meaningful evaluation.
In Mendelian randomization (MR) analysis, variants that exert horizontal pleiotropy are typically treated as a nuisance. However, they could be valuable in identifying alternative pathways to the traits under investigation. Here, we develop MR-TRYX, a framework that exploits horizontal pleiotropy to discover putative risk factors for disease. We begin by detecting outliers in a single exposure-outcome MR analysis, hypothesising they are due to horizontal pleiotropy. We search across hundreds of complete GWAS summary datasets to systematically identify other (candidate) traits that associate with the outliers. We develop a multi-trait pleiotropy model of the heterogeneity in the exposure-outcome analysis due to pathways through candidate traits. Through detailed investigation of several causal relationships, many pleiotropic pathways are uncovered with already established causal effects, validating the approach, but also alternative putative causal pathways. Adjustment for pleiotropic pathways reduces the heterogeneity across the analyses.
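The outlier-detection step can be illustrated with per-variant contributions to Cochran's Q, one common way of flagging heterogeneous instruments. This is a sketch under stated assumptions and not necessarily the procedure MR-TRYX uses: the first-order weights, the significance threshold and the function name are all assumptions.

```python
import math

def q_outliers(beta_exposure, beta_outcome, se_outcome, alpha=0.05):
    """Flag variants whose Wald ratio deviates from the IVW estimate.
    Returns (ivw_estimate, list of flagged variant indices)."""
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # First-order inverse-variance weight of each Wald ratio.
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    flagged = []
    for j, (r, w) in enumerate(zip(ratios, weights)):
        q_j = w * (r - ivw) ** 2                 # contribution to Cochran's Q
        p_j = math.erfc(math.sqrt(q_j / 2.0))    # chi-square tail, 1 degree of freedom
        if p_j < alpha:
            flagged.append(j)
    return ivw, flagged
```

Flagged variants would then be the starting point for searching other GWAS summary datasets for candidate traits.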
ABSTRACT
The rate at which nonsynonymous single nucleotide polymorphisms (nsSNPs) are being identified in the human genome is increasing dramatically owing to advances in whole‐genome/whole‐exome sequencing technologies. Automated methods capable of accurately and reliably distinguishing between pathogenic and functionally neutral nsSNPs are therefore assuming ever‐increasing importance. Here, we describe the Functional Analysis Through Hidden Markov Models (FATHMM) software and server: a species‐independent method with optional species‐specific weightings for the prediction of the functional effects of protein missense variants. Using a model weighted for human mutations, we obtained performance accuracies that outperformed traditional prediction methods (i.e., SIFT, PolyPhen, and PANTHER) on two separate benchmarks. Furthermore, in one benchmark, we achieve performance accuracies that outperform current state‐of‐the‐art prediction methods (i.e., SNPs&GO and MutPred). We demonstrate that FATHMM can be efficiently applied to high‐throughput/large‐scale human and nonhuman genome sequencing projects with the added benefit of phenotypic outcome associations. To illustrate this, we evaluated nsSNPs in wheat (Triticum spp.) to identify some of the important genetic variants responsible for the phenotypic differences introduced by intense selection during domestication. A Web‐based implementation of FATHMM, including a high‐throughput batch facility and a downloadable standalone package, is available at http://fathmm.biocompute.org.uk.
Maternal smoking during pregnancy has been found to influence newborn DNA methylation in genes involved in fundamental developmental processes. It is pertinent to understand the degree to which the offspring methylome is sensitive to the intensity and duration of prenatal smoking. An investigation of the persistence of offspring methylation associated with maternal smoking and the relative roles of the intrauterine and postnatal environment is also warranted. In the Avon Longitudinal Study of Parents and Children, we investigated associations between prenatal exposure to maternal smoking and offspring DNA methylation at multiple time points in approximately 800 mother-offspring pairs. In cord blood, methylation at 15 CpG sites in seven gene regions (AHRR, MYO1G, GFI1, CYP1A1, CNTNAP2, KLF13 and ATP9A) was associated with maternal smoking, and a dose-dependent response was observed in relation to smoking duration and intensity. Longitudinal analysis of blood DNA methylation in serial samples at birth, age 7 and 17 years demonstrated that some CpG sites showed reversibility of methylation (GFI1, KLF13 and ATP9A), whereas others showed persistently perturbed patterns (AHRR, MYO1G, CYP1A1 and CNTNAP2). Of those showing persistence, we explored the effect of postnatal smoke exposure and found that the major contribution to altered methylation was attributed to a critical window of in utero exposure. A comparison of paternal and maternal smoking and offspring methylation showed consistently stronger maternal associations, providing further evidence for causal intrauterine mechanisms. These findings emphasize the sensitivity of the methylome to maternal smoking during early development and the long-term impact of such exposure.
Abstract
Motivation
Next-generation sequencing technologies have accelerated the discovery of single nucleotide variants in the human genome, stimulating the development of predictors for classifying which of these variants are likely functional in disease, and which neutral. Recently, we proposed CScape, a method for discriminating between cancer driver mutations and presumed benign variants. For the neutral class, this method relied on benign germline variants found in the 1000 Genomes Project database. Discrimination could, therefore, be influenced by the distinction of germline versus somatic, rather than neutral versus disease driver. This motivates the present article, in which we consider predictive discrimination between recurrent and rare somatic single point mutations based solely on cancer data, and the distinction between these two somatic classes and germline single point mutations.
Results
For somatic point mutations in coding and non-coding regions of the genome, we propose CScape-somatic, an integrative classifier for predictively discriminating between recurrent and rare variants in the human cancer genome. In this study, we use purely cancer genome data and investigate the distinction between minimal occurrence and significantly recurrent somatic single point mutations in the human cancer genome. We show that this type of predictive distinction can give novel insight, and may deliver more meaningful prediction in both coding and non-coding regions of the cancer genome. Tested on somatic mutations, CScape-somatic outperforms alternative methods, reaching 74% balanced accuracy in coding regions and 69% in non-coding regions, whereas even higher accuracy may be achieved using thresholds to isolate high-confidence predictions.
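Balanced accuracy, the metric quoted above, is the mean of sensitivity and specificity, which makes it robust to unequal numbers of recurrent and rare examples. The sketch below uses that standard definition. The function name and the 0/1 label encoding are assumptions.

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0
```

A classifier that always predicts the majority class scores 0.5 on this metric regardless of class imbalance, which is the baseline against which figures such as 74% and 69% are read.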
Availability and implementation
Predictions and software are available at http://CScape-somatic.biocompute.org.uk/.
Contact
mark.f.rogers.phd@gmail.com or C.Campbell@bristol.ac.uk
Supplementary information
Supplementary data are available at Bioinformatics online.
Osteoarthritis is the most common musculoskeletal disease and the leading cause of disability globally. Here, we performed a genome-wide association study for osteoarthritis (77,052 cases and 378,169 controls), analyzing four phenotypes: knee osteoarthritis, hip osteoarthritis, knee and/or hip osteoarthritis, and any osteoarthritis. We discovered 64 signals, 52 of them novel, more than doubling the number of established disease loci. Six signals fine-mapped to a single variant. We identified putative effector genes by integrating expression quantitative trait loci (eQTL) colocalization, fine-mapping, and human rare-disease, animal-model, and osteoarthritis tissue expression data. We found enrichment for genes underlying monogenic forms of bone development diseases, and for the collagen formation and extracellular matrix organization biological pathways. Ten of the likely effector genes, including TGFB1 (transforming growth factor beta 1), FGF18 (fibroblast growth factor 18), CTSK (cathepsin K), and IL11 (interleukin 11), have therapeutics approved or in clinical trials, with mechanisms of action supportive of evaluation for efficacy in osteoarthritis.