Genetic variants can modulate phenotypic outcomes via epigenetic intermediates, for example at methylation quantitative trait loci (mQTL). We present the first large-scale assessment of mQTL at human genomic regions selected for interindividual variation in CpG methylation, which we call correlated regions of systemic interindividual variation (CoRSIVs). These can be assayed in blood DNA and do not reflect interindividual variation in cellular composition.
We use target-capture bisulfite sequencing to assess DNA methylation at 4086 CoRSIVs in multiple tissues from each of 188 donors in the NIH Genotype-Tissue Expression (GTEx) program. At CoRSIVs, DNA methylation in peripheral blood correlates with methylation and gene expression in internal organs. We also discover unprecedented mQTL at these regions. Genetic influences on CoRSIV methylation are extremely strong (median R² = 0.76), cumulatively comprising over 70-fold more human mQTL than detected in the most powerful previous study. Moreover, mQTL beta coefficients at CoRSIVs are highly skewed (i.e., the major allele predicts higher methylation). Both surprising findings are independently validated in a cohort of 47 non-GTEx individuals. Genomic regions flanking CoRSIVs show long-range enrichments for LINE-1 and LTR transposable elements; the skewed beta coefficients may therefore reflect evolutionary selection of genetic variants that promote their methylation and silencing. Analyses of GWAS summary statistics show that mQTL polymorphisms at CoRSIVs are associated with metabolic and other classes of disease.
A focus on systemic interindividual epigenetic variants, clearly enhanced in mQTL content, should likewise benefit studies attempting to link human epigenetic variation to the risk of disease.
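As a rough illustration of what an mQTL beta coefficient and R² measure, the sketch below regresses regional methylation on allele dosage with ordinary least squares. It is not the study's pipeline: the function name `mqtl_fit`, the 0/1/2 dosage coding, and the toy values are assumptions made for this example only.

```python
from statistics import mean


def mqtl_fit(dosages, methylation):
    """Simple linear regression of methylation on allele dosage.

    dosages:      per-individual allele counts (hypothetical 0/1/2 coding)
    methylation:  per-individual methylation fraction at one region
    Returns (beta, r_squared).
    """
    if len(dosages) != len(methylation) or len(dosages) < 3:
        raise ValueError("need matched inputs from at least 3 individuals")
    x_bar, y_bar = mean(dosages), mean(methylation)
    sxx = sum((x - x_bar) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype does not vary; no association can be fit")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosages, methylation))
    syy = sum((y - y_bar) ** 2 for y in methylation)
    beta = sxy / sxx  # change in methylation per additional allele copy
    r_squared = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return beta, r_squared


# Toy data (invented): methylation rises by 0.3 per allele copy.
beta, r2 = mqtl_fit([0, 0, 1, 1, 2, 2], [0.2, 0.2, 0.5, 0.5, 0.8, 0.8])
```

In this coding, the sign of `beta` depends on which allele is counted, so any statement about skew toward the major or minor allele requires a consistent choice of counted allele across variants.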
DNA methylation is thought to be an important determinant of human phenotypic variation, but its inherent cell type specificity has impeded progress on this question. At exceptional genomic regions, interindividual variation in DNA methylation occurs systemically. Like genetic variants, systemic interindividual epigenetic variants are stable, can influence phenotype, and can be assessed in any easily biopsiable DNA sample. We describe an unbiased screen for human genomic regions at which interindividual variation in DNA methylation is not tissue-specific.
For each of 10 donors from the NIH Genotype-Tissue Expression (GTEx) program, CpG methylation is measured by deep whole-genome bisulfite sequencing of genomic DNA from tissues representing the three germ layer lineages: thyroid (endoderm), heart (mesoderm), and brain (ectoderm). We develop a computational algorithm to identify genomic regions at which interindividual variation in DNA methylation is consistent across all three lineages. This approach identifies 9926 correlated regions of systemic interindividual variation (CoRSIVs). These regions, comprising just 0.1% of the human genome, are inter-correlated over long genomic distances, associated with transposable elements and subtelomeric regions, conserved across diverse human ethnic groups, sensitive to periconceptional environment, and associated with genes implicated in a broad range of human disorders and phenotypes. CoRSIV methylation in one tissue can predict expression of associated genes in other tissues.
In addition to charting a previously unexplored molecular level of human individuality, this atlas of human CoRSIVs provides a resource for future population-based investigations into how interindividual epigenetic variation modulates risk of disease.
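The idea of interindividual variation that is "consistent across all three lineages" can be sketched as a pairwise correlation check across tissues. This is only an illustration of the concept, not the published algorithm: the function names, the input layout, and the correlation threshold of 0.7 are assumptions.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def is_systemic(region_methylation, min_r=0.7):
    """region_methylation maps tissue name -> per-donor mean methylation
    for one region, with donors in the same order in every tissue.
    The region is flagged when every pair of tissues is correlated
    across donors at or above min_r (a hypothetical cutoff)."""
    tissues = list(region_methylation)
    return all(
        pearson(region_methylation[a], region_methylation[b]) >= min_r
        for a, b in combinations(tissues, 2)
    )


# Toy values (invented) for four donors in three tissues.
region = {
    "thyroid": [0.10, 0.50, 0.90, 0.30],
    "heart": [0.20, 0.60, 0.95, 0.35],
    "brain": [0.15, 0.55, 0.85, 0.30],
}
flag = is_systemic(region)
```

Requiring all tissue pairs to agree, rather than averaging correlations, keeps a single discordant tissue from being masked by two concordant ones.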
Abstract
Background
The traditional approach to studying the epigenetic mechanism CpG methylation in tissue samples is to identify regions of concordant differential methylation spanning multiple CpG sites (differentially methylated regions). Variation limited to single or small numbers of CpGs has been assumed to reflect stochastic processes. To test this, we developed software, Cluster-Based analysis of CpG methylation (CluBCpG), and explored variation in read-level CpG methylation patterns in whole genome bisulfite sequencing data.
Results
Analysis of both human and mouse whole genome bisulfite sequencing datasets reveals read-level signatures associated with cell type and cell type-specific biological processes. These signatures, which are mostly orthogonal to classical differentially methylated regions, are enriched at cell type-specific enhancers and allow estimation of proportional cell composition in synthetic mixtures and improved prediction of gene expression. In tandem, we developed a machine learning algorithm, Precise Read-Level Imputation of Methylation (PReLIM), to increase coverage of existing whole genome bisulfite sequencing datasets by imputing CpG methylation states on individual sequencing reads. PReLIM both improves CluBCpG coverage and performance and enables identification of novel differentially methylated regions, which we independently validate.
Conclusions
Our data indicate that, rather than stochastic variation, read-level CpG methylation patterns in tissue whole genome bisulfite sequencing libraries reflect cell type. Accordingly, these new computational tools should lead to an improved understanding of epigenetic regulation by DNA methylation.
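To make "read-level CpG methylation patterns" concrete, the sketch below tallies the methylation pattern carried by each read in a genomic bin and reports patterns seen in one sample but not the other. The abstract does not describe CluBCpG's clustering procedure, so exact pattern matching is used here as a simplified stand-in; the function name, the string encoding of reads, and the `min_reads` cutoff are assumptions.

```python
from collections import Counter


def sample_specific_patterns(reads_a, reads_b, min_reads=2):
    """Each read is a string over the same CpGs in one bin,
    '1' = methylated and '0' = unmethylated (hypothetical encoding).

    Returns two sets: patterns supported by at least min_reads reads
    in sample A and absent from B, and the converse.
    """
    counts_a, counts_b = Counter(reads_a), Counter(reads_b)
    only_a = {p for p, n in counts_a.items() if n >= min_reads and p not in counts_b}
    only_b = {p for p, n in counts_b.items() if n >= min_reads and p not in counts_a}
    return only_a, only_b


# Toy reads (invented) covering four CpGs in one bin.
sample_a = ["1111", "1111", "0000", "1100"]
sample_b = ["0000", "0000", "0011", "0011"]
only_a, only_b = sample_specific_patterns(sample_a, sample_b)
```

Note that the average methylation of the bin can be similar in two samples even when their read-level patterns differ, which is why such signatures need not coincide with differentially methylated regions.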
Perfluorooctanoic acid (PFOA) and perfluorooctane sulfonate (PFOS) are persistent organic pollutants which may alter prenatal development, potentially through epigenetic modifications. Prior studies examining PFOS/PFOA and DNA methylation have relatively few subjects (n < 200) and inconsistent results.
We examined relations of PFOA/PFOS with DNA methylation among 597 neonates in the Upstate KIDS cohort study. PFOA/PFOS were quantified in newborn dried blood spots (DBS) using high-performance liquid chromatography/tandem mass spectrometry. DNA methylation was measured using the Infinium MethylationEPIC BeadChip with DNA extracted from DBS. Robust linear regression was used to examine the associations of PFOA/PFOS with DNA methylation at individual CpG sites. Covariates included sample plate, estimated cell type, epigenetically derived ancestry, infant sex and plurality, indicators of maternal socioeconomic status, and prior pregnancy loss. In supplemental analysis, we restricted the analysis to 2242 CpG sites previously identified as Correlated Regions of Systemic Interindividual Variation (CoRSIVs) which include metastable epialleles.
At FDR < 0.05, PFOA concentration >90th percentile was related to DNA methylation at cg15557840, near SCRT2 and SRXN1; PFOS >90th percentile was related to 2 CpG sites in a sex-specific manner (cg19039925 in GVIN1 in boys and cg05754408 in ZNF26 in girls). When analysis was restricted to CoRSIVs, log-scaled, continuous PFOS concentration was related to DNA methylation at cg03278866 within PTBP1.
In conclusion, there was limited evidence of an association between high concentrations of PFOA/PFOS and DNA methylation in newborn DBS in the Upstate KIDS cohort. These findings merit replication in populations with a higher median concentration of PFOA/PFOS.
•PFOS and PFOA were measured in newborn dried blood spots (DBS).
•DNA was extracted from DBS to study methylation.
•High PFOA/PFOS (>90th percentile) is related to DNA methylation at select CpG sites.
•Findings are limited and need replication in cohorts with higher PFOA/PFOS exposure.
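The FDR < 0.05 threshold used for the CpG-level results can be illustrated with a small multiple-testing correction. The abstract does not state which FDR procedure was applied; the sketch below assumes the Benjamini-Hochberg step-up method, and the function name and p-values are invented for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values (invented) for four CpG sites.
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in q_values]
```

With hundreds of thousands of CpG sites on an array, the same per-site p-value yields a much larger adjusted value than in this four-site toy case, which is one reason restricting the analysis to a smaller candidate set can change which sites pass the threshold.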
Summary
Epicuticular waxes provide a hydrophobic barrier that protects land plants from environmental stresses. To elucidate the molecular functions of maize glossy mutants that reduce the accumulation of epicuticular waxes, eight non‐allelic glossy mutants were subjected to transcriptomic comparisons with their respective wild‐type siblings. Transcriptomic comparisons identified 2279 differentially expressed (DE) genes. Other glossy genes tended to be down‐regulated in glossy mutants; by contrast, stress‐responsive pathways were induced in mutants. Gene co‐expression network (GCN) analysis found that glossy genes were clustered, suggestive of co‐regulation. Genes that potentially regulate the accumulation of glossy gene transcripts were identified via a pathway‐level co‐expression analysis. Expression data from diverse organs showed that maize glossy genes are generally active in young leaves, silks, and tassels, while largely inactive in seeds and roots. Through reverse genetics, a DE gene homologous to Arabidopsis CER8 and co‐expressed with known glossy genes was confirmed to participate in epicuticular wax accumulation. A GCN data‐informed forward genetics approach enabled cloning of the gl14 gene, which encodes a putative membrane‐associated protein. Our results deepen understanding of the transcriptional regulation of the genes involved in the accumulation of epicuticular wax, and provide two maize glossy genes and a number of candidate genes for further characterization.
Significance Statement
Gene expression analysis revealed co‐expression, suggestive of co‐regulation, among the genes that determine accumulation of waxes on the plant's outermost surface, and identified transcription factors that are likely to participate in the transcriptional modulation of the wax pathway. Using genes co‐expressed with known genes in the pathway, two previously unknown maize genes responsible for accumulation of surface waxes were experimentally confirmed, demonstrating a strategy to facilitate the identification of causal genes for certain traits.
Abstract
Epigenetic dysregulation is thought to contribute to the etiology of schizophrenia (SZ), but the cell type-specificity of DNA methylation makes population-based epigenetic studies of SZ challenging. To train an SZ case–control classifier based on DNA methylation in blood, therefore, we focused on human genomic regions of systemic interindividual epigenetic variation (CoRSIVs), a subset of which are represented on the Illumina Human Methylation 450K (HM450) array. HM450 DNA methylation data on whole blood of 414 SZ cases and 433 non-psychiatric controls were used as training data for a classification algorithm with built-in feature selection, sparse partial least squares discriminant analysis (SPLS-DA); application of SPLS-DA to HM450 data has not been previously reported. Using the first two SPLS-DA dimensions we calculated a “risk distance” to identify individuals with the highest probability of SZ. The model was then evaluated on an independent HM450 data set on 353 SZ cases and 322 non-psychiatric controls. Our CoRSIV-based model classified 303 individuals as cases with a positive predictive value (PPV) of 80%, far surpassing the performance of a model based on polygenic risk score (PRS). Importantly, risk distance (based on CoRSIV methylation) was not associated with medication use, arguing against reverse causality. Risk distance and PRS were positively correlated (Pearson r = 0.28, P = 1.28 × 10⁻¹²), and mediational analysis suggested that genetic effects on SZ are partially mediated by altered methylation at CoRSIVs. Our results indicate two innate dimensions of SZ risk: one based on genetic, and the other on systemic epigenetic variants.
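For readers less familiar with the metric, positive predictive value is the fraction of individuals called cases by the classifier who are truly cases, TP / (TP + FP). The sketch below is a generic calculation, not code from the study; the function name and the toy labels are assumptions.

```python
def positive_predictive_value(true_labels, predicted_labels):
    """PPV = true positives / all predicted positives.

    Labels are truthy for 'case' and falsy for 'control'.
    """
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for truth, pred in pairs if pred and truth)
    false_pos = sum(1 for truth, pred in pairs if pred and not truth)
    if true_pos + false_pos == 0:
        raise ValueError("no individuals were predicted to be cases")
    return true_pos / (true_pos + false_pos)


# Toy labels (invented): five predicted cases, four of them true cases.
truth = [1, 1, 1, 1, 0, 0, 0]
calls = [1, 1, 1, 1, 1, 0, 0]
ppv = positive_predictive_value(truth, calls)
```

PPV depends on the proportion of cases in the evaluated sample, so a value obtained in a roughly balanced case-control set does not transfer directly to a general-population setting.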
A new era for epigenetic epidemiology
Gunasekara, Chathura J; Waterland, Robert A
Epigenomics, 2019-11-01, Volume 11, Issue 15
Journal Article, Peer reviewed, Open access
...since the first published use of the term in 2004 (2), there has been growing interest in the field of epigenetic epidemiology, defined as the study of the associations between interindividual epigenetic variation and risk of disease (3). Because of its long-term stability and ability to be assayed in minute quantities of DNA, nearly all epigenetic epidemiologic studies have focused on CpG methylation. A computational algorithm was developed to analyze deep whole-genome bisulfite-sequencing (WGBS) data on tissues representing all three germ layers (thyroid, heart and brain) from each of ten donors from the NIH Genotype-Tissue Expression (GTEx) project. Focusing on these regions, investigators can employ genomic DNA from easily obtainable tissues like peripheral blood to draw inferences about epigenetic regulation throughout the body - akin to genotyping sequence variants. Financial and competing interests disclosure R Waterland is supported by USDA/ARS (CRIS 3092-5-001-059), the Cancer Prevention and Research Institute of Texas (grant number RP170295), and the National Institutes of Health (grant number 1R01DK111522).
PAX8 is a key thyroid transcription factor implicated in thyroid gland differentiation and function, and PAX8 gene methylation is reported to be sensitive to the periconceptional environment. Using a novel recall-by-epigenotype study in Gambian children, we found that PAX8 hypomethylation at age 2 years is associated with a 21% increase in thyroid volume and an increase in free thyroxine (T4) at 5 to 8 years, the latter equivalent to 8.4% of the normal range. Free T4 was associated with a decrease in DXA-derived body fat and bone mineral density. Furthermore, offspring PAX8 methylation was associated with periconceptional maternal nutrition, and methylation variability was influenced by genotype, suggesting that sensitivity to environmental exposures may be under partial genetic control. Together, our results demonstrate a possible link between early environment, PAX8 gene methylation and thyroid gland development and function, with potential implications for early embryonic programming of thyroid-related health and disease.
Recent genome-wide association studies corroborate classical research on developmental programming indicating that obesity is primarily a neurodevelopmental disease strongly influenced by nutrition during critical ontogenic windows. Epigenetic mechanisms regulate neurodevelopment; however, little is known about their role in establishing and maintaining the brain's energy balance circuitry. We generated neuron and glia methylomes and transcriptomes from male and female mouse hypothalamic arcuate nucleus, a key site for energy balance regulation, at time points spanning the closure of an established critical window for developmental programming of obesity risk. We find that postnatal epigenetic maturation is markedly cell type and sex specific and occurs in genomic regions enriched for heritability of body mass index in humans. Our results offer a potential explanation for both the limited ontogenic windows for and sex differences in sensitivity to developmental programming of obesity and provide a rich resource for epigenetic analyses of developmental programming of energy balance.
In the era of genetics and genomics, the advent of big data is transforming the field of biology into a data-intensive discipline. Novel computational algorithms and software tools are in demand to address the data analysis challenges in this growing field. This dissertation comprises the development of a novel algorithm, web-based data analysis tools, and a data visualization platform. The Triple Gene Mutual Interaction (TGMI) algorithm, presented in Chapter 2, is an innovative approach to identify key regulatory transcription factors (TFs) that govern a particular biological pathway or process through interaction among three genes in a triple gene block, which consists of a pair of pathway genes and a TF. The identification of key TFs controlling a biological pathway or process allows biologists to understand the complex regulatory mechanisms in living organisms. TF-Miner, presented in Chapter 3, is a high-throughput gene expression data analysis web application that was developed by integrating two highly efficient algorithms: TF-Cluster and TF-Finder. TF-Cluster can be used to obtain collaborative TFs that coordinately control a biological pathway or process using genome-wide expression data. TF-Finder, on the other hand, can identify regulatory TFs involved in or associated with a specific biological pathway or process using Adaptive Sparse Canonical Correlation Analysis (ASCCA). Chapter 4 presents ExactSearch, a suffix-tree-based motif search algorithm implemented in a web-based tool. This tool can identify the locations of a set of motif sequences in a set of target promoter sequences. ExactSearch also provides the functionality to search for a set of motif sequences in flanking regions from 50 plant genomes, which we have incorporated into the web tool. Chapter 5 presents STTM JBrowse, a web-based RNA-Seq data visualization system built using the JBrowse open source platform.
STTM JBrowse is a unified repository to share/produce visualizations created from large RNA-Seq datasets generated from a variety of model and crop plants in which miRNAs were destroyed using Short Tandem Target Mimic (STTM) Technology.
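The task ExactSearch addresses, locating a set of motifs in a set of promoter sequences, can be sketched with plain substring scanning. This is deliberately not a suffix tree and is not the dissertation's implementation; it only shows the input and output shape of exact motif search. The function name, the dictionary layout, and the toy sequences are assumptions.

```python
def find_motifs(motifs, promoters):
    """Report 0-based start positions of each motif in each promoter.

    motifs:     list of motif strings
    promoters:  dict mapping promoter name -> DNA sequence
    Returns a dict keyed by (promoter name, motif) with a list of starts.
    Matching is exact and case-insensitive; overlapping hits are kept.
    """
    hits = {}
    for name, sequence in promoters.items():
        seq = sequence.upper()
        for motif in motifs:
            if not motif:
                continue  # an empty motif would match everywhere
            pattern = motif.upper()
            start = seq.find(pattern)
            while start != -1:
                hits.setdefault((name, motif), []).append(start)
                start = seq.find(pattern, start + 1)
    return hits


# Toy sequences (invented).
hits = find_motifs(["TATA", "CAAT"], {"p1": "GGTATATACAAT", "p2": "cccc"})
```

Scanning each sequence once per motif is adequate for small inputs; an index such as a suffix tree pays off when many motifs are searched against the same large set of sequences.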