We show that epigenome- and transcriptome-wide association studies (EWAS and TWAS) are prone to significant inflation and bias of test statistics, an unrecognized phenomenon introducing spurious findings if left unaddressed. Neither GWAS-based methodology nor state-of-the-art confounder adjustment methods completely remove bias and inflation. We propose a Bayesian method to control bias and inflation in EWAS and TWAS based on estimation of the empirical null distribution. Using simulations and real data, we demonstrate that our method maximizes power while properly controlling the false positive rate. We illustrate the utility of our method in large-scale EWAS and TWAS meta-analyses of age and smoking.
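The idea of an empirical null can be illustrated with a much simpler stand-in than the Bayesian method the abstract describes. The sketch below is an assumption-laden toy, not the authors' procedure: it estimates the null mean (bias) and null standard deviation (inflation) robustly from the median and the median absolute deviation, then rescales the test statistics. The function names and the simulated z-scores are hypothetical.

```python
import random
import statistics

def empirical_null(z_scores):
    """Robust estimate of the null mean (bias) and null SD (inflation).

    Uses the median and the scaled median absolute deviation (MAD),
    a simple substitute for a full Bayesian mixture model.
    """
    center = statistics.median(z_scores)
    mad = statistics.median([abs(z - center) for z in z_scores])
    scale = 1.4826 * mad  # makes the MAD consistent with the SD of a normal
    return center, scale

def correct(z_scores):
    """Re-center and re-scale test statistics by the estimated empirical null."""
    bias, inflation = empirical_null(z_scores)
    return [(z - bias) / inflation for z in z_scores]

# Simulated null statistics with bias 0.2 and inflation 1.3 (illustrative only).
rng = random.Random(0)
z = [rng.gauss(0.2, 1.3) for _ in range(20000)]

bias, inflation = empirical_null(z)
corrected = correct(z)
bias_after, inflation_after = empirical_null(corrected)
print(bias, inflation, bias_after, inflation_after)
```

A robust estimator is used here because a minority of true associations should not pull the estimated null away from the bulk of the test statistics.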
The methylome is subject to genetic and environmental effects. Their impact may depend on sex and age, resulting in sex- and age-related physiological variation and disease susceptibility. Here we estimate the total heritability of DNA methylation levels in whole blood and estimate the variance explained by common single nucleotide polymorphisms at 411,169 sites in 2,603 individuals from twin families, to establish a catalogue of between-individual variation in DNA methylation. Heritability estimates vary across the genome (mean=19%) and interaction analyses reveal thousands of sites with sex-specific heritability as well as sites where the environmental variance increases with age. Integration with previously published data illustrates the impact of genome and environment across the lifespan at methylation sites associated with metabolic traits, smoking and ageing. These findings demonstrate that our catalogue holds valuable information on locations in the genome where methylation variation between people may reflect disease-relevant environmental exposures or genetic variation.
Epigenetic change is a hallmark of ageing but its link to ageing mechanisms in humans remains poorly understood. While DNA methylation at many CpG sites closely tracks chronological age, DNA methylation changes relevant to biological age are expected to gradually dissociate from chronological age, mirroring the increased heterogeneity in health status at older ages.
Here, we report on the large-scale identification of 6366 age-related variably methylated positions (aVMPs) identified in 3295 whole blood DNA methylation profiles, 2044 of which have a matching RNA-seq gene expression profile. aVMPs are enriched at polycomb repressed regions and, accordingly, methylation at those positions is associated with the expression of genes encoding components of polycomb repressive complex 2 (PRC2) in trans. Further analysis revealed trans-associations for 1816 aVMPs with an additional 854 genes. These trans-associated aVMPs are characterized by either an age-related gain of methylation at CpG islands marked by PRC2 or a loss of methylation at enhancers. This distinct pattern extends to other tissues and multiple cancer types. Finally, genes associated with aVMPs in trans whose expression is variably upregulated with age (733 genes) play a key role in DNA repair and apoptosis, whereas downregulated aVMP-associated genes (121 genes) are mapped to defined pathways in cellular metabolism.
Our results link age-related changes in DNA methylation to fundamental mechanisms that are thought to drive human ageing.
BACKGROUND: Long noncoding RNAs (lncRNAs) form an abundant class of transcripts, but the function of the majority of them remains elusive. While it has been shown that some lncRNAs are bound by ribosomes, it has also been convincingly demonstrated that these transcripts do not code for proteins. To obtain a comprehensive understanding of the extent to which lncRNAs bind ribosomes, we performed systematic RNA sequencing on ribosome-associated RNA pools obtained through ribosomal fractionation and compared the RNA content with nuclear and (non-ribosome bound) cytosolic RNA pools. RESULTS: The RNA composition of the subcellular fractions differs significantly from each other, but lncRNAs are found in all locations. A subset of specific lncRNAs is enriched in the nucleus but surprisingly the majority is enriched in the cytosol and in ribosomal fractions. The ribosomal enriched lncRNAs include H19 and TUG1. CONCLUSIONS: Most studies on lncRNAs have focused on the regulatory function of these transcripts in the nucleus. We demonstrate that only a minority of all lncRNAs are nuclear enriched. Our findings suggest that many lncRNAs may have a function in cytoplasmic processes, and in particular in ribosome complexes.
Noninvasive fetal aneuploidy detection by use of free DNA from maternal plasma has recently been shown to be achievable by whole genome shotgun sequencing. The high-throughput next-generation sequencing platforms previously tested use a PCR step during sample preparation, which results in amplification bias in GC-rich areas of the human genome. To eliminate this bias, and thereby experimental noise, we have used single molecule sequencing as an alternative method.
For noninvasive trisomy 21 detection, we performed single molecule sequencing on the Helicos platform using free DNA isolated from maternal plasma from 9 weeks of gestation onwards. Relative sequence tag density ratios were calculated and results were directly compared to the previously described Illumina GAII platform.
Sequence data generated without an amplification step show no GC bias. Therefore, with the use of single molecule sequencing all trisomy 21 fetuses could be distinguished more clearly from euploid fetuses.
This study shows for the first time that single molecule sequencing is an attractive and easy-to-use alternative for reliable noninvasive fetal aneuploidy detection in diagnostics. With this approach, previously described experimental noise associated with PCR amplification, such as GC bias, can be overcome.
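As a rough illustration of what a relative sequence tag density ratio could look like, the sketch below compares the share of tags mapping to chromosome 21 in a test sample with the mean share in euploid reference samples, and adds a z-score against the reference spread. The abstract does not give the exact calculation, so the formula, the function names, and all tag counts are assumptions made up for this example.

```python
from statistics import fmean, stdev

def chr21_fraction(tag_counts):
    """Share of all mapped sequence tags that fall on chromosome 21."""
    return tag_counts["chr21"] / sum(tag_counts.values())

def relative_density(sample, references):
    """Return (ratio, z): chr21 share relative to the euploid reference mean,
    and the number of reference standard deviations above that mean."""
    ref_fractions = [chr21_fraction(r) for r in references]
    ref_mean = fmean(ref_fractions)
    ratio = chr21_fraction(sample) / ref_mean
    z = (chr21_fraction(sample) - ref_mean) / stdev(ref_fractions)
    return ratio, z

# Made-up tag counts for illustration only (one million tags per sample).
references = [
    {"chr21": 13000, "other": 987000},
    {"chr21": 13100, "other": 986900},
    {"chr21": 12900, "other": 987100},
    {"chr21": 13050, "other": 986950},
    {"chr21": 12950, "other": 987050},
]
suspected_t21 = {"chr21": 13650, "other": 986350}
euploid = {"chr21": 13020, "other": 986980}

t21_ratio, t21_z = relative_density(suspected_t21, references)
eu_ratio, eu_z = relative_density(euploid, references)
print(t21_ratio, t21_z, eu_ratio, eu_z)
```

In this toy setting, less noise in the reference fractions (for instance from the absence of GC bias) shrinks the reference standard deviation and so widens the separation between trisomy and euploid samples.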
Immune cell function can be altered by lipids in circulation, a process potentially relevant to lipid-associated inflammatory diseases including atherosclerosis and rheumatoid arthritis. To gain further insight into the molecular changes involved, we here perform a transcriptome-wide association analysis of blood triglycerides, HDL cholesterol, and LDL cholesterol in 3229 individuals, followed by a systematic bidirectional Mendelian randomization analysis to assess the direction of effects and control for pleiotropy. Triglycerides are found to induce transcriptional changes in 55 genes and HDL cholesterol in 5 genes. The function and cell-specific expression pattern of these genes imply that triglycerides downregulate both cellular lipid metabolism and, unexpectedly, allergic response. Indeed, a Mendelian randomization approach based on GWAS summary statistics indicates that several of these genes, including interleukin-4 (IL4) and IgE receptors (FCER1A, MS4A2), affect the incidence of allergic diseases. Our findings highlight the interplay between triglycerides and immune cells in allergic disease.
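Mendelian randomization from summary statistics can be sketched with two textbook estimators: the per-variant Wald ratio and its inverse-variance-weighted (IVW) combination. Whether the study used exactly these estimators is not stated in the abstract, so treat this as a generic illustration. The effect sizes below are hypothetical, and the standard error uses the common first-order approximation that ignores uncertainty in the exposure effect.

```python
from math import sqrt

def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Causal effect estimate from one genetic instrument.

    The SE is the first-order approximation se_outcome / |beta_exposure|.
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se

def ivw(ratios):
    """Inverse-variance-weighted combination of per-instrument Wald ratios."""
    weights = [1 / se ** 2 for _, se in ratios]
    estimate = sum(w * est for w, (est, _) in zip(weights, ratios)) / sum(weights)
    se = sqrt(1 / sum(weights))
    return estimate, se

# Hypothetical instruments: (effect on exposure, effect on outcome, SE of outcome effect).
instruments = [
    (0.10, 0.020, 0.005),
    (0.20, 0.044, 0.008),
    (0.15, 0.027, 0.006),
]
ratios = [wald_ratio(bx, by, se) for bx, by, se in instruments]
ivw_estimate, ivw_se = ivw(ratios)
print(ivw_estimate, ivw_se)
```

A bidirectional analysis would repeat the same calculation with the roles of exposure and outcome swapped, using instruments for the other trait.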
DNA methylation is a key epigenetic modification in human development and disease, yet there is limited understanding of its highly coordinated regulation. Here, we identify 818 genes that affect DNA methylation patterns in blood using large-scale population genomics data.
By employing genetic instruments as causal anchors, we establish directed associations between gene expression and distant DNA methylation levels, while ensuring specificity of the associations by correcting for linkage disequilibrium and pleiotropy among neighboring genes. The identified genes are enriched for transcription factors, of which many consistently increased or decreased DNA methylation levels at multiple CpG sites. In addition, we show that a substantial number of transcription factors affected DNA methylation at their experimentally determined binding sites. We also observe genes encoding proteins with heterogeneous functions that have widespread effects on DNA methylation, e.g., NFKBIE, CDCA7(L), and NLRC5, and for several examples, we suggest plausible mechanisms underlying their effect on DNA methylation.
We report hundreds of genes that affect DNA methylation and provide key insights into the principles underlying epigenetic regulation.
Filtering, FDR and power. van Iterson, Maarten; Boer, Judith M; Menezes, Renée X
BMC Bioinformatics, 09/2010, Volume 11, Issue 1
Journal Article, Peer reviewed, Open access
In high-dimensional data analysis such as differential gene expression analysis, people often use filtering methods like fold-change or variance filters in an attempt to reduce the multiple testing penalty and improve power. However, filtering may introduce a bias on the multiple testing correction. The precise amount of bias depends on many quantities, such as fraction of probes filtered out, filter statistic and test statistic used.
We show that a biased multiple testing correction results if non-differentially expressed probes are not filtered out with equal probability from the entire range of p-values. We illustrate our results using both a simulation study and an experimental dataset, where the FDR is shown to be biased mostly by filters that are associated with the hypothesis being tested, such as the fold change. Filters that induce little bias on the FDR yield less additional power of detecting differentially expressed genes. Finally, we propose a statistical test that can be used in practice to determine whether any chosen filter introduces bias on the FDR estimate used, given a general experimental setup.
Filtering out of probes must be used with care as it may bias the multiple testing correction. Researchers can use our test for FDR bias to guide their choice of filter and amount of filtering in practice.
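The central claim, that bias arises when null probes are not removed with equal probability across the whole range of p-values, can be checked with a small simulation. This is only a sketch under simplifying assumptions (all probes are null, unit variance is known, a z-test is used) and it is not the statistical test proposed in the paper. All names are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def simulate_null_probe(rng, n_per_group=5):
    """One non-differentially expressed probe: both groups drawn from N(0, 1)."""
    a = [rng.gauss(0, 1) for _ in range(n_per_group)]
    b = [rng.gauss(0, 1) for _ in range(n_per_group)]
    diff = fmean(a) - fmean(b)            # fold change on the log scale
    overall = fmean(a + b)                # overall mean, independent of diff
    z = diff / (2 / n_per_group) ** 0.5   # z-test with known unit variance
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, overall, p

def keep_top_half(probes, key):
    """Filter: keep the half of the probes with the largest filter statistic."""
    ranked = sorted(probes, key=key, reverse=True)
    return ranked[: len(ranked) // 2]

def null_rejection_rate(probes, alpha=0.05):
    """Fraction of (null) probes with p < alpha; about alpha if p is uniform."""
    return sum(p < alpha for _, _, p in probes) / len(probes)

rng = random.Random(1)
probes = [simulate_null_probe(rng) for _ in range(20000)]

rate_unfiltered = null_rejection_rate(probes)
rate_fold_change = null_rejection_rate(keep_top_half(probes, key=lambda t: abs(t[0])))
rate_overall_mean = null_rejection_rate(keep_top_half(probes, key=lambda t: abs(t[1])))
print(rate_unfiltered, rate_fold_change, rate_overall_mean)
```

The fold-change filter preferentially keeps null probes with small p-values, so the surviving null p-values are no longer uniform, whereas the overall-mean filter is independent of the test statistic under the null and leaves the rejection rate near the nominal level.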
SNP panels that uniquely identify an individual are useful for genetic and forensic research. Previously recommended SNP panels are based on DNA profiles and mostly contain intragenic SNPs. With the increasing interest in RNA expression profiles, we aimed to establish a SNP panel for both DNA and RNA-based genotyping.
To determine a small set of SNPs with maximally discriminative power, genotype calls were obtained from DNA and blood-derived RNA sequencing data belonging to healthy, geographically dispersed, Dutch individuals. SNPs were selected based on different criteria like genotype call rate, minor allele frequency, Hardy-Weinberg equilibrium and linkage disequilibrium. A panel of 50 SNPs was sufficient to identify an individual uniquely: the probability of identity was 6.9 × 10^[exponent missing] when assuming no family relations and 1.2 × 10^[exponent missing] when accounting for the presence of full sibs. The ability of the SNP panel to uniquely identify individuals at the DNA and RNA level was validated in an independent population dataset. The panel is applicable to individuals of European descent, with slightly lower power in non-Europeans. Whereas most of the genes containing the 50 SNPs are expressed in various tissues, our SNP panel needs optimization for tissues other than blood.
This first DNA/RNA SNP panel will be useful to identify sample mix-ups in biomedical research and for assigning DNA and RNA stains in crime scenes to unique individuals.
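The probability of identity for a panel can be sketched with the standard population-genetic expressions for a biallelic locus, multiplied across loci. The abstract does not say which formula the authors used, so this is an assumption. It also assumes Hardy-Weinberg equilibrium and independence between SNPs, and the minor allele frequency of 0.5 for all 50 SNPs is a hypothetical best case rather than the real panel.

```python
from math import prod

def pi_unrelated(maf):
    """Chance that two unrelated individuals share a genotype at one biallelic SNP."""
    p, q = maf, 1 - maf
    return (p * p) ** 2 + (2 * p * q) ** 2 + (q * q) ** 2

def pi_sibs(maf):
    """Chance that two full sibs share a genotype at one biallelic SNP."""
    p, q = maf, 1 - maf
    s2 = p ** 2 + q ** 2   # sum of squared allele frequencies
    s4 = p ** 4 + q ** 4   # sum of fourth powers of allele frequencies
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4

def panel_pi(mafs, per_snp):
    """Panel-level probability of identity, assuming independent SNPs."""
    return prod(per_snp(m) for m in mafs)

# Hypothetical panel: 50 independent SNPs, each with MAF 0.5.
mafs = [0.5] * 50
panel_unrelated = panel_pi(mafs, pi_unrelated)
panel_sibs = panel_pi(mafs, pi_sibs)
print(panel_unrelated, panel_sibs)
```

Because siblings share alleles by descent, the per-SNP sib probability is always higher than the unrelated one, which is why a panel is less discriminating when full sibs may be present.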
Duchenne muscular dystrophy (DMD) is a severe progressive muscular disorder caused by reading frame disrupting mutations in the DMD gene, preventing the synthesis of functional dystrophin. As dystrophin provides muscle fiber stability during contractions, dystrophin negative fibers are prone to exercise-induced damage. Upon exhaustion of the regenerative capacity, fibers will be replaced by fibrotic and fat tissue resulting in a progressive loss of function eventually leading to death in the early thirties. With several promising approaches for the treatment of DMD aiming at dystrophin restoration in clinical trials, there is an increasing need to determine more precisely which dystrophin levels are sufficient to restore muscle fiber integrity, protect against muscle damage and improve muscle function.
To address this we generated a new mouse model (mdx-Xist(Δhs)) with varying, low dystrophin levels (3-47%, mean 22.7%, stdev 12.1, n = 24) due to skewed X-inactivation. Longitudinal sections revealed that within individual fibers, some nuclei did and some did not express dystrophin, resulting in a random, mosaic pattern of dystrophin expression within fibers.
Mdx-Xist(Δhs), mdx and wild-type females underwent a 12-week functional test regime consisting of different tests to assess muscle function at baseline, or after chronic treadmill running exercise. Overall, mdx-Xist(Δhs) mice with 3-14% dystrophin outperformed mdx mice in the functional tests. Improved histopathology was observed in mice with 15-29% dystrophin and these levels also resulted in normalized expression of pro-inflammatory biomarker genes, while for other parameters >30% of dystrophin was needed. Chronic exercise clearly worsened pathology, which needed dystrophin levels >20% for protection. Based on these findings, we conclude that while even dystrophin levels below 15% can improve pathology and performance, levels of >20% are needed to fully protect muscle fibers from exercise-induced damage.