DNase I hypersensitive sites (DHSs) are generic markers of regulatory DNA and contain genetic variations associated with diseases and phenotypic traits. We created high-resolution maps of DHSs from 733 human biosamples encompassing 438 cell and tissue types and states, and integrated these to delineate and numerically index approximately 3.6 million DHSs within the human genome sequence, providing a common coordinate system for regulatory DNA. Here we show that these maps highly resolve the cis-regulatory compartment of the human genome, which encodes unexpectedly diverse cell- and tissue-selective regulatory programs at very high density. These programs can be captured comprehensively by a simple vocabulary that enables the assignment to each DHS of a regulatory barcode that encapsulates its tissue manifestations, and global annotation of protein-coding and non-coding RNA genes in a manner orthogonal to gene expression. Finally, we show that sharply resolved DHSs markedly enhance the genetic association and heritability signals of diseases and traits. Rather than being confined to a small number of distal elements or promoters, we find that genetic signals converge on congruently regulated sets of DHSs that decorate entire gene bodies. Together, our results create a universal, extensible coordinate system and vocabulary for human regulatory DNA marked by DHSs, and provide a new global perspective on the architecture of human gene regulation.
Abstract
Summary
The Illumina Infinium EPIC BeadChip is a new high-throughput array for DNA methylation analysis, extending the earlier 450k array by over 400 000 new sites. Previously, a method named eFORGE was developed to provide insights into cell type-specific and cell-composition effects for 450k data. Here, we present a significantly updated and improved version of eFORGE that can analyze both EPIC and 450k array data. New features include analysis of chromatin states, transcription factor motifs and DNase I footprints, providing tools for epigenome-wide association study interpretation and epigenome editing.
Availability and implementation
eFORGE v2.0 is implemented as a web tool available from https://eforge.altiusinstitute.org and https://eforge-tf.altiusinstitute.org/.
Supplementary information
Supplementary data are available at Bioinformatics online.
The average individual is expected to harbor thousands of variants within non-coding genomic regions involved in gene regulation. However, it is currently not possible to interpret reliably the functional consequences of genetic variation within any given transcription factor recognition sequence. To address this, we comprehensively analyzed heritable genome-wide binding patterns of a major sequence-specific regulator (CTCF) in relation to genetic variability in binding site sequences across a multi-generational pedigree. We localized and quantified CTCF occupancy by ChIP-seq in 12 related and unrelated individuals spanning three generations, followed by comprehensive targeted resequencing of the entire CTCF-binding landscape across all individuals. We identified hundreds of variants with reproducible quantitative effects on CTCF occupancy (both positive and negative). While these effects paralleled protein-DNA recognition energetics when averaged, they were extensively buffered by striking local context dependencies. In the significant majority of cases buffering was complete, resulting in silent variants spanning every position within the DNA recognition interface irrespective of level of binding energy or evolutionary constraint. The prevalence of complex partial or complete buffering effects severely constrained the ability to predict reliably the impact of variation within any given binding site instance. Surprisingly, 40% of variants that increased CTCF occupancy occurred at positions of human-chimp divergence, challenging the expectation that the vast majority of functional regulatory variants should be deleterious. Our results suggest that, even in the presence of "perfect" genetic information afforded by resequencing and parallel studies in multiple related individuals, genomic site-specific prediction of the consequences of individual variation in regulatory DNA will require systematic coupling with empirical functional genomic measurements.
The origins of the Bronze Age Minoan and Mycenaean cultures have puzzled archaeologists for more than a century. We have assembled genome-wide data from 19 ancient individuals, including Minoans from Crete, Mycenaeans from mainland Greece, and their eastern neighbours from southwestern Anatolia. Here we show that Minoans and Mycenaeans were genetically similar, having at least three-quarters of their ancestry from the first Neolithic farmers of western Anatolia and the Aegean, and most of the remainder from ancient populations related to those of the Caucasus and Iran. However, the Mycenaeans differed from Minoans in deriving additional ancestry from an ultimate source related to the hunter-gatherers of eastern Europe and Siberia, introduced via a proximal source related to the inhabitants of either the Eurasian steppe or Armenia. Modern Greeks resemble the Mycenaeans, but with some additional dilution of the Early Neolithic ancestry. Our results support the idea of continuity but not isolation in the history of populations of the Aegean, before and after the time of its earliest civilizations.
Gene-distal enhancers are critical for tissue-specific gene expression, but their genomic determinants within a specific lineage at different stages of development are unknown. Here we profile chromatin state maps, transcription factor occupancy, and gene expression profiles during human erythroid development at fetal and adult stages. Comparative analyses of human erythropoiesis identify developmental stage-specific enhancers as primary determinants of stage-specific gene expression programs. We find that erythroid master regulators GATA1 and TAL1 act cooperatively within active enhancers but confer little predictive value for stage specificity. Instead, a set of stage-specific coregulators collaborates with master regulators and contributes to differential gene expression. We further identify and validate IRF2, IRF6, and MYB as effectors of an adult-stage expression program. Thus, the combinatorial assembly of lineage-specific master regulators and transcriptional coregulators within developmental stage-specific enhancers determines gene expression programs and temporal regulation of transcriptional networks in a mammalian genome.
► Comparative genomic profiling of human fetal and adult erythropoiesis
► Developmental stage-specific enhancers determine gene expression programs
► Master regulators cooperate within enhancers but confer little stage specificity
► Coregulators collaborate with master regulators in establishing gene networks
Xu et al. investigate human fetal and adult erythroid gene expression programs by comparative transcriptome, transcription factor, and chromatin profiling. They demonstrate that lineage-specifying master regulators and chromatin modifications at regulatory enhancers are relatively static platforms; fetal/adult-stage specificity of gene expression is controlled by a distinct layer of transcriptional coregulators.
The orchestrated binding of transcriptional activators and repressors to specific DNA sequences in the context of chromatin defines the regulatory program of eukaryotic genomes. We developed a digital approach to assay regulatory protein occupancy on genomic DNA in vivo by dense mapping of individual DNase I cleavages from intact nuclei using massively parallel DNA sequencing. Analysis of >23 million cleavages across the Saccharomyces cerevisiae genome revealed thousands of protected regulatory protein footprints, enabling de novo derivation of factor binding motifs and the identification of hundreds of new binding sites for major regulators. We observed striking correspondence between single-nucleotide resolution DNase I cleavage patterns and protein-DNA interactions determined by crystallography. The data also yielded a detailed view of larger chromatin features including positioned nucleosomes flanking factor binding regions. Digital genomic footprinting should be a powerful approach to delineate the cis-regulatory framework of any organism with an available genome sequence.
Inherited mutations in the BRCA2-interacting protein PALB2 are known to be associated with increased risks of developing breast cancer. To evaluate the contribution of PALB2 to familial breast cancer in the United States, we sequenced the coding sequences and flanking regulatory regions of the gene from constitutional genomic DNA of 1,144 familial breast cancer patients with wild-type sequences at BRCA1 and BRCA2. Overall, 3.4% (33/972) of patients not selected by ancestry and 0% (0/172) of patients specifically of Ashkenazi Jewish ancestry were heterozygous for a nonsense, frameshift, or frameshift-associated splice mutation in PALB2. Mutations were detected in both male and female breast cancer patients. All mutations were individually rare: the 33 heterozygotes harbored 13 different mutations, 5 previously reported and 8 novel mutations. PALB2 heterozygotes were 4-fold more likely to have a male relative with breast cancer (P = 0.0003), 6-fold more likely to have a relative with pancreatic cancer (P = 0.002), and 1.3-fold more likely to have a relative with ovarian cancer (P = 0.18). Compared with their female relatives without mutations, increased risk of developing breast cancer for female PALB2 heterozygotes was 2.3-fold (95% CI: 1.5-4.2) by age 55 and 3.4-fold (95% CI: 2.4-5.9) by age 85. Loss of the wild-type PALB2 allele was observed in laser-dissected tumor specimens from heterozygous patients. Given this mutation prevalence and risk, consideration might be given to clinical testing of PALB2 by complete genomic sequencing for familial breast cancer patients with wild-type sequences at BRCA1 and BRCA2.
The basic body plan and major physiological axes have been highly conserved during mammalian evolution, yet only a small fraction of the human genome sequence appears to be subject to evolutionary constraint. To quantify cis- versus trans-acting contributions to mammalian regulatory evolution, we performed genomic DNase I footprinting of the mouse genome across 25 cell and tissue types, collectively defining ∼8.6 million transcription factor (TF) occupancy sites at nucleotide resolution. Here we show that mouse TF footprints conjointly encode a regulatory lexicon that is ∼95% similar with that derived from human TF footprints. However, only ∼20% of mouse TF footprints have human orthologues. Despite substantial turnover of the cis-regulatory landscape, nearly half of all pairwise regulatory interactions connecting mouse TF genes have been maintained in orthologous human cell types through evolutionary innovation of TF recognition sequences. Furthermore, the higher-level organization of mouse TF-to-TF connections into cellular network architectures is nearly identical with human. Our results indicate that evolutionary selection on mammalian gene regulation is targeted chiefly at the level of trans-regulatory circuitry, enabling and potentiating cis-regulatory plasticity.
Genome-wide association studies (GWASs) have ascertained numerous trait-associated common genetic variants, frequently localized to regulatory DNA. We found that common genetic variation at BCL11A associated with fetal hemoglobin (HbF) level lies in noncoding sequences decorated by an erythroid enhancer chromatin signature. Fine-mapping uncovers a motif-disrupting common variant associated with reduced transcription factor (TF) binding, modestly diminished BCL11A expression, and elevated HbF. The surrounding sequences function in vivo as a developmental stage-specific, lineage-restricted enhancer. Genome engineering reveals the enhancer is required in erythroid but not B-lymphoid cells for BCL11A expression. These findings illustrate how GWASs may expose functional variants of modest impact within causal elements essential for appropriate gene expression. We propose the GWAS-marked BCL11A enhancer represents an attractive target for therapeutic genome engineering for the β-hemoglobinopathies.
The NIH Roadmap Epigenomics Mapping Consortium
Bernstein, Bradley E; Stamatoyannopoulos, John A; Costello, Joseph F ...
Nature Biotechnology, 10/2010, Volume 28, Issue 10
Journal Article, Peer reviewed, Open access
The NIH Roadmap Epigenomics Mapping Consortium aims to produce a public resource of epigenomic maps for stem cells and primary ex vivo tissues selected to represent the normal counterparts of tissues and organ systems frequently involved in human disease.