Combinatorial binding of transcription factors to regulatory DNA underpins gene regulation in all organisms. Genetic variation in regulatory regions has been connected with diseases and diverse ...phenotypic traits
, but it remains challenging to distinguish variants that affect regulatory function
. Genomic DNase I footprinting enables the quantitative, nucleotide-resolution delineation of sites of transcription factor occupancy within native chromatin
. However, only a small fraction of such sites have been precisely resolved on the human genome sequence
. Here, to enable comprehensive mapping of transcription factor footprints, we produced high-density DNase I cleavage maps from 243 human cell and tissue types and states and integrated these data to delineate about 4.5 million compact genomic elements that encode transcription factor occupancy at nucleotide resolution. We map the fine-scale structure within about 1.6 million DNase I-hypersensitive sites and show that the overwhelming majority are populated by well-spaced sites of single transcription factor-DNA interaction. Cell-context-dependent cis-regulation is chiefly executed by wholesale modulation of accessibility at regulatory DNA rather than by differential transcription factor occupancy within accessible elements. We also show that the enrichment of genetic variants associated with diseases or phenotypic traits in regulatory regions
is almost entirely attributable to variants within footprints, and that functional variants that affect transcription factor occupancy are nearly evenly partitioned between loss- and gain-of-function alleles. Unexpectedly, we find increased density of human genetic variation within transcription factor footprints, revealing an unappreciated driver of cis-regulatory evolution. Our results provide a framework for both global and nucleotide-precision analyses of gene regulatory mechanisms and functional genetic variation.
DNase I hypersensitive sites (DHSs) are generic markers of regulatory DNA
and contain genetic variations associated with diseases and phenotypic traits
. We created high-resolution maps of DHSs from ...733 human biosamples encompassing 438 cell and tissue types and states, and integrated these to delineate and numerically index approximately 3.6 million DHSs within the human genome sequence, providing a common coordinate system for regulatory DNA. Here we show that these maps highly resolve the cis-regulatory compartment of the human genome, which encodes unexpectedly diverse cell- and tissue-selective regulatory programs at very high density. These programs can be captured comprehensively by a simple vocabulary that enables the assignment to each DHS of a regulatory barcode that encapsulates its tissue manifestations, and global annotation of protein-coding and non-coding RNA genes in a manner orthogonal to gene expression. Finally, we show that sharply resolved DHSs markedly enhance the genetic association and heritability signals of diseases and traits. Rather than being confined to a small number of distal elements or promoters, we find that genetic signals converge on congruently regulated sets of DHSs that decorate entire gene bodies. Together, our results create a universal, extensible coordinate system and vocabulary for human regulatory DNA marked by DHSs, and provide a new global perspective on the architecture of human gene regulation.
Cancer is a disease potentiated by mutations in somatic cells. Cancer mutations are not distributed uniformly along the human genome. Instead, different human genomic regions vary by up to fivefold ...in the local density of cancer somatic mutations, posing a fundamental problem for statistical methods used in cancer genomics. Epigenomic organization has been proposed as a major determinant of the cancer mutational landscape. However, both somatic mutagenesis and epigenomic features are highly cell-type-specific. We investigated the distribution of mutations in multiple independent samples of diverse cancer types and compared them to cell-type-specific epigenomic features. Here we show that chromatin accessibility and modification, together with replication timing, explain up to 86% of the variance in mutation rates along cancer genomes. The best predictors of local somatic mutation density are epigenomic features derived from the most likely cell type of origin of the corresponding malignancy. Moreover, we find that cell-of-origin chromatin features are much stronger determinants of cancer mutation profiles than chromatin features of matched cancer cell lines. Furthermore, we show that the cell type of origin of a cancer can be accurately determined based on the distribution of mutations along its genome. Thus, the DNA sequence of a cancer genome encompasses a wealth of information about the identity and epigenomic features of its cell of origin.
The large and growing number of genome-wide datasets highlights the need for high-performance feature analysis and data comparison methods, in addition to efficient data storage and retrieval ...techniques. We introduce BEDOPS, a software suite for common genomic analysis tasks which offers improved flexibility, scalability and execution time characteristics over previously published packages. The suite includes a utility to compress large inputs into a lossless format that can provide greater space savings and faster data extractions than alternatives.
http://code.google.com/p/bedops/ includes binaries, source and documentation.
Genome-wide association studies have identified many noncoding variants associated with common diseases and traits. We show that these variants are concentrated in regulatory DNA marked by ...deoxyribonuclease I (DNase I) hypersensitive sites (DHSs). Eighty-eight percent of such DHSs are active during fetal development and are enriched in variants associated with gestational exposure—related phenotypes. We identified distant gene targets for hundreds of variant-containing DHSs that may explain phenotype associations. Disease-associated variants systematically perturb transcription factor recognition sequences, frequently alter allelic chromatin states, and form regulatory networks. We also demonstrated tissue-selective enrichment of more weakly disease-associated variants within DHSs and the de novo identification of pathogenic cell types for Crohn's disease, multiple sclerosis, and an electrocardiogram trait, without prior knowledge of physiological mechanisms. Our results suggest pervasive involvement of regulatory DNA variation in common human disease and provide pathogenic insights into diverse disorders.
Regulatory factor binding to genomic DNA protects the underlying sequence from cleavage by DNase I, leaving nucleotide-resolution footprints. Using genomic DNase I footprinting across 41 diverse cell ...and tissue types, we detected 45 million transcription factor occupancy events within regulatory regions, representing differential binding to 8.4 million distinct short sequence elements. Here we show that this small genomic sequence compartment, roughly twice the size of the exome, encodes an expansive repertoire of conserved recognition sequences for DNA-binding proteins that nearly doubles the size of the human cis-regulatory lexicon. We find that genetic variants affecting allelic chromatin states are concentrated in footprints, and that these elements are preferentially sheltered from DNA methylation. High-resolution DNase I cleavage patterns mirror nucleotide-level evolutionary conservation and track the crystallographic topography of protein-DNA interfaces, indicating that transcription factor structure has been evolutionarily imprinted on the human genome sequence. We identify a stereotyped 50-base-pair footprint that precisely defines the site of transcript origination within thousands of human promoters. Finally, we describe a large collection of novel regulatory factor recognition motifs that are highly conserved in both sequence and function, and exhibit cell-selective occupancy patterns that closely parallel major regulators of development, differentiation and pluripotency.
Ecosystem restoration may require returning threatened populations of ecologically pivotal species to near their former abundances, but it is often difficult to estimate historic population size of ...species that have been heavily exploited. Eastern Pacific gray whales play a key ecological role in their Arctic feeding grounds and are widely thought to have returned to their prewhaling abundance. Recent mortality spikes might signal that the population has reached long-term carrying capacity, but an alternative is that this decline was due to shifting climatic conditions on Arctic feeding grounds. We used a genetic approach to estimate prewhaling abundance of gray whales and report DNA variability at 10 loci that is typical of a population of almost equal to76,000-118,000 individuals, approximately three to five times more numerous than today's average census size of 22,000. Coalescent simulations indicate these estimates may include the entire Pacific metapopulation, suggesting that our average measurement of almost equal to96,000 individuals was probably distributed between the eastern and currently endangered western Pacific populations. These levels of genetic variation suggest the eastern population is at most at 28-56% of its historical abundance and should be considered depleted. If used to inform management, this would halve acceptable human-caused mortality for this population from 417 to 208 per year. Potentially profound ecosystem impacts may have resulted from a decline from 96,000 gray whales to the current population. At previous levels, gray whales may have seasonally resuspended 700 million cubic meters of sediment, as much as 12 Yukon Rivers, and provided food to a million sea birds.
Transcriptional dysregulation drives cancer formation but the underlying mechanisms are still poorly understood. Renal cell carcinoma (RCC) is the most common malignant kidney tumor which canonically ...activates the hypoxia-inducible transcription factor (HIF) pathway. Despite intensive study, novel therapeutic strategies to target RCC have been difficult to develop. Since the RCC epigenome is relatively understudied, we sought to elucidate key mechanisms underpinning the tumor phenotype and its clinical behavior.
We performed genome-wide chromatin accessibility (DNase-seq) and transcriptome profiling (RNA-seq) on paired tumor/normal samples from 3 patients undergoing nephrectomy for removal of RCC. We incorporated publicly available data on HIF binding (ChIP-seq) in a RCC cell line. We performed integrated analyses of these high-resolution, genome-scale datasets together with larger transcriptomic data available through The Cancer Genome Atlas (TCGA).
Though HIF transcription factors play a cardinal role in RCC oncogenesis, we found that numerous transcription factors with a RCC-selective expression pattern also demonstrated evidence of HIF binding near their gene body. Examination of chromatin accessibility profiles revealed that some of these transcription factors influenced the tumor's regulatory landscape, notably the stem cell transcription factor POU5F1 (OCT4). Elevated POU5F1 transcript levels were correlated with advanced tumor stage and poorer overall survival in RCC patients. Unexpectedly, we discovered a HIF-pathway-responsive promoter embedded within a endogenous retroviral long terminal repeat (LTR) element at the transcriptional start site of the PSOR1C3 long non-coding RNA gene upstream of POU5F1. RNA transcripts are induced from this promoter and read through PSOR1C3 into POU5F1 producing a novel POU5F1 transcript isoform. Rather than being unique to the POU5F1 locus, we found that HIF binds to several other transcriptionally active LTR elements genome-wide correlating with broad gene expression changes in RCC.
Integrated transcriptomic and epigenomic analysis of matched tumor and normal tissues from even a small number of primary patient samples revealed remarkably convergent shared regulatory landscapes. Several transcription factors appear to act downstream of HIF including the potent stem cell transcription factor POU5F1. Dysregulated expression of POU5F1 is part of a larger pattern of gene expression changes in RCC that may be induced by HIF-dependent reactivation of dormant promoters embedded within endogenous retroviral LTRs.
Hematopoietic protein-1 (Hem-1) is a hematopoietic cell specific member of the WAVE (Wiskott-Aldrich syndrome verprolin-homologous protein) complex, which regulates filamentous actin (F-actin) ...polymerization in many cell types including immune cells. However, the roles of Hem-1 and the WAVE complex in erythrocyte biology are not known. In this study, we utilized mice lacking Hem-1 expression due to a non-coding point mutation in the Hem1 gene to show that absence of Hem-1 results in microcytic, hypochromic anemia characterized by abnormally shaped erythrocytes with aberrant F-actin foci and decreased lifespan. We find that Hem-1 and members of the associated WAVE complex are normally expressed in wildtype erythrocyte progenitors and mature erythrocytes. Using mass spectrometry and global proteomics, Coomassie staining, and immunoblotting, we find that the absence of Hem-1 results in decreased representation of essential erythrocyte membrane skeletal proteins including α- and β- spectrin, dematin, p55, adducin, ankyrin, tropomodulin 1, band 3, and band 4.1. Hem1⁻/⁻ erythrocytes exhibit increased protein kinase C-dependent phosphorylation of adducin at Ser724, which targets adducin family members for dissociation from spectrin and actin, and subsequent proteolysis. Increased adducin Ser724 phosphorylation in Hem1⁻/⁻ erythrocytes correlates with decreased protein expression of the regulatory subunit of protein phosphatase 2A (PP2A), which is required for PP2A-dependent dephosphorylation of PKC targets. These results reveal a novel, critical role for Hem-1 in the homeostasis of structural proteins required for formation and stability of the actin membrane skeleton in erythrocytes.
Celotno besedilo
Dostopno za:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
DNase I hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers ...and locus control regions. Here we present the first extensive map of human DHSs identified through genome-wide profiling in 125 diverse cell and tissue types. We identify ∼2.9 million DHSs that encompass virtually all known experimentally validated cis-regulatory sequences and expose a vast trove of novel elements, most with highly cell-selective regulation. Annotating these elements using ENCODE data reveals novel relationships between chromatin accessibility, transcription, DNA methylation and regulatory factor occupancy patterns. We connect ∼580,000 distal DHSs with their target promoters, revealing systematic pairing of different classes of distal DHSs and specific promoter types. Patterning of chromatin accessibility at many regulatory regions is organized with dozens to hundreds of co-activated elements, and the transcellular DNase I sensitivity pattern at a given region can predict cell-type-specific functional behaviours. The DHS landscape shows signatures of recent functional evolutionary constraint. However, the DHS compartment in pluripotent and immortalized cells exhibits higher mutation rates than that in highly differentiated cells, exposing an unexpected link between chromatin accessibility, proliferative potential and patterns of human variation.