An essential component of genome function is the syntax of genomic regulatory elements that determine how diverse transcription factors interact to orchestrate a program of regulatory control. A precise characterization of in vivo spacing constraints between key transcription factors would reveal key aspects of this genomic regulatory language. To discover novel transcription factor spatial binding constraints in vivo, we developed a new integrative computational method, genome wide event finding and motif discovery (GEM). GEM resolves ChIP data into explanatory motifs and binding events at high spatial resolution by linking binding event discovery and motif discovery with positional priors in the context of a generative probabilistic model of ChIP data and genome sequence. GEM analysis of 63 transcription factors in 214 ENCODE human ChIP-Seq experiments recovers more known factor motifs than other contemporary methods, and discovers six new motifs for factors with unknown binding specificity. GEM's adaptive learning of binding-event read distributions allows it to further improve upon previous methods for processing ChIP-Seq and ChIP-exo data to yield unsurpassed spatial resolution and discovery of closely spaced binding events of the same factor. In a systematic analysis of in vivo sequence-specific transcription factor binding using GEM, we have found hundreds of spatial binding constraints between factors. GEM found 37 examples of factor binding constraints in mouse ES cells, including strong distance-specific constraints between Klf4 and other key regulatory factors. In human ENCODE data, GEM found 390 examples of spatially constrained pair-wise binding, including such novel pairs as c-Fos:c-Jun/USF1, CTCF/Egr1, and HNF4A/FOXA1.
The discovery of new factor-factor spatial constraints in ChIP data is significant because it proposes testable models for regulatory factor interactions that will help elucidate genome function and the implementation of combinatorial control.
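To make the idea of a distance-specific binding constraint concrete, the following is a minimal sketch, not GEM's actual procedure. It assumes binding-event positions for two factors on a single chromosome are already available as integer coordinates, ignores strand, and simply tabulates signed offsets between nearby events. The function name `spacing_histogram`, the `window` parameter, and the toy coordinates are all hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def spacing_histogram(anchor_sites, partner_sites, window=100):
    """Count signed offsets (partner - anchor) for event pairs within +/- window bp.

    Hypothetical helper for illustration; inputs are positions on one chromosome.
    """
    partners = sorted(partner_sites)
    counts = Counter()
    for a in anchor_sites:
        # Binary search for the partner events that fall inside the window.
        lo = bisect_left(partners, a - window)
        hi = bisect_right(partners, a + window)
        for p in partners[lo:hi]:
            counts[p - a] += 1
    return counts


# Toy, made-up coordinates (not taken from any experiment).
anchor = [1000, 5000, 9000, 20000]
partner = [1012, 5012, 9012, 20050, 30000]

hist = spacing_histogram(anchor, partner, window=100)
top_offset, top_count = hist.most_common(1)[0]
print(top_offset, top_count)
```

An offset that recurs far more often than its neighbors, as the +12 bp offset does in this toy input, is the kind of signal a spacing constraint would produce; a real analysis would also need a background model to decide whether the enrichment is significant.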
Nuclear DNA wraps around core histones to form nucleosomes, which restricts the binding of transcription factors to gene regulatory sequences. Pioneer transcription factors can bind DNA sites on nucleosomes and initiate gene regulatory events, often leading to the local opening of chromatin. However, the nucleosomal configuration of open chromatin and the basis for its regulation are unclear. We combined low and high levels of micrococcal nuclease (MNase) digestion along with core histone mapping to assess the nucleosomal configuration at enhancers and promoters in mouse liver. We find that MNase-accessible nucleosomes, bound by transcription factors, are retained more at liver-specific enhancers than at promoters and ubiquitous enhancers. The pioneer factor FoxA displaces linker histone H1, thereby keeping enhancer nucleosomes accessible in chromatin and allowing other liver-specific transcription factors to bind and stimulate transcription. Thus, nucleosomes are not exclusively repressive to gene regulation when they are retained with, and exposed by, pioneer factors.
• Liver-specific enhancers retain accessible nucleosomes more than ubiquitous enhancers
• FoxA binding displaces linker histone and keeps nucleosomes accessible
• FoxA2 is enriched near the dyad axis of accessible nucleosomes with other liver TFs
• FoxA-bound nucleosomes at enhancers stimulate liver gene activation
Using low- and high-MNase sequencing with core histone mapping, Iwafuchi-Doi et al. reveal that tissue-specific enhancers retain accessible nucleosomes more than promoters and ubiquitous enhancers in mammalian chromatin. The pioneer factor FoxA displaces linker histone, thereby keeping nucleosomes accessible and allowing other liver-specific transcription factors to bind and stimulate gene activation.
The genome-wide architecture of chromatin-associated proteins that maintains chromosome integrity and gene regulation is not well defined. Here we use chromatin immunoprecipitation, exonuclease digestion and DNA sequencing (ChIP-exo/seq) to define this architecture in Saccharomyces cerevisiae. We identify 21 meta-assemblages consisting of roughly 400 different proteins that are related to DNA replication, centromeres, subtelomeres, transposons and transcription by RNA polymerase (Pol) I, II and III. Replication proteins engulf a nucleosome, centromeres lack a nucleosome, and repressive proteins encompass three nucleosomes at subtelomeric X-elements. We find that most promoters associated with Pol II evolved to lack a regulatory region, having only a core promoter. These constitutive promoters comprise a short nucleosome-free region (NFR) adjacent to a +1 nucleosome, which together bind the transcription-initiation factor TFIID to form a preinitiation complex. Positioned insulators protect core promoters from upstream events. A small fraction of promoters evolved an architecture for inducibility, whereby sequence-specific transcription factors (ssTFs) create a nucleosome-depleted region (NDR) that is distinct from an NFR. We describe structural interactions among ssTFs, their cognate cofactors and the genome. These interactions include the nucleosomal and transcriptional regulators RPD3-L, SAGA, NuA4, Tup1, Mediator and SWI-SNF. Surprisingly, we do not detect interactions between ssTFs and TFIID, suggesting that such interactions do not stably occur. Our model for gene induction involves ssTFs, cofactors and general factors such as TBP and TFIIB, but not TFIID. By contrast, constitutive transcription involves TFIID but not ssTFs engaged with their cofactors. From this, we define a highly integrated network of gene regulation by ssTFs.
Genome segmentation approaches allow us to characterize regulatory states in a given cell type using combinatorial patterns of histone modifications and other regulatory signals. In order to analyze regulatory state differences across cell types, current genome segmentation approaches typically require that the same regulatory genomics assays have been performed in all analyzed cell types. This necessarily limits both the numbers of cell types that can be analyzed and the complexity of the resulting regulatory states, as only a small number of histone modifications have been profiled across many cell types. Data imputation approaches that aim to estimate missing regulatory signals have been applied before genome segmentation. However, this approach is computationally costly and propagates any errors in imputation to produce incorrect genome segmentation results downstream. We present an extension to the IDEAS genome segmentation platform which can perform genome segmentation on incomplete regulatory genomics dataset collections without using imputation. Instead of relying on imputed data, we use an expectation-maximization approach to estimate marginal density functions within each regulatory state. We demonstrate that our genome segmentation results compare favorably with approaches based on imputation or other strategies for handling missing data. We further show that our approach can accurately impute missing data after genome segmentation, reversing the typical order of imputation/genome segmentation pipelines. Finally, we present a new 2D genome segmentation analysis of 127 human cell types studied by the Roadmap Epigenomics Consortium. By using an expanded set of chromatin marks that have been profiled in subsets of these cell types, our new segmentation results capture a more complex picture of combinatorial regulatory patterns that appear on the human genome.
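One way to see how segmentation can proceed without imputation is to compute state posteriors from only the marks that were actually profiled. The sketch below is an illustration under simplifying assumptions, not the IDEAS model: it assumes each state has independent Gaussian densities per mark, so an unprofiled mark integrates out of the likelihood and can simply be skipped. The function names, the mark names used as dictionary keys, and all parameter values are made up for the example.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def state_posteriors(obs, states, priors):
    """Posterior over states for one genomic bin (an E-step-like computation).

    obs:    dict mapping mark name -> signal, or None if the mark was not profiled
    states: list of dicts mapping mark name -> (mean, variance); hypothetical parameters
    priors: list of prior state probabilities
    """
    log_post = []
    for state, prior in zip(states, priors):
        ll = math.log(prior)
        for mark, value in obs.items():
            if value is None:
                # Missing mark: under independent marginals it contributes a factor of 1.
                continue
            mean, var = state[mark]
            ll += log_gauss(value, mean, var)
        log_post.append(ll)
    # Normalize in log space for numerical stability.
    m = max(log_post)
    weights = [math.exp(v - m) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Two toy states with made-up parameters: "active-like" and "repressed-like".
states = [
    {"H3K4me3": (5.0, 1.0), "H3K27me3": (0.0, 1.0)},
    {"H3K4me3": (0.0, 1.0), "H3K27me3": (5.0, 1.0)},
]
priors = [0.5, 0.5]

# H3K27me3 was not profiled in this hypothetical cell type.
post = state_posteriors({"H3K4me3": 5.0, "H3K27me3": None}, states, priors)
print(post)
```

With every mark missing, the posterior falls back to the prior, which is the behavior one would want from a model that marginalizes rather than imputes.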
Transcription factor (TF) binding specificity is determined via a complex interplay between the transcription factor's DNA binding preference and cell type-specific chromatin environments. The chromatin features that correlate with transcription factor binding in a given cell type have been well characterized. For instance, the binding sites for a majority of transcription factors display concurrent chromatin accessibility. However, concurrent chromatin features reflect the binding activities of the transcription factor itself and thus provide limited insight into how genome-wide TF-DNA binding patterns became established in the first place. To understand the determinants of transcription factor binding specificity, we therefore need to examine how newly activated transcription factors interact with sequence and preexisting chromatin landscapes.
Here, we investigate the sequence and preexisting chromatin predictors of TF-DNA binding by examining the genome-wide occupancy of transcription factors that have been induced in well-characterized chromatin environments. We develop Bichrom, a bimodal neural network that jointly models sequence and preexisting chromatin data to interpret the genome-wide binding patterns of induced transcription factors. We find that the preexisting chromatin landscape is a differential global predictor of TF-DNA binding; incorporating preexisting chromatin features improves our ability to explain the binding specificity of some transcription factors substantially, but not others. Furthermore, by analyzing site-level predictors, we show that transcription factor binding in previously inaccessible chromatin tends to correspond to the presence of more favorable cognate DNA sequences.
Bichrom thus provides a framework for modeling, interpreting, and visualizing the joint sequence and chromatin landscapes that determine TF-DNA binding dynamics.
Efficient transcriptional programming promises to open new frontiers in regenerative medicine. However, mechanisms by which programming factors transform cell fate are unknown, preventing more rational selection of factors to generate desirable cell types. Three transcription factors, Ngn2, Isl1 and Lhx3, were sufficient to rapidly and efficiently program spinal motor neuron identity when expressed in differentiating mouse embryonic stem cells. Replacement of Lhx3 by Phox2a led to specification of cranial, rather than spinal, motor neurons. Chromatin immunoprecipitation-sequencing analysis of Isl1, Lhx3 and Phox2a binding sites revealed that the two cell fates were programmed by the recruitment of Isl1-Lhx3 and Isl1-Phox2a complexes to distinct genomic locations characterized by a unique grammar of homeodomain binding motifs. Our findings suggest that synergistic interactions among transcription factors determine the specificity of their recruitment to cell type-specific binding sites and illustrate how a single transcription factor can be repurposed to program different cell types.
Differentiation from asexual blood stages to mature sexual gametocytes is required for the transmission of malaria parasites. Here, we report that the ApiAP2 transcription factor PfAP2‐G2 (PF3D7_1408200) plays a critical role in the maturation of Plasmodium falciparum gametocytes. PfAP2‐G2 binds to the promoters of a wide array of genes that are expressed at many stages of the parasite life cycle. Interestingly, we also find binding of PfAP2‐G2 within the gene body of almost 3,000 genes, which strongly correlates with the location of H3K36me3 and several other histone modifications as well as Heterochromatin Protein 1 (HP1), suggesting that occupancy of PfAP2‐G2 in gene bodies may serve as an alternative regulatory mechanism. Disruption of pfap2‐g2 does not impact asexual development, but the majority of sexual parasites are unable to mature beyond stage III gametocytes. The absence of pfap2‐g2 leads to overexpression of 28% of the genes bound by PfAP2‐G2, and none of the PfAP2‐G2 bound genes are downregulated, suggesting that it is a repressor. We also find that PfAP2‐G2 interacts with chromatin remodeling proteins, a microrchidia (MORC) protein, and another ApiAP2 protein (PF3D7_1139300). Overall, our data demonstrate that PfAP2‐G2 establishes an essential gametocyte maturation program in association with other chromatin‐related proteins.
Development of sexual stage malaria parasites is critical for transmission between humans via the mosquito host. In Plasmodium falciparum, regulation of this 10–12 day maturation process into gametocytes is poorly understood. We report that the PfAP2‐G2 transcriptional regulator is critical for sexual development beyond Stage III. The activity of PfAP2‐G2 is established early in asexual parasites through the regulation of hundreds of genes and widespread genome‐wide interactions in gene bodies involving complex formation with chromatin‐associated factors.
Genomic loci with regulatory potential can be annotated with various properties. For example, genomic sites bound by a given transcription factor (TF) can be divided according to whether they are proximal or distal to known promoters. Sites can be further labeled according to the cell types and conditions in which they are active. Given such a collection of labeled sites, it is natural to ask what sequence features are associated with each annotation label. However, discovering such label-specific sequence features is often confounded by overlaps between the labels; e.g. if regulatory sites specific to a given cell type are also more likely to be promoter-proximal, it is difficult to assess whether motifs identified in that set of sites are associated with the cell type or associated with promoters. In order to meet this challenge, we developed SeqUnwinder, a principled approach to deconvolving interpretable discriminative sequence features associated with overlapping annotation labels. We demonstrate the novel analysis abilities of SeqUnwinder using three examples. Firstly, SeqUnwinder is able to unravel sequence features associated with the dynamic binding behavior of TFs during motor neuron programming from features associated with chromatin state in the initial embryonic stem cells. Secondly, we characterize distinct sequence properties of multi-condition and cell-specific TF binding sites after controlling for uneven associations with promoter proximity. Finally, we demonstrate the scalability of SeqUnwinder to discover cell-specific sequence features from over one hundred thousand genomic loci that display DNase I hypersensitivity in one or more ENCODE cell lines.
Direct cell programming via overexpression of transcription factors (TFs) aims to control cell fate with the degree of precision needed for clinical applications. However, the regulatory steps involved in successful terminal cell fate programming remain obscure. We have investigated the underlying mechanisms by looking at gene expression, chromatin states, and TF binding during the uniquely efficient Ngn2, Isl1, and Lhx3 motor neuron programming pathway. Our analysis reveals a highly dynamic process in which Ngn2 and the Isl1/Lhx3 pair initially engage distinct regulatory regions. Subsequently, Isl1/Lhx3 binding shifts from one set of targets to another, controlling regulatory region activity and gene expression as cell differentiation progresses. Binding of Isl1/Lhx3 to later motor neuron enhancers depends on the Ebf and Onecut TFs, which are induced by Ngn2 during the programming process. Thus, motor neuron programming is the product of two initially independent transcriptional modules that converge with a feedforward transcriptional logic.
• ESC expression of Ngn2/Isl1/Lhx3 induces rapid transcriptional and chromatin changes
• At early stages, Isl1/Lhx3 (homeodomain) and Ngn2 (bHLH) target distinct genomic sites
• As programming progresses, Isl1/Lhx3 binding shows dynamic relocalization
• Ngn2-induced factors guide Isl1/Lhx3 redistribution to initially inaccessible sites
Mazzoni and colleagues show that transcription factor-directed programming of ESCs to motor neurons involves two distinct regulatory modules that converge when programming TFs are relocated by the activity of factors induced in the earlier stage of the process.
Comparisons of Hi-C data sets between cell types and conditions have revealed differences in topologically associated domains (TADs) and A/B compartmentalization, which are correlated with differences in gene regulation. However, previous comparisons have focused on known forms of 3D organization while potentially neglecting other functionally relevant differences. We aimed to create a method to quantify all locus-specific differences between two Hi-C data sets.
We developed MultiMDS to jointly infer and align 3D chromosomal structures from two Hi-C data sets, thereby enabling a new way to comprehensively quantify relocalization of genomic loci between cell types. We demonstrate this approach by comparing Hi-C data across a variety of cell types. We consistently find relocalization of loci with minimal difference in A/B compartment score. For example, we identify compartment-independent relocalizations between GM12878 and K562 cells that involve loci displaying enhancer-associated histone marks in one cell type and polycomb-associated histone marks in the other.
MultiMDS is the first tool to identify all loci that relocalize between two Hi-C data sets. Our method can identify 3D localization differences that are correlated with cell-type-specific regulatory activities and which cannot be identified using other methods.
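As a rough illustration of what quantifying relocalization can mean once two structures share a coordinate frame, the sketch below computes a per-locus Euclidean displacement. It is not the MultiMDS algorithm: it assumes the joint inference and alignment have already been done and only covers the final comparison step. The function name `relocalization` and the toy coordinates are hypothetical.

```python
import math


def relocalization(struct_a, struct_b):
    """Per-locus Euclidean distance between two already-aligned 3D structures.

    Each structure is a list of (x, y, z) tuples, one per genomic bin, in the same order.
    """
    if len(struct_a) != len(struct_b):
        raise ValueError("structures must cover the same loci")
    return [math.dist(a, b) for a, b in zip(struct_a, struct_b)]


# Toy, made-up coordinates for three loci in two cell types.
cell_type_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
cell_type_2 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 4.0)]

shifts = relocalization(cell_type_1, cell_type_2)
most_moved = max(range(len(shifts)), key=shifts.__getitem__)
print(shifts, most_moved)
```

Because the displacement is computed per locus rather than per domain or compartment, a score of this kind can flag loci that move even when their A/B compartment score barely changes.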