Chromatin interaction studies can reveal how the genome is organized into spatially confined sub-compartments in the nucleus. However, accurately identifying sub-compartments from chromatin interaction data remains a challenge in computational biology. Here, we present Sub-Compartment Identifier (SCI), an algorithm that uses graph embedding followed by unsupervised learning to predict sub-compartments using Hi-C chromatin interaction data. We find that the network topological centrality and clustering performance of SCI sub-compartment predictions are superior to those of hidden Markov model (HMM) sub-compartment predictions. Moreover, using orthogonal Chromatin Interaction Analysis by in-situ Paired-End Tag Sequencing (ChIA-PET) data, we confirmed that SCI sub-compartment prediction outperforms HMM. We show that SCI-predicted sub-compartments have distinct epigenetic marks, transcriptional activities, and transcription factor enrichment. Moreover, we present a deep neural network to predict sub-compartments using epigenome, replication timing, and sequence data. Our neural network yields more accurate sub-compartment predictions when SCI-determined sub-compartments are used as labels for training.
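The two-stage idea described above (embed Hi-C bins, then group them without labels) can be illustrated with a minimal sketch. It assumes the embedding vectors have already been computed; the abstract does not say which embedding or clustering method SCI uses, so plain k-means stands in here, and the `kmeans` function and the toy vectors are hypothetical.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means on a list of equal-length float tuples.

    Assumption: the first k points serve as initial centroids, which keeps
    the sketch deterministic; real pipelines would use a better seeding.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Hypothetical 2-D embeddings for six genomic bins (two obvious groups).
embeddings = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
labels, centroids = kmeans(embeddings, k=2)
```

In this toy case the bins alternate between the two groups, so the returned labels alternate as well; each label would then be read as a candidate sub-compartment assignment for the corresponding bin.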
Accurately predicting chromatin loops from genome-wide interaction matrices such as Hi-C data is critical to deepening our understanding of proper gene regulation. Current approaches are mainly focused on searching for statistically enriched dots on a genome-wide map. However, given the availability of orthogonal data types such as ChIA-PET, HiChIP, Capture Hi-C, and high-throughput imaging, a supervised learning approach could facilitate the discovery of a comprehensive set of chromatin interactions. Here, we present Peakachu, a Random Forest classification framework that predicts chromatin loops from genome-wide contact maps. We compare Peakachu with current enrichment-based approaches, and find that Peakachu identifies a unique set of short-range interactions. We show that our models perform well in different platforms, across different sequencing depths, and across different species. We apply this framework to predict chromatin loops in 56 Hi-C datasets, and release the results at the 3D Genome Browser.
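To make the contrast between the two approaches concrete, the sketch below computes two things for a single candidate pixel of a contact matrix: a simple local enrichment score (the kind of signal enrichment-based callers look for) and a flattened window of surrounding counts (the kind of feature vector a supervised classifier such as a Random Forest could be trained on). The function names, the window size, and the toy matrix are assumptions for illustration; the abstract does not describe Peakachu's actual features.

```python
def window(matrix, i, j, w):
    """Return the (2w+1) x (2w+1) block of counts centred on pixel (i, j).

    Assumption: (i, j) lies at least w pixels away from every matrix edge.
    """
    return [row[j - w : j + w + 1] for row in matrix[i - w : i + w + 1]]

def enrichment(matrix, i, j, w=1):
    """Ratio of the centre pixel to the mean of its neighbours in the window."""
    flat = [v for row in window(matrix, i, j, w) for v in row]
    centre = matrix[i][j]
    mean_background = (sum(flat) - centre) / (len(flat) - 1)
    return centre / mean_background if mean_background > 0 else float("inf")

def features(matrix, i, j, w=1):
    """Flatten the window into a feature vector for a supervised classifier."""
    return [v for row in window(matrix, i, j, w) for v in row]

# Toy 5 x 5 contact matrix with a "dot" at (2, 2).
contacts = [
    [1, 1, 1, 1, 1],
    [1, 2, 2, 2, 1],
    [1, 2, 16, 2, 1],
    [1, 2, 2, 2, 1],
    [1, 1, 1, 1, 1],
]
score = enrichment(contacts, 2, 2)  # centre count versus neighbour mean
vector = features(contacts, 2, 2)   # nine counts, row by row
```

An enrichment caller would threshold `score`, whereas a supervised approach would pair many such `vector` rows with labels taken from orthogonal data (for example ChIA-PET interactions) and fit a classifier on them.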
Spatial genome organization and its effect on transcription remain a fundamental question. We applied an advanced chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) strategy to comprehensively map higher-order chromosome folding and specific chromatin interactions mediated by CCCTC-binding factor (CTCF) and RNA polymerase II (RNAPII) with haplotype specificity and nucleotide resolution in different human cell lineages. We find that CTCF/cohesin-mediated interaction anchors serve as structural foci for spatial organization of constitutive genes concordant with CTCF-motif orientation, whereas RNAPII interacts within these structures by selectively drawing cell-type-specific genes toward CTCF foci for coordinated transcription. Furthermore, we show that haplotype variants and allelic interactions have differential effects on chromosome configuration, influencing gene expression, and may provide mechanistic insights into functions associated with disease susceptibility. 3D genome simulation suggests a model of chromatin folding around chromosomal axes, where CTCF is involved in defining the interface between condensed and open compartments for structural regulation. Our 3D genome strategy thus provides unique insights into the topological mechanism of human variations and diseases.
•ChIA-PET is inclusive in mapping 3D genome at multi-scale and nucleotide resolution
•CTCF foci spatially arrange RNAPII transcription concordant in CTCF-motif direction
•SNPs alter haplotype chromatin topology and function that link to disease risks
•3D genome models elucidate topological framework for transcriptional regulation
Advanced ChIA-PET shows that CTCF/cohesin and RNA polymerase II arrange spatial organization for coordinated transcription. Haplotype variants exhibit allelic effects on chromatin topology and transcription that are linked to disease susceptibility.
Mammalian genomes are viewed as functional organizations that orchestrate spatial and temporal gene regulation. CTCF, the most characterized insulator-binding protein, has been implicated as a key genome organizer. However, little is known about CTCF-associated higher-order chromatin structures at a global scale. Here we applied chromatin interaction analysis by paired-end tag (ChIA-PET) sequencing to elucidate the CTCF-chromatin interactome in pluripotent cells. From this analysis, we identified 1,480 cis- and 336 trans-interacting loci with high reproducibility and precision. Associating these chromatin interaction loci with their underlying epigenetic states, promoter activities, enhancer binding and nuclear lamina occupancy, we uncovered five distinct chromatin domains that suggest potential new models of CTCF function in chromatin organization and transcriptional control. Specifically, CTCF interactions demarcate chromatin-nuclear membrane attachments and influence proper gene expression through extensive cross-talk between promoters and regulatory elements. This highly complex nuclear organization offers insights toward the unifying principles that govern genome plasticity and function.
Chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) is a robust method for capturing genome-wide chromatin interactions. Unlike other 3C-based methods, it includes a chromatin immunoprecipitation (ChIP) step that enriches for interactions mediated by specific target proteins. This unique feature allows ChIA-PET to provide the functional specificity and higher resolution needed to detect chromatin interactions, which chromosome conformation capture (3C)/Hi-C approaches have not achieved. The original ChIA-PET protocol generates short paired-end tags (2 × 20 base pairs (bp)) to detect two genomic loci that are far apart on linear chromosomes but are in spatial proximity in the folded genome. We have improved the original approach by developing long-read ChIA-PET, in which the length of the paired-end tags is increased (up to 2 × 250 bp). The longer PET reads not only improve the tag-mapping efficiency but also increase the probability of covering phased single-nucleotide polymorphisms (SNPs), which allows haplotype-specific chromatin interactions to be identified. Here, we provide the detailed protocol for long-read ChIA-PET that includes cell fixation and lysis, chromatin fragmentation by sonication, ChIP, proximity ligation with a bridge linker, Tn5 tagmentation, PCR amplification and high-throughput sequencing. For a well-trained molecular biologist, it typically takes 6 d from cell harvesting to the completion of library construction, up to a further 36 h for DNA sequencing and <20 h for processing of raw sequencing reads.
The number of reported examples of chromatin architecture alterations involved in the regulation of gene transcription and in disease is increasing. However, no genome-wide testing has been performed to assess the abundance of these events and their importance relative to other factors affecting genome regulation. This is particularly interesting given that a vast majority of genetic variations identified in association studies are located outside coding sequences. This study attempts to address this lack by analyzing the impact on chromatin spatial organization of genetic variants identified in individuals from 26 human populations and in genome-wide association studies.
We assess the tendency of structural variants to accumulate in spatially interacting genomic segments and design an algorithm to model chromatin conformational changes caused by structural variations. We show that differential gene transcription is closely linked to the variation in chromatin interaction networks mediated by RNA polymerase II. We also demonstrate that CTCF-mediated interactions are well conserved across populations, but enriched with disease-associated SNPs. Moreover, we find that boundaries of topological domains are relatively frequent targets of duplications, which suggests that these duplications can be an important evolutionary mechanism of genome spatial organization.
This study assesses the critical impact of genetic variants on the higher-order organization of chromatin folding and provides insight into the mechanisms regulating gene transcription at the population scale, among which the local arrangement of chromatin loops seems to be the most significant. It offers the first view of the variability of the human 3D genome at the population scale.
The tandem duplicator phenotype (TDP) is a genome-wide instability configuration primarily observed in breast, ovarian, and endometrial carcinomas. Here, we stratify TDP tumors by classifying their tandem duplications (TDs) into three span intervals, with modal values of 11 kb, 231 kb, and 1.7 Mb, respectively. TDPs with ∼11 kb TDs feature loss of TP53 and BRCA1. TDPs with ∼231 kb and ∼1.7 Mb TDs associate with CCNE1 pathway activation and CDK12 disruptions, respectively. We demonstrate that p53 and BRCA1 conjoint abrogation drives TDP induction by generating short-span TDP mammary tumors in genetically modified mice lacking them. Lastly, we show how TDs in TDP tumors disrupt heterogeneous combinations of tumor suppressors and chromatin topologically associating domains while duplicating oncogenes and super-enhancers.
•Abundant and distributed tandem duplications form a distinct chromotype in cancer
•Six recurrent tandem duplicator phenotypes (TDPs) are characterized by TD span size
•Conjoint abrogation of BRCA1 and TP53 causes TDPs with ∼11 kb TDs
•CCNE1 pathway activation and CDK12 mutations associate with ∼231 kb and ∼1.7 Mb TDs
Menghi et al. stratify tandem duplicator phenotype tumors by classifying their tandem duplications (TDs) into three span sizes associated with different pathway alterations and show how TDs disrupt tumor suppressors and chromatin topologically associating domains while duplicating oncogenes and super-enhancers.
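The span-based stratification can be sketched as follows. The three modal values (11 kb, 231 kb, 1.7 Mb) come from the abstract; everything else is an assumption for illustration. In particular, the interval boundaries are not given, so this sketch simply assigns each tandem duplication to the nearest modal value on a log10 scale, and the function names and example spans are hypothetical.

```python
import math

# Modal span values reported in the abstract, in base pairs.
MODES = {"~11 kb": 11_000, "~231 kb": 231_000, "~1.7 Mb": 1_700_000}

def span_class(span_bp):
    """Assign a TD span to the nearest modal value on a log10 scale.

    Assumption: nearest-mode assignment is a stand-in; the abstract does
    not give the interval boundaries actually used.
    """
    return min(
        MODES,
        key=lambda name: abs(math.log10(span_bp) - math.log10(MODES[name])),
    )

def classify_tumor(spans_bp):
    """Count how many of a tumor's TDs fall nearest each modal span."""
    counts = {name: 0 for name in MODES}
    for span in spans_bp:
        counts[span_class(span)] += 1
    return counts

# Hypothetical TD spans (bp) for one tumor.
counts = classify_tumor([9_500, 12_000, 250_000, 2_000_000, 14_000])
```

A tumor dominated by one class (here the shortest one) would be a candidate for the corresponding TDP group; the log scale is chosen because the three modes are each separated by roughly an order of magnitude.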
In multicellular organisms, transcription regulation is one of the central mechanisms modelling lineage differentiation and cell-fate determination. Transcription requires dynamic chromatin configurations between promoters and their corresponding distal regulatory elements. It is believed that their communication occurs within large discrete foci of aggregated RNA polymerases termed transcription factories in three-dimensional nuclear space. However, the dynamic nature of chromatin connectivity has not been characterized at the genome-wide level. Here, through a chromatin interaction analysis with paired-end tagging approach using an antibody that primarily recognizes the pre-initiation complexes of RNA polymerase II, we explore the transcriptional interactomes of three mouse cell types of progressive lineage commitment, including pluripotent embryonic stem cells, neural stem cells and neurosphere stem/progenitor cells. Our global chromatin connectivity maps reveal approximately 40,000 long-range interactions, suggest precise enhancer-promoter associations and delineate cell-type-specific chromatin structures. Analysis of the complex regulatory repertoire shows that there are extensive colocalizations among promoters and distal-acting enhancers. Most of the enhancers associate with promoters located beyond their nearest active genes, indicating that the linear juxtaposition is not the only guiding principle driving enhancer target selection. Although promoter-enhancer interactions exhibit high cell-type specificity, promoters involved in interactions are found to be generally common and mostly active among different cells. Chromatin connectivity networks reveal that the pivotal genes of reprogramming functions are transcribed within physical proximity to each other in embryonic stem cells, linking chromatin architecture to coordinated gene expression.
Our study sets the stage for the full-scale dissection of spatial and temporal genome structures and their roles in orchestrating development.
Summary
Reclaimed water use is an important component of sustainable water resource management. However, there are concerns regarding pathogen transport through this alternative water supply. This study characterized the viral community found in reclaimed water and compared it with viruses in potable water. Reclaimed water contained 1000‐fold more virus‐like particles than potable water, having approximately 10^8 VLPs per millilitre. Metagenomic analyses revealed that most of the viruses in both reclaimed and potable water were novel. Bacteriophages dominated the DNA viral community in both reclaimed and potable water, but reclaimed water had a distinct phage community based on phage family distributions and host representation within each family. Eukaryotic viruses similar to plant pathogens and invertebrate picornaviruses dominated RNA metagenomic libraries. Established human pathogens were not detected in reclaimed water viral metagenomes, which contained a wealth of novel single‐stranded DNA and RNA viruses related to plant, animal and insect viruses. Therefore, reclaimed water may play a role in the dissemination of highly stable viruses. Information regarding viruses present in reclaimed water but not in potable water can be used to identify new bioindicators of water quality. Future studies will need to investigate the infectivity and host range of these viruses to evaluate the impacts of reclaimed water use on human and ecosystem health.