Genome-wide proximity ligation based assays such as Hi-C have revealed that eukaryotic genomes are organized into structural units called topologically associating domains (TADs). From a visual examination of the chromosomal contact map, however, it is clear that the organization of the domains is not simple or obvious. Instead, TADs exhibit various length scales and, in many cases, a nested arrangement. Here, by exploiting the resemblance between TADs in a chromosomal contact map and densely connected modules in a network, we formulate TAD identification as a network optimization problem and propose an algorithm, MrTADFinder, to identify TADs from intra-chromosomal contact maps. MrTADFinder is based on the network-science concept of modularity. A key component of it is deriving an appropriate background model for contacts in a random chain, by numerically solving a set of matrix equations. The background model preserves the observed coverage of each genomic bin as well as the distance dependence of the contact frequency for any pair of bins exhibited by the empirical map. Also, by introducing a tunable resolution parameter, MrTADFinder provides a self-consistent approach for identifying TADs at different length scales, hence the acronym "Mr" standing for Multiple Resolutions. We then apply MrTADFinder to various Hi-C datasets. The identified domain boundaries are marked by characteristic signatures in chromatin marks and transcription factors (TF) that are consistent with earlier work. Moreover, by calling TADs at different length scales, we observe that boundary signatures change with resolution, with different chromatin features having different characteristic length scales. Furthermore, we report an enrichment of HOT (high-occupancy target) regions near TAD boundaries and investigate the role of different TFs in determining boundaries at various resolutions.
Because tumor mutational burden is known to be coupled to chromatin structure, we further explore the interplay between TADs and epigenetic marks by examining how somatic mutations are distributed across boundaries, and we find a clear stepwise pattern. Overall, MrTADFinder provides a novel computational framework to explore the multi-scale structures in Hi-C contact maps.
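The role of the resolution parameter in a modularity objective can be illustrated with a minimal sketch. It assumes the generalized modularity Q = (1/2N) Σ_ij (W_ij − γ E_ij) δ(σ_i, σ_j), where W is the contact map, E an expected-contact matrix, γ the resolution parameter and σ_i the domain label of bin i. The default null model below is a simple coverage-based expectation, not the distance-dependent background model obtained from the matrix equations described above, and the function name is hypothetical.

```python
import numpy as np

def modularity(W, labels, gamma=1.0, E=None):
    """Generalized modularity of a partition of a symmetric contact map.

    W      : symmetric (n x n) contact matrix
    labels : domain label of each bin (for TADs, contiguous runs of bins)
    gamma  : resolution parameter; larger values favor smaller domains
    E      : expected-contact matrix; if None, a coverage-only null
             model c_i * c_j / sum(W) is used (an assumption made here
             for illustration, without any distance dependence)
    """
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    total = W.sum()  # sum over all entries of the symmetric map
    if E is None:
        coverage = W.sum(axis=1)
        E = np.outer(coverage, coverage) / total
    same_domain = labels[:, None] == labels[None, :]
    return float(((W - gamma * E) * same_domain).sum() / total)
```

Passing a different `E` (for example, one that also depends on genomic distance) changes which partitions score well, and scanning `gamma` changes the typical size of the domains that maximize the score.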
Genome-wide proximity ligation based assays like Hi-C have opened a window to the 3D organization of the genome. In so doing, they present data structures that are different from conventional 1D signal tracks. To exploit the 2D nature of Hi-C contact maps, matrix techniques like spectral analysis are particularly useful. Here, we present HiC-spector, a collection of matrix-related functions for analyzing Hi-C contact maps. In particular, we introduce a novel reproducibility metric for quantifying the similarity between contact maps based on spectral decomposition. The metric successfully separates contact maps derived from Hi-C data from biological replicates, pseudo-replicates and different cell types.
Source code in Julia and Python and detailed documentation are available at https://github.com/gersteinlab/HiC-spector.
koonkiu.yan@gmail.com or mark@gersteinlab.org.
Supplementary data are available at Bioinformatics online.
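A spectral comparison of two contact maps can be sketched as follows. This is an illustration under stated assumptions rather than the HiC-spector implementation: each map is converted to a normalized Laplacian, the r eigenvectors with the smallest eigenvalues are compared pairwise (allowing for the arbitrary sign of an eigenvector), and the summed distance is rescaled to a score between 0 and 1. The function names, the default r and the rescaling by √2 are choices made here.

```python
import numpy as np

def normalized_laplacian(W):
    """Return I - D^{-1/2} W D^{-1/2} for a symmetric contact matrix W."""
    W = np.asarray(W, dtype=float)
    degree = W.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    nonzero = degree > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])  # skip empty bins
    return np.eye(len(W)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]

def spectral_similarity(A, B, r=3):
    """Similarity of two equally sized contact maps from leading eigenvectors."""
    # eigh returns eigenvalues in ascending order with unit-norm eigenvectors
    _, vecs_a = np.linalg.eigh(normalized_laplacian(A))
    _, vecs_b = np.linalg.eigh(normalized_laplacian(B))
    distance = 0.0
    for k in range(r):
        a, b = vecs_a[:, k], vecs_b[:, k]
        # an eigenvector is defined only up to sign, so take the closer one
        distance += min(np.linalg.norm(a - b), np.linalg.norm(a + b))
    # each term is at most sqrt(2) for unit vectors, so the score lies in [0, 1]
    return 1.0 - distance / (r * np.sqrt(2.0))
```

Identical maps score 1, and the score decreases as the leading eigenvectors diverge. Maps with nearly degenerate eigenvalues can make the pairing of eigenvectors unstable, which this sketch does not handle.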
Hi-C is currently the most widely used assay to investigate the 3D organization of the genome and to study its role in gene regulation, DNA replication, and disease. However, Hi-C experiments are costly to perform and involve multiple complex experimental steps; thus, accurate methods for measuring the quality and reproducibility of Hi-C data are essential to determine whether the output should be used further in a study.
Using real and simulated data, we profile the performance of several recently proposed methods for assessing reproducibility of population Hi-C data, including HiCRep, GenomeDISCO, HiC-Spector, and QuASAR-Rep. By explicitly controlling noise and sparsity through simulations, we demonstrate the deficiencies of performing simple correlation analysis on pairs of matrices, and we show that methods developed specifically for Hi-C data produce better measures of reproducibility. We also show how to use established measures, such as the ratio of intra- to interchromosomal interactions, and novel ones, such as QuASAR-QC, to identify low-quality experiments.
In this work, we assess reproducibility and quality measures by varying sequencing depth, resolution and noise levels in Hi-C data from 13 cell lines, with two biological replicates each, as well as 176 simulated matrices. Through this extensive validation and benchmarking of Hi-C data, we describe best practices for reproducibility and quality assessment of Hi-C experiments. We make all software publicly available at http://github.com/kundajelab/3DChromatin_ReplicateQC to facilitate adoption in the community.
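The ratio of intra- to interchromosomal interactions mentioned above is simple to compute from a list of contacts. The sketch below assumes a hypothetical input format of (chromosome 1, chromosome 2, count) tuples; it does not reflect the input format of any of the benchmarked tools, and no quality threshold is implied.

```python
def cis_trans_ratio(contacts):
    """Ratio of intra- to interchromosomal contact counts.

    contacts : iterable of (chrom1, chrom2, count) tuples
               (a hypothetical format chosen for this illustration)
    """
    cis = 0
    trans = 0
    for chrom1, chrom2, count in contacts:
        if chrom1 == chrom2:
            cis += count    # both ends on the same chromosome
        else:
            trans += count  # ends on different chromosomes
    if trans == 0:
        return float("inf")  # no interchromosomal contacts observed
    return cis / trans
```

For example, `cis_trans_ratio([("chr1", "chr1", 90), ("chr1", "chr2", 10)])` evaluates to 9.0.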
Enhancers are important non-coding elements, but they have traditionally been hard to characterize experimentally. The development of massively parallel assays allows the characterization of large numbers of enhancers for the first time. Here, we developed a framework using Drosophila STARR-seq to create shape-matching filters based on meta-profiles of epigenetic features. We integrated these features with supervised machine-learning algorithms to predict enhancers. We further demonstrated that our model could be transferred to predict enhancers in mammals. We comprehensively validated the predictions using a combination of in vivo and in vitro approaches, involving transgenic assays in mice and transduction-based reporter assays in human cell lines (153 enhancers in total). The results confirmed that our model can accurately predict enhancers in different species without re-parameterization. Finally, we examined the transcription factor binding patterns at predicted enhancers versus promoters. We demonstrated that these patterns enable the construction of a secondary model that effectively distinguishes enhancers and promoters.
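The idea of a shape-matching filter can be illustrated with a minimal sketch. It assumes the filter is a normalized cross-correlation between a meta-profile template and a sliding window of a signal track; the template, the signal and the function name are hypothetical, and the scoring used in the framework described above may differ.

```python
import numpy as np

def matched_filter_scores(signal, template):
    """Correlation of a template shape with every window of a signal track.

    Returns one score in [-1, 1] per window start position; a score of 1
    means the window has exactly the template's shape (up to scale and offset).
    """
    signal = np.asarray(signal, dtype=float)
    template = np.asarray(template, dtype=float)
    centered_template = template - template.mean()
    template_norm = np.linalg.norm(centered_template)
    width = len(template)
    scores = np.zeros(len(signal) - width + 1)
    for start in range(len(scores)):
        window = signal[start:start + width]
        centered_window = window - window.mean()
        denom = np.linalg.norm(centered_window) * template_norm
        # flat windows carry no shape information and score 0
        scores[start] = (centered_window @ centered_template) / denom if denom > 0 else 0.0
    return scores
```

Scores of this kind, computed for several epigenetic tracks, could then serve as input features for a supervised classifier.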
During the female lifetime, the expansion of the epithelium dictated by the ovarian cycles is supported by a transient increase in the mammary epithelial stem cell population (MaSCs). Notably, activation of Wnt/β‐catenin signaling is an important trigger for MaSC expansion. Here, we report that the miR‐424/503 cluster is a modulator of canonical Wnt signaling in the mammary epithelium. We show that mammary tumors of miR‐424(322)/503‐depleted mice exhibit activated Wnt/β‐catenin signaling. Importantly, we show a strong association between miR‐424/503 deletion and breast cancers with high levels of Wnt/β‐catenin signaling. Moreover, the miR‐424/503 cluster is required for Wnt‐mediated MaSC expansion induced by the ovarian cycles. Lastly, we show that miR‐424/503 exerts its function by targeting two binding sites at the 3'UTR of the LRP6 co‐receptor and reducing its expression. These results unveil a previously unknown link between miR‐424/503, regulation of Wnt signaling, MaSC fate, and tumorigenesis.
Synopsis
The miR‐424/503 cluster modulates canonical Wnt‐signaling in the mammary epithelium by targeting the LRP6 co‐receptor, unveiling a link between miR‐424/503, regulation of WNT‐signaling, mammary epithelial stem cell fate and tumorigenesis.
Deletion of miR‐424/503 in mice generates mammary tumors with activated Wnt/β‐catenin.
Deletion of miR‐424/503 is linked to activation of canonical Wnt‐signaling in human triple negative breast cancers.
MiR‐424/503 expression influences mammary epithelial stem cell expansion dictated by ovarian cycles.
MiR‐424/503 targets the LRP6 co‐receptor by binding to conserved binding sites in the 3'‐UTR of its mRNA.
Many signaling and other genes known as "hidden" drivers may not be genetically or epigenetically altered or differentially expressed at the mRNA or protein levels, but, rather, drive a phenotype such as tumorigenesis via post-translational modification or other mechanisms. However, conventional approaches based on genomics or differential expression are limited in exposing such hidden drivers. Here, we present a comprehensive algorithm and toolkit NetBID2 (data-driven network-based Bayesian inference of drivers, version 2), which reverse-engineers context-specific interactomes and integrates network activity inferred from large-scale multi-omics data, empowering the identification of hidden drivers that could not be detected by traditional analyses. NetBID2 substantially re-engineers the previous prototype version by providing versatile data visualization and sophisticated statistical analyses, which greatly help researchers interpret results through end-to-end multi-omics data analysis. We demonstrate the power of NetBID2 using three hidden driver examples. We deploy NetBID2 Viewer, Runner, and Cloud apps with 145 context-specific gene regulatory and signaling networks across normal tissues and paediatric and adult cancers to facilitate end-to-end analysis, real-time interactive visualization and cloud-based data sharing. NetBID2 is freely available at https://jyyulab.github.io/NetBID.
Genome-wide ligation-based assays such as Hi-C provide us with an unprecedented opportunity to investigate the spatial organization of the genome. Results of a typical Hi-C experiment are often summarized in a chromosomal contact map, a matrix whose elements reflect the co-location frequencies of genomic loci. To elucidate the complex structural and functional interactions between those genomic loci, networks offer a natural and powerful framework.
We propose a novel graph-theoretical framework, the Corrected Gene Proximity (CGP) map, to study the effect of the 3D spatial organization of genes in transcriptional regulation. The starting point of the CGP map is a weighted network, the gene proximity map, whose weights are based on the contact frequencies between genes extracted from genome-wide Hi-C data. We derive a null model for the network based on the signal contributed by the 1D genomic distance and use it to "correct" the gene proximity for cell-type-specific 3D arrangements. The CGP map, therefore, provides a network framework for the 3D structure of the genome on a global scale. On human cell lines, we show that the CGP map can detect and quantify gene co-regulation and co-localization more effectively than the map obtained by raw contact frequencies. Analyzing the expression pattern of metabolic pathways of two hematopoietic cell lines, we find that the relative positioning of the genes, as captured and quantified by the CGP, is highly correlated with their expression change. We further show that the CGP map can be used to form an inter-chromosomal proximity map that allows large-scale abnormalities, such as chromosomal translocations, to be identified.
The Corrected Gene Proximity map is a map of the 3D structure of the genome on a global scale. It allows the simultaneous analysis of intra- and inter-chromosomal interactions and of gene co-regulation and co-localization more effectively than the map obtained by raw contact frequencies, thus revealing hidden associations between global spatial positioning and gene expression. The flexible graph-based formalism of the CGP map can be easily generalized to study any existing Hi-C dataset.
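Correcting a proximity map for the signal contributed by 1D genomic distance can be sketched with an observed-over-expected ratio. The sketch makes simplifying assumptions that are not taken from the CGP method itself: loci are equally spaced along a single chromosome, the expected contact at a given separation is the mean of the corresponding matrix diagonal, and the correction is a plain ratio. The function name is hypothetical.

```python
import numpy as np

def distance_corrected(W):
    """Observed/expected contacts, with the expectation set by 1D separation.

    W : symmetric (n x n) contact matrix for equally spaced loci on one
        chromosome (an assumption made for this illustration)
    """
    W = np.asarray(W, dtype=float)
    n = len(W)
    expected = np.zeros_like(W)
    for d in range(n):
        # mean contact over all locus pairs separated by d positions
        mean_at_d = np.diagonal(W, offset=d).mean()
        idx = np.arange(n - d)
        expected[idx, idx + d] = mean_at_d
        expected[idx + d, idx] = mean_at_d
    with np.errstate(divide="ignore", invalid="ignore"):
        # pairs with zero expectation are reported as 0 rather than inf/nan
        return np.where(expected > 0, W / expected, 0.0)
```

Values above 1 mark pairs that are in contact more often than their separation alone would suggest, and values below 1 mark pairs that are in contact less often.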
Many biological networks naturally form a hierarchy with a preponderance of downward information flow. In this study, we define a score to quantify the degree of hierarchy in a network and develop a simulated-annealing algorithm to maximize the hierarchical score globally over a network. We apply our algorithm to determine the hierarchical structure of the phosphorylome in detail and investigate the correlation between its hierarchy and kinase properties. We also compare it to the regulatory network, finding that the phosphorylome is more hierarchical than the regulome.
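Maximizing a hierarchy score by simulated annealing can be sketched as follows. The score used here, the fraction of directed edges that point from a higher level to a strictly lower one, is a stand-in chosen for illustration and is not the score defined in the study; the cooling schedule, the number of levels and the function names are likewise assumptions.

```python
import math
import random

def downward_fraction(edges, level):
    """Fraction of directed edges (u, v) with level[u] > level[v]."""
    down = sum(1 for u, v in edges if level[u] > level[v])
    return down / len(edges)

def anneal_levels(edges, n_levels=3, steps=5000, t0=1.0, seed=0):
    """Search for a level assignment that maximizes downward_fraction."""
    rng = random.Random(seed)
    nodes = sorted({node for edge in edges for node in edge})
    level = {node: rng.randrange(n_levels) for node in nodes}
    score = downward_fraction(edges, level)
    best, best_score = dict(level), score
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9  # linear cooling
        node = rng.choice(nodes)
        old_level = level[node]
        level[node] = rng.randrange(n_levels)    # propose moving one node
        new_score = downward_fraction(edges, level)
        # always accept improvements; accept worse moves with Boltzmann probability
        if new_score >= score or rng.random() < math.exp((new_score - score) / temp):
            score = new_score
            if score > best_score:
                best, best_score = dict(level), score
        else:
            level[node] = old_level              # reject and restore
    return best, best_score
```

Calling `anneal_levels` on a list of (source, target) pairs returns the best level assignment found and its score; a score of 1 means every edge points downward.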
T-cell acute lymphoblastic leukemia (T-ALL) is a highly malignant pediatric leukemia, where few therapeutic options are available for patients who relapse. We find that therapeutic targeting of GLI transcription factors by GANT-61 is particularly effective against NOTCH1-unmutated T-ALL cells. Investigation of the functional role of GLI1 disclosed that it contributes to T-ALL cell proliferation, survival, and dissemination through the modulation of AKT and CXCR4 signaling pathways. Decreased CXCR4 signaling following GLI1 inactivation was found to be prevalently due to post-transcriptional mechanisms including altered serine 339 CXCR4 phosphorylation and cortactin levels. We also identify a novel cross-talk between GLI transcription factors and FOXC1. Indeed, GLI factors can activate the expression of FOXC1, which is able to stabilize GLI1/2 protein levels through attenuation of their ubiquitination. Further, we find that prolonged GLI1 deficiency has a double-edged role in T-ALL progression, favoring disease dissemination through the activation of a putative AKT/FOXC1/GLI2 axis. These findings have clinical significance as T-ALL patients with extensive central nervous system dissemination show low GLI1 transcript levels. Further, T-ALL patients having a GLI2-based Hedgehog activation signature are associated with poor survival. Together, these findings support a rationale for targeting the FOXC1/AKT axis to prevent GLI-dependent oncogenic Hedgehog signaling.
Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.