De novo prediction of human chromosome structures Di Pierro, Michele; Cheng, Ryan R.; Aiden, Erez Lieberman ...
Proceedings of the National Academy of Sciences - PNAS,
11/2017, Volume: 114, Issue: 46
Journal Article
Peer reviewed
Open access
Inside the cell nucleus, genomes fold into organized structures that are characteristic of cell type. Here, we show that this chromatin architecture can be predicted de novo using epigenetic data derived from chromatin immunoprecipitation-sequencing (ChIP-Seq). We exploit the idea that chromosomes encode a 1D sequence of chromatin structural types. Interactions between these chromatin types determine the 3D structural ensemble of chromosomes through a process similar to phase separation. First, a neural network is used to infer the relation between the epigenetic marks present at a locus, as assayed by ChIP-Seq, and the genomic compartment in which those loci reside, as measured by DNA-DNA proximity ligation (Hi-C). Next, types inferred from this neural network are used as an input to an energy landscape model for chromatin organization, the Minimal Chromatin Model (MiChroM), to generate an ensemble of 3D chromosome conformations at a resolution of 50 kilobases (kb). After training the model, dubbed Maximum Entropy Genomic Annotation from Biomarkers Associated to Structural Ensembles (MEGABASE), on odd-numbered chromosomes, we predict the sequences of chromatin types and the subsequent 3D conformational ensembles for the even chromosomes. We validate these structural ensembles by using ChIP-Seq tracks alone to predict Hi-C maps, as well as distances measured using 3D fluorescence in situ hybridization (FISH) experiments. Both sets of experiments support the hypothesis of phase separation being the driving process behind compartmentalization. These findings strongly suggest that epigenetic marking patterns encode sufficient information to determine the global architecture of chromosomes and that de novo structure prediction for whole genomes may be increasingly possible.
Internal friction, which reflects the “roughness” of the energy landscape, plays an important role for proteins by modulating the dynamics of their folding and other conformational changes. However, the experimental quantification of internal friction and its contribution to folding dynamics has remained challenging. Here we use the combination of single-molecule Förster resonance energy transfer, nanosecond fluorescence correlation spectroscopy, and microfluidic mixing to determine the reconfiguration times of unfolded proteins and investigate the mechanisms of internal friction contributing to their dynamics. Using concepts from polymer dynamics, we determine internal friction with three complementary, largely independent, and consistent approaches as an additive contribution to the reconfiguration time of the unfolded state. We find that the magnitude of internal friction correlates with the compactness of the unfolded protein: its contribution dominates the reconfiguration time of approximately 100 ns of the compact unfolded state of a small cold shock protein under native conditions, but decreases for more expanded chains, and approaches zero both at high denaturant concentrations and in intrinsically disordered proteins that are expanded due to intramolecular charge repulsion. Our results suggest that internal friction in the unfolded state will be particularly relevant for the kinetics of proteins that fold in the microsecond range or faster. The low internal friction in expanded intrinsically disordered proteins may have implications for the dynamics of their interactions with cellular binding partners.
A challenge in molecular biology is to distinguish the key subset of residues that allow two-component signaling (TCS) proteins to recognize their correct signaling partner such that they can transiently bind and transfer a signal, i.e., a phosphoryl group. Detailed knowledge of this information would allow one to search sequence space for mutations that can be used to systematically tune the signal transmission between TCS partners as well as potentially encode a TCS protein to preferentially transfer signals to a nonpartner. Motivated by the notion that this detailed information is found in sequence data, we explore the sequence coevolution between signaling partners to better understand how mutations can positively or negatively alter their ability to transfer signal. Using direct coupling analysis for determining evolutionarily conserved protein-protein interactions, we apply a metric called the direct information score to quantify mutational changes in the interaction between TCS proteins and demonstrate that it accurately correlates with experimental mutagenesis studies probing the mutational change in measured in vitro phosphotransfer. Furthermore, by subtracting from our metric an appropriate null model corresponding to generic, conserved features in TCS signaling pairs, we can isolate the determinants that give rise to interaction specificity and recognition, which are variable among different TCS partners. Our methodology forms a potential framework for the rational design of TCS systems by allowing one to quickly search sequence space for mutations or even entirely new sequences that can increase or decrease our metric, as a proxy for increasing or decreasing phosphotransfer ability between TCS proteins.
The energy landscape used by nature over evolutionary timescales to select protein sequences is essentially the same as the one that folds these sequences into functioning proteins, sometimes in microseconds. We show that genomic data, physical coarse-grained free energy functions, and family-specific information theoretic models can be combined to give consistent estimates of energy landscape characteristics of natural proteins. One such characteristic is the effective temperature Tₛₑₗ at which these foldable sequences have been selected in sequence space by evolution. Tₛₑₗ quantifies the importance of folded-state energetics and structural specificity for molecular evolution. Across all protein families studied, our estimates for Tₛₑₗ are well below the experimental folding temperatures, indicating that the energy landscapes of natural foldable proteins are strongly funneled toward the native state.
Protein–protein interactions play a central role in cellular function. Improving the understanding of complex formation has many practical applications, including the rational design of new therapeutic agents and the mechanisms governing signal transduction networks. The generally large, flat, and relatively featureless binding sites of protein complexes pose many challenges for drug design. Fragment docking and direct coupling analysis are used in an integrated computational method to estimate druggable protein–protein interfaces. (i) This method explores the binding of fragment-sized molecular probes on the protein surface using a molecular docking-based screen. (ii) The energetically favorable binding sites of the probes, called hot spots, are spatially clustered to map out candidate binding sites on the protein surface. (iii) A coevolution-based interface interaction score is used to discriminate between different candidate binding sites, yielding potential interfacial targets for therapeutic drug design. This approach is validated for important, well-studied disease-related proteins with known pharmaceutical targets, and also identifies targets that have yet to be studied. Moreover, therapeutic agents are proposed by chemically connecting the fragments that are strongly bound to the hot spots.
In recent years, much effort has been devoted to understanding the three-dimensional (3D) organization of the genome and how genomic structure mediates nuclear function. The development of experimental techniques that combine DNA proximity ligation with high-throughput sequencing, such as Hi-C, has substantially improved our knowledge about chromatin organization. Numerous experimental advancements, not only utilizing DNA proximity ligation but also high-resolution genome imaging (DNA tracing), have required theoretical modeling to determine the structural ensembles consistent with such data. These 3D polymer models of the genome provide an understanding of the physical mechanisms governing genome architecture. Here, we present an overview of the recent advances in modeling the ensemble of 3D chromosomal structures by employing the maximum entropy approach combined with polymer physics. Particularly, we discuss the minimal chromatin model (MiChroM) along with the “maximum entropy genomic annotations from biomarkers associated with structural ensembles” (MEGABASE) model, which have been remarkably successful in the accurate modeling of chromosomes consistent with both Hi-C and DNA-tracing data.
DNA-binding response regulators (DBRRs) are a broad class of proteins that operate in tandem with their partner kinase proteins to form two-component signal transduction systems in bacteria. Typical DBRRs are composed of two domains where the conserved N-terminal domain accepts transduced signals and the evolutionarily diverse C-terminal domain binds to DNA. These domains are assumed to be functionally independent, and hence recombination of the two domains should yield novel DBRRs of arbitrary input/output response, which can be used as biosensors. This idea has been proved to be successful in some cases; yet, the error rate is not trivial. Improvement of the success rate of this technique requires a deeper understanding of the linker-domain and inter-domain residue interactions, which have not yet been thoroughly examined. Here, we studied residue coevolution of DBRRs of the two main subfamilies (OmpR and NarL) using large collections of bacterial amino acid sequences to extensively investigate the evolutionary signatures of linker-domain and inter-domain residue interactions. Coevolutionary analysis uncovered evolutionarily selected linker-domain and inter-domain residue interactions of known experimental structures, as well as previously unknown inter-domain residue interactions. We examined the possibility of these inter-domain residue interactions as contacts that stabilize an inactive conformation of the DBRR where DNA binding is inhibited for both subfamilies. The newly gained insights on linker-domain/inter-domain residue interactions and shared inactivation mechanisms improve the understanding of the functional mechanism of DBRRs, providing clues to efficiently create functional DBRR-based biosensors. Additionally, we show the feasibility of applying coevolutionary landscape models to predict the functionality of domain-swapped DBRR proteins. The presented result demonstrates that sequence information can be used to filter out bioengineered DBRR proteins that are predicted to be nonfunctional due to a high negative predictive value.
We introduce the Nucleome Data Bank (NDB), a web-based platform to simulate and analyze the three-dimensional (3D) organization of genomes. The NDB enables physics-based simulation of chromosomal structural dynamics through the MEGABASE + MiChroM computational pipeline. The input of the pipeline consists of epigenetic information sourced from the ENCODE database; the output consists of the trajectories of chromosomal motions that accurately predict Hi-C and fluorescence in situ hybridization data, as well as multiple observations of chromosomal dynamics in vivo. As an intermediate step, users can also generate chromosomal sub-compartment annotations directly from the same epigenetic input, without the use of any DNA–DNA proximity ligation data. Additionally, the NDB freely hosts both experimental and computational structural genomics data. Besides being able to perform their own genome simulations and download the hosted data, users can also analyze and visualize the same data through custom-designed web-based tools. In particular, the one-dimensional genetic and epigenetic data can be overlaid onto accurate 3D structures of chromosomes, to study the spatial distribution of genetic and epigenetic features. The NDB aims to be a shared resource to biologists, biophysicists and all genome scientists. The NDB is available at https://ndb.rice.edu.
Using computer simulations, we generate cell-specific 3D chromosomal structures and compare them to recently published chromatin structures obtained through microscopy. We demonstrate using machine learning and polymer physics simulations that epigenetic information can be used to predict the structural ensembles of multiple human cell lines. Theory predicts that chromosome structures are fluid and can only be described by an ensemble, which is consistent with the observation that chromosomes exhibit no unique fold. Nevertheless, our analysis of both structures from simulation and microscopy reveals that short segments of chromatin make two-state transitions between closed conformations and open dumbbell conformations. Finally, we study the conformational changes associated with the switching of genomic compartments observed in human cell lines. The formation of genomic compartments resembles hydrophobic collapse in protein folding, with the aggregation of denser and predominantly inactive chromatin driving the positioning of active chromatin toward the surface of individual chromosomal territories.