Gene expression profiling has uncovered the transcription factor Sox4 with upregulated activity during TGF-β-induced epithelial-mesenchymal transition (EMT) in normal and cancerous breast epithelial cells. Sox4 is indispensable for EMT and cell survival in vitro and for primary tumor growth and metastasis in vivo. Among several EMT-relevant genes, Sox4 directly regulates the expression of Ezh2, encoding the Polycomb group histone methyltransferase that trimethylates histone 3 lysine 27 (H3K27me3) for gene repression. Ablation of Ezh2 expression prevents EMT, whereas forced expression of Ezh2 restores EMT in Sox4-deficient cells. Ezh2-mediated H3K27me3 marks associate with key EMT genes, representing an epigenetic EMT signature that predicts patient survival. Our results identify Sox4 as a master regulator of EMT by governing the expression of the epigenetic modifier Ezh2.
• Sox4 is critical for EMT and for experimental primary tumor growth and metastasis
• Sox4 directly regulates EMT-relevant genes, among them Ezh2
• Ezh2 function and thus H3K27me3 are required for EMT
• The expression of Ezh2-regulated genes is predictive for patient survival
Although recombination is accepted to be common in bacteria, for many species robust phylogenies with well-resolved branches can be reconstructed from whole genome alignments of strains, and these are generally interpreted to reflect clonal relationships. Using new methods based on the statistics of single-nucleotide polymorphism (SNP) splits, we show that this interpretation is incorrect. For many species, each locus has recombined many times along its line of descent, and instead of many loci supporting a common phylogeny, the phylogeny changes many thousands of times along the genome alignment. Analysis of the patterns of allele sharing among strains shows that bacterial populations cannot be approximated as either clonal or freely recombining but are structured such that recombination rates between lineages vary over several orders of magnitude, with a unique pattern of rates for each lineage. Thus, rather than reflecting clonal ancestry, whole genome phylogenies reflect distributions of recombination rates.
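To make the idea of SNPs that cannot share one phylogeny concrete, the sketch below applies the classic four-gamete test to pairs of biallelic SNP columns. This is a standard textbook check and is not claimed to be the SNP-split statistic used in the study; the function names and the toy 0/1 encoding of alleles across five strains are assumptions made for illustration.

```python
from itertools import combinations


def compatible(snp_a, snp_b):
    """Four-gamete test for two biallelic SNP columns.

    Each column is a string of '0'/'1' alleles, one character per strain.
    Without recombination or repeated mutation, at most three of the four
    allele combinations (00, 01, 10, 11) can occur among the strains.
    """
    return len(set(zip(snp_a, snp_b))) < 4


def incompatible_pairs(snps):
    """Return index pairs of SNP columns that cannot fit a single tree."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(snps), 2)
        if not compatible(a, b)
    ]


# Toy alignment (hypothetical): three SNP columns observed in five strains.
toy_snps = ["00110", "00111", "01010"]
conflicts = incompatible_pairs(toy_snps)
print(conflicts)
```

In this toy input the first two columns are consistent with a common tree, while the third conflicts with both, which is the kind of signal that indicates a change of phylogeny along the alignment.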
Methylation of cytosines is an essential epigenetic modification in mammalian genomes, yet the rules that govern methylation patterns remain largely elusive. To gain insights into this process, we generated base-pair-resolution mouse methylomes in stem cells and neuronal progenitors. Advanced quantitative analysis identified low-methylated regions (LMRs) with an average methylation of 30%. These represent CpG-poor distal regulatory regions as evidenced by location, DNase I hypersensitivity, presence of enhancer chromatin marks and enhancer activity in reporter assays. LMRs are occupied by DNA-binding factors and their binding is necessary and sufficient to create LMRs. A comparison of neuronal and stem-cell methylomes confirms this dependency, as cell-type-specific LMRs are occupied by cell-type-specific transcription factors. This study provides methylome references for the mouse and shows that DNA-binding factors locally influence DNA methylation, enabling the identification of active regulatory regions.
Predicting protein structure from primary sequence is one of the ultimate challenges in computational biology. Given the large amount of available sequence data, the analysis of co-evolution, i.e., statistical dependency, between columns in multiple alignments of protein domain sequences remains one of the most promising avenues for predicting residues that are contacting in the structure. A key impediment to this approach is that strong statistical dependencies are also observed for many residue pairs that are distal in the structure. Using a comprehensive analysis of protein domains with available three-dimensional structures we show that co-evolving contacts very commonly form chains that percolate through the protein structure, inducing indirect statistical dependencies between many distal pairs of residues. We characterize the distributions of length and spatial distance traveled by these co-evolving contact chains and show that they explain a large fraction of observed statistical dependencies between structurally distal pairs. We adapt a recently developed Bayesian network model into a rigorous procedure for disentangling direct from indirect statistical dependencies, and we demonstrate that this method not only successfully accomplishes this task, but also allows contacts with weak statistical dependency to be detected. To illustrate how additional information can be incorporated into our method, we incorporate a phylogenetic correction, and we develop an informative prior that takes into account that the probability for a pair of residues to contact depends strongly on their primary-sequence distance and the amount of conservation that the corresponding columns in the multiple alignment exhibit. We show that our model including these extensions dramatically improves the accuracy of contact prediction from multiple sequence alignments.
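As a minimal illustration of statistical dependency between alignment columns, the sketch below computes the plain mutual information of two columns. It is only the raw pairwise measure: it does not separate direct from indirect dependencies, which is the task the Bayesian network model described above addresses. The function name and the four-sequence toy alignment are assumptions for illustration.

```python
from collections import Counter
from math import log


def mutual_information(col_x, col_y):
    """Mutual information (in nats) between two alignment columns.

    col_x and col_y are equal-length sequences of residues, one entry per
    aligned sequence. Frequencies are plain counts without pseudocounts.
    """
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    return sum(
        (c / n) * log(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


# Toy alignment (hypothetical): columns 0 and 1 co-vary, column 2 is conserved.
alignment = ["AKL", "AKL", "GEL", "GEL"]
columns = list(zip(*alignment))
mi_covarying = mutual_information(columns[0], columns[1])
mi_conserved = mutual_information(columns[0], columns[2])
print(mi_covarying, mi_conserved)
```

The perfectly co-varying pair reaches the maximum possible value for two equally frequent residues, log 2, while any pair involving the fully conserved column scores zero.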
...we next aimed to prove that the max-ent approach was "correct" for this system, i.e., that the "true" distribution of the population in genotype space was well approximated by the max-ent distribution. ...I want to close by mentioning one "elephant in the room" that has so far not been discussed.
Although it is well appreciated that gene expression is inherently noisy and that transcriptional noise is encoded in a promoter's sequence, little is known about the extent to which noise levels of individual promoters vary across growth conditions. Using flow cytometry, we here quantify transcriptional noise in Escherichia coli genome-wide across 8 growth conditions and find that noise levels systematically decrease with growth rate, with a condition-dependent lower bound on noise. Whereas constitutive promoters consistently exhibit low noise in all conditions, regulated promoters are both more noisy on average and more variable in noise across conditions. Moreover, individual promoters show highly distinct variation in noise across conditions. We show that a simple model of noise propagation from regulators to their targets can explain a significant fraction of the variation in relative noise levels and identifies TFs that most contribute to both condition-specific and condition-independent noise propagation. In addition, analysis of the genome-wide correlation structure of various gene properties shows that gene regulation, expression noise, and noise plasticity are all positively correlated genome-wide and vary independently of variations in absolute expression, codon bias, and evolutionary rate. Together, our results show that while absolute expression noise tends to decrease with growth rate, relative noise levels of genes are highly condition-dependent and determined by the propagation of noise through the gene regulatory network.
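One common way to summarize expression noise from single-cell fluorescence is the squared coefficient of variation. The sketch below assumes that definition, which the abstract itself does not specify, and it omits steps such as background and autofluorescence correction. The function name and the fluorescence values are made up for illustration and are not measurements from the study.

```python
from statistics import mean, pvariance


def noise_cv2(fluorescence):
    """Squared coefficient of variation: variance divided by squared mean."""
    m = mean(fluorescence)
    return pvariance(fluorescence, m) / m ** 2


# Hypothetical per-cell fluorescence of one promoter in two conditions.
cells_by_condition = {
    "condition_a": [60, 100, 140],
    "condition_b": [180, 200, 220],
}
noise_by_condition = {
    name: noise_cv2(values) for name, values in cells_by_condition.items()
}
print(noise_by_condition)
```

Comparing such per-promoter values across conditions is the kind of table on which questions about condition dependence of noise can then be asked.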
Despite substantial progress in single-cell RNA-seq (scRNA-seq) data analysis methods, there is still little agreement on how to best normalize such data. Starting from the basic requirements that inferred expression states should correct for both biological and measurement sampling noise and that changes in expression should be measured in terms of fold changes, we here derive a Bayesian normalization procedure called Sanity (SAmpling-Noise-corrected Inference of Transcription activitY) from first principles. Sanity estimates expression values and associated error bars directly from raw unique molecular identifier (UMI) counts without any tunable parameters. Using simulated and real scRNA-seq datasets, we show that Sanity outperforms other normalization methods on downstream tasks, such as finding nearest-neighbor cells and clustering cells into subtypes. Moreover, we show that by systematically overestimating the expression variability of genes with low expression and by introducing spurious correlations through mapping the data to a lower-dimensional representation, other methods yield severely distorted pictures of the data.
Living cells proliferate by completing and coordinating two cycles, a division cycle controlling cell size and a DNA replication cycle controlling the number of chromosomal copies. It remains unclear how bacteria such as Escherichia coli tightly coordinate those two cycles across a wide range of growth conditions. Here, we used time-lapse microscopy in combination with microfluidics to measure growth, division and replication in single E. coli cells in both slow and fast growth conditions. To compare different phenomenological cell cycle models, we introduce a statistical framework assessing their ability to capture the correlation structure observed in the data. In combination with stochastic simulations, our data indicate that the cell cycle is driven from one initiation event to the next rather than from birth to division and is controlled by two adder mechanisms: the added volume since the last initiation event determines the timing of both the next division and replication initiation events.
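The two adder mechanisms described above can be sketched as a toy lineage simulation. This is a minimal illustration under stated assumptions, not the statistical framework or the stochastic simulations of the study: volumes are expressed per replication origin and counted just after initiation, the number of origins doubles at each initiation, and the names and parameter values (`delta_ii`, `delta_id`, `noise`, `v0`) are hypothetical.

```python
import random


def simulate_double_adder(n_cycles, delta_ii=1.0, delta_id=1.5,
                          noise=0.1, v0=1.0, seed=0):
    """Follow one lineage through successive replication initiations.

    v_init is the volume per origin right after an initiation. From that
    point, two adders run in parallel:
      * the next initiation fires once delta_ii (plus noise) has been added
        per origin, after which the origin number doubles;
      * the matching division fires once delta_id (plus noise) has been
        added per origin.
    Returns a list of (v_init, v_div) pairs, both in per-origin units.
    """
    rng = random.Random(seed)
    v_init = v0
    history = []
    for _ in range(n_cycles):
        v_div = v_init + delta_id + rng.gauss(0.0, noise)
        history.append((v_init, v_div))
        # Origins double at the next initiation, halving volume per origin.
        v_init = (v_init + delta_ii + rng.gauss(0.0, noise)) / 2.0
    return history


# Start far from the steady state and watch the initiation volume relax.
lineage = simulate_double_adder(n_cycles=30, noise=0.0, v0=3.0)
print(lineage[0], lineage[1], lineage[-1])
```

Because the added volume does not depend on the starting volume, a deviation from the steady-state initiation volume is halved at every cycle in this toy model, which is the size-homeostasis property usually associated with adder mechanisms.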
The cellular changes during an epithelial-mesenchymal transition (EMT) largely rely on global changes in gene expression orchestrated by transcription factors. Tead transcription factors and their transcriptional co-activators Yap and Taz have been previously implicated in promoting an EMT; however, their direct transcriptional target genes and their functional role during EMT have remained elusive. We have uncovered a previously unanticipated role of the transcription factor Tead2 during EMT. During EMT in mammary gland epithelial cells and breast cancer cells, levels of Tead2 increase in the nucleus of cells, thereby directing a predominant nuclear localization of its co-factors Yap and Taz via the formation of Tead2-Yap-Taz complexes. Genome-wide chromatin immunoprecipitation and next generation sequencing in combination with gene expression profiling revealed the transcriptional targets of Tead2 during EMT. Among these, zyxin contributes to the migratory and invasive phenotype evoked by Tead2. The results demonstrate that Tead transcription factors are crucial regulators of the cellular distribution of Yap and Taz, and together they control the expression of genes critical for EMT and metastasis.
In obesity, white adipose tissue (WAT) inflammation is linked to insulin resistance. Increased adipocyte chemokine (C-C motif) ligand 2 (CCL2) secretion may initiate adipose inflammation by attracting the migration of inflammatory cells into the tissue. Using an unbiased approach, we identified adipose microRNAs (miRNAs) that are dysregulated in human obesity and assessed their possible role in controlling CCL2 production. In subcutaneous WAT obtained from 56 subjects, 11 miRNAs were present in all subjects and downregulated in obesity. Of these, 10 affected adipocyte CCL2 secretion in vitro and for 2 miRNAs (miR-126 and miR-193b), regulatory circuits were defined. While miR-126 bound directly to the 3'-untranslated region of CCL2 mRNA, miR-193b regulated CCL2 production indirectly through a network of transcription factors, many of which have been identified in other inflammatory conditions. In addition, overexpression of miR-193b and miR-126 in a human monocyte/macrophage cell line attenuated CCL2 production. The levels of the two miRNAs in subcutaneous WAT were significantly associated with CCL2 secretion (miR-193b) and expression of integrin, α-X, an inflammatory macrophage marker (miR-193b and miR-126). Taken together, our data suggest that miRNAs may be important regulators of adipose inflammation through their effects on CCL2 release from human adipocytes and macrophages.