Abstract
Since its first release over a decade ago, the MetaboAnalyst web-based platform has become widely used for comprehensive metabolomics data analysis and interpretation. Here we introduce MetaboAnalyst version 5.0, aiming to narrow the gap from raw data to functional insights for global metabolomics based on high-resolution mass spectrometry (HRMS). Three modules have been developed to help achieve this goal, including: (i) an LC–MS Spectra Processing module which offers an easy-to-use pipeline that can perform automated parameter optimization and resumable analysis to significantly lower the barriers to LC-MS1 spectra processing; (ii) a Functional Analysis module which expands the previous MS Peaks to Pathways module to allow users to intuitively select any peak groups of interest and evaluate their enrichment of potential functions as defined by metabolic pathways and metabolite sets; (iii) a Functional Meta-Analysis module to combine multiple global metabolomics datasets obtained under complementary conditions or from similar studies to arrive at comprehensive functional insights. There are many other new functions including weighted joint-pathway analysis, data-driven network analysis, batch effect correction, merging technical replicates, improved compound name matching, etc. The web interface, graphics and underlying codebase have also been refactored to improve performance and user experience. At the end of an analysis session, users can now easily switch to other compatible modules for a more streamlined data analysis. MetaboAnalyst 5.0 is freely available at https://www.metaboanalyst.ca.
Graphical Abstract
From raw data to statistical and functional insights using MetaboAnalyst 5.0.
Although emerging evidence suggests that transposable elements (TEs) have contributed novel regulatory elements to the human genome, their global impact on transcriptional networks remains largely uncharacterized. Here we show that TEs have contributed to the human genome nearly half of its active elements. Using DNase I hypersensitivity data sets from ENCODE in normal, embryonic, and cancer cells, we found that 44% of open chromatin regions were in TEs and that this proportion reached 63% for primate-specific regions. We also showed that distinct subfamilies of endogenous retroviruses (ERVs) contributed significantly more accessible regions than expected by chance, with up to 80% of their instances in open chromatin. Based on these results, we further characterized 2,150 TE subfamily-transcription factor pairs that were bound in vivo or enriched for specific binding motifs, and observed that TEs contributing to open chromatin had higher levels of sequence conservation. We also showed that thousands of ERV-derived sequences were activated in a cell type-specific manner, especially in embryonic and cancer cells, and we demonstrated that this activity was associated with cell type-specific expression of neighboring genes. Taken together, these results demonstrate that TEs, and in particular ERVs, have contributed hundreds of thousands of novel regulatory elements to the primate lineage and reshaped the human transcriptional landscape.
Human endogenous retrovirus subfamily H (HERVH) is a class of transposable elements expressed preferentially in human embryonic stem cells (hESCs). Here, we report that the long terminal repeats of HERVH function as enhancers and that HERVH is a nuclear long noncoding RNA required to maintain hESC identity. Furthermore, HERVH is associated with OCT4, coactivators and Mediator subunits. Together, these results uncover a new role of species-specific transposable elements in hESCs.
Transposon-insertion sequencing (TIS) methods couple high-density transposon mutagenesis with next-generation sequencing and are commonly used to identify essential or important genes in bacteria. However, this approach can be work-intensive and sometimes expensive depending on the selected protocol. The difficulty of processing a large number of samples in parallel using standard TIS protocols often restricts the number of replicates that can be performed and limits the deployment of this technique to large-scale projects studying gene essentiality in various strains or growth conditions. Here, we report the development of a robust and inexpensive High-Throughput Transposon Mutagenesis (HTTM) protocol and validate the method using Escherichia coli strain BW25113, the parental strain of the KEIO collection. HTTM reliably provides high insertion densities with an average of one transposon every ≤20 bp along with impressive reproducibility (Spearman correlation coefficients >0.94). A detailed protocol is available at protocol.io and a graphical version is also included with this article.
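The two summary statistics quoted above (mean spacing between insertions and the Spearman correlation between replicates) can be illustrated with a minimal, dependency-free sketch. This is not the authors' pipeline: the function names, the per-gene counts, and the round genome and insertion-site numbers are hypothetical and chosen only to make the arithmetic visible.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_insertion_spacing(genome_length_bp, unique_insertion_sites):
    """Average distance in bp between unique insertion sites."""
    return genome_length_bp / unique_insertion_sites


# Hypothetical per-gene insertion counts for two replicate libraries.
replicate_a = [120, 0, 35, 410, 7]
replicate_b = [98, 2, 41, 380, 5]
rho = spearman(replicate_a, replicate_b)

# Hypothetical round numbers: a 4.6 Mb genome with 230,000 unique sites.
spacing = mean_insertion_spacing(4_600_000, 230_000)
```

Because the correlation is computed on ranks, it is insensitive to differences in sequencing depth between replicates, which is one reason a rank-based coefficient is a common choice for comparing insertion profiles.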
During pregnancy, maternal metabolism undergoes substantial changes to support the developing fetus. Such changes are finely regulated by different mechanisms carried out by effectors such as microRNAs (miRNAs). These small non-coding RNAs regulate numerous biological functions, mostly through post-transcriptional repression of gene expression. miRNAs are also secreted in circulation by numerous organs, such as the placenta. However, the complete plasmatic microtranscriptome of pregnant women has still not been fully described, although some miRNA clusters from chromosome 14 (C14MC) and chromosome 19 (C19MC and miR-371-3 cluster) have been proposed as being specific to pregnancy. Our aims were thus to describe the plasma microtranscriptome during the first trimester of pregnancy, by assessing the differences with non-pregnant women, and how it varies between the 4th and the 16th week of pregnancy.
Plasmatic miRNAs from 436 pregnant (gestational week 4 to 16) and 15 non-pregnant women were quantified using the Illumina HiSeq next-generation sequencing platform. Differentially abundant miRNAs were identified using the DESeq2 package (FDR q-value ≤ 0.05) and their targeted biological pathways were assessed with DIANA-miRpath.
A total of 2101 miRNAs were detected, of which 191 were differentially abundant (fold change < 0.05 or > 2, FDR q-value ≤ 0.05) between pregnant and non-pregnant women. Of these, 100 miRNAs were less and 91 miRNAs were more abundant in pregnant women. Additionally, the abundance of 57 miRNAs varied according to gestational age in the first trimester, of which 47 were positively and 10 were negatively associated with advancing gestational age. miRNAs from the C19MC were positively associated with both pregnancy and gestational age variation during the first trimester. Biological pathway analysis revealed that these 191 (pregnancy-specific) and 57 (gestational age markers) miRNAs targeted genes involved in fatty acid metabolism, ECM-receptor interaction and TGF-beta signaling pathways.
We have identified circulating miRNAs that are specific to pregnancy and/or that vary with gestational age in the first trimester. These miRNAs target biological pathways involved in lipid metabolism as well as placenta and embryo development, suggesting a contribution to the maternal metabolic adaptation to pregnancy and fetal growth.
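The selection rule stated above (a fold-change window combined with an FDR q-value cutoff) can be sketched as follows. The sketch assumes the q-values come from the Benjamini-Hochberg procedure, which is the default adjustment in DESeq2; the abstract itself only says "FDR q-value". The function names and the toy p-values and fold changes are hypothetical, and the thresholds are passed in explicitly with the values quoted in the text.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is capped by the adjusted values of all larger p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def select_differential(fold_changes, q_values, fc_low, fc_high, q_max):
    """Indices passing the q-value cutoff and lying outside the fold-change window."""
    return [
        i
        for i, (fc, q) in enumerate(zip(fold_changes, q_values))
        if q <= q_max and (fc < fc_low or fc > fc_high)
    ]


# Toy data (hypothetical): four miRNAs with raw p-values and fold changes.
raw_p = [0.01, 0.04, 0.03, 0.005]
fold_changes = [3.0, 1.1, 0.01, 2.5]

q = benjamini_hochberg(raw_p)
# Thresholds as quoted in the abstract: fold change < 0.05 or > 2, q <= 0.05.
hits = select_differential(fold_changes, q, fc_low=0.05, fc_high=2, q_max=0.05)
```

In practice the adjusted values would be read directly from the DESeq2 results table rather than recomputed; the explicit loop is shown only to make the adjustment step concrete.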
Termination of RNA polymerase II (RNAPII) transcription is a fundamental step of gene expression that is critical for determining the borders between genes. In budding yeast, termination at protein-coding genes is initiated by the cleavage/polyadenylation machinery, whereas termination of most noncoding RNA (ncRNA) genes occurs via the Nrd1-Nab3-Sen1 (NNS) pathway. Here, we find that NNS-like transcription termination is not conserved in fission yeast. Rather, genome-wide analyses show global recruitment of mRNA 3' end processing factors at the end of ncRNA genes, including snoRNAs and snRNAs, and that this recruitment coincides with high levels of Ser2 and Tyr1 phosphorylation on the RNAPII C-terminal domain. We also find that termination of mRNA and ncRNA transcription requires the conserved Ysh1/CPSF-73 and Dhp1/XRN2 nucleases, supporting widespread cleavage-dependent transcription termination in fission yeast. Our findings thus reveal that a common mode of transcription termination can produce functionally and structurally distinct types of polyadenylated and non-polyadenylated RNAs.
In yeast, histone H3/H4 exchange independent of replication is poorly understood. Here, we analyzed the deposition of histone H3 molecules, synthesized during G1, using a high-density microarray histone exchange assay. While we found that H3 exchange in coding regions requires high levels of transcription, promoters exchange H3 molecules in the absence of transcription. In inactive promoters, H3 is deposited predominantly in well-positioned nucleosomes surrounding nucleosome-free regions, indicating that some nucleosomes in promoters are dynamic. This could facilitate induction of repressed genes. Importantly, we show that histone H3 K56 acetylation, a replication-associated mark, is also present in replication-independent newly assembled nucleosomes and correlates perfectly with the deposition of new H3. Finally, we found that transcription-dependent incorporation of H3 at promoters is highly dependent on Asf1. Taken together, our data underline the dynamic nature of replication-independent nucleosome assembly/disassembly, specify a link to transcription, and implicate Asf1 and H3 K56 acetylation.
Genome-scale metabolic models (GEMs) are mathematically structured knowledge bases of metabolism that provide phenotypic predictions from genomic information. GEM-guided predictions of growth phenotypes rely on the accurate definition of a biomass objective function (BOF) that is designed to include key cellular biomass components such as the major macromolecules (DNA, RNA, proteins), lipids, coenzymes, inorganic ions and species-specific components. Despite its importance, no standardized computational platform is currently available to generate species-specific biomass objective functions in a data-driven, unbiased fashion. To fill this gap in the metabolic modeling software ecosystem, we implemented BOFdat, a Python package for the definition of a Biomass Objective Function from experimental data. BOFdat has a modular implementation that divides the BOF definition process into three independent modules defined here as steps: 1) the coefficients for major macromolecules are calculated, 2) coenzymes and inorganic ions are identified and their stoichiometric coefficients estimated, 3) the remaining species-specific metabolic biomass precursors are algorithmically extracted in an unbiased way from experimental data. We used BOFdat to reconstruct the BOF of the Escherichia coli model iML1515, a gold standard in the field. The BOF generated by BOFdat resulted in the most concordant biomass composition, growth rate, and gene essentiality prediction accuracy when compared to other methods. Installation instructions for BOFdat are available in the documentation and the source code is available on GitHub (https://github.com/jclachance/BOFdat).
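To make step 1 more concrete, the sketch below shows one common way of turning a macromolecular weight fraction into per-monomer stoichiometric coefficients in mmol per gram dry weight. It is a generic illustration under stated assumptions, not BOFdat's API or implementation: the function name, the two-monomer "polymer", and all numeric values are hypothetical.

```python
def monomer_coefficients(weight_fraction, mole_fractions, molar_masses):
    """Convert a macromolecule weight fraction into monomer coefficients.

    weight_fraction: grams of the macromolecule class per gram dry weight.
    mole_fractions:  monomer -> molar fraction within the class (sums to 1).
    molar_masses:    monomer -> molar mass in g/mol (polymerized form).
    Returns monomer -> mmol per gram dry weight.
    """
    # Average molar mass of one monomer unit in this macromolecule class.
    mean_mass = sum(mole_fractions[m] * molar_masses[m] for m in mole_fractions)
    # Total monomer units per gram dry weight, converted from mol to mmol.
    total_mmol = weight_fraction / mean_mass * 1000.0
    return {m: total_mmol * x for m, x in mole_fractions.items()}


# Hypothetical two-monomer class making up 50% of dry weight.
mole_fractions = {"A": 0.5, "B": 0.5}
molar_masses = {"A": 100.0, "B": 300.0}
coefficients = monomer_coefficients(0.5, mole_fractions, molar_masses)

# Mass balance check: the coefficients should add back up to the weight fraction.
recovered_fraction = sum(
    coefficients[m] * molar_masses[m] for m in coefficients
) / 1000.0
```

In a biomass reaction these amounts would typically enter as consumed precursors, so they would appear with negative stoichiometric coefficients; the sketch returns the unsigned amounts only.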
Mesoplasma florum, a fast‐growing near‐minimal organism, is a compelling model to explore rational genome designs. Using sequence and structural homology, the set of metabolic functions its genome encodes was identified, allowing the reconstruction of a metabolic network representing ~ 30% of its protein‐coding genes. Growth medium simplification enabled substrate uptake and product secretion rate quantification which, along with experimental biomass composition, were integrated as species‐specific constraints to produce the functional iJL208 genome‐scale model (GEM) of metabolism. Genome‐wide expression and essentiality datasets as well as growth data on various carbohydrates were used to validate and refine iJL208. Discrepancies between model predictions and observations were mechanistically explained using protein structures and network analysis. iJL208 was also used to propose an in silico reduced genome. Comparing this prediction to the minimal cell JCVI‐syn3.0 and its parent JCVI‐syn1.0 revealed key features of a minimal gene set. iJL208 is a stepping‐stone toward model‐driven whole‐genome engineering.
SYNOPSIS
The first genome‐scale metabolic model for the near‐minimal bacterium Mesoplasma florum is reported. Comparing the model‐driven prediction of a M. florum genome reduction scenario to a closely related minimal cell reveals key features of a minimal gene set.
iJL208, the first genome‐scale metabolic model for the near‐minimal organism Mesoplasma florum, comprises 370 reactions and accounts for ~ 30% of the total gene count in the genome.
Model‐driven predictions are validated through the integration of extensive experimental data, including gene expression datasets and growth phenotypes on various sugars.
A robust M. florum genome reduction scenario is predicted using gene essentiality data and transcription units, resulting in a minimal genome containing 535 protein‐coding genes.
A detailed comparison of this prediction to the phylogenetically related minimal cell JCVI‐syn3.0 reveals key features of a minimal gene set.
The near‐minimal bacterium Mesoplasma florum is an interesting model for synthetic genomics and systems biology due to its small genome (~ 800 kb), fast growth rate, and lack of pathogenic potential. However, fundamental aspects of its biology remain largely unexplored. Here, we report a broad yet remarkably detailed characterization of M. florum by combining a wide variety of experimental approaches. We investigated several physical and physiological parameters of this bacterium, including cell size, growth kinetics, and biomass composition of the cell. We also performed the first genome‐wide analysis of its transcriptome and proteome, notably revealing a conserved promoter motif, the organization of transcription units, and the transcription and protein expression levels of all protein‐coding sequences. We converted gene transcription and expression levels into absolute molecular abundances using biomass quantification results, generating an unprecedented view of the M. florum cellular composition and functions. These characterization efforts provide a strong experimental foundation for the development of a genome‐scale model for M. florum and will guide future genome engineering endeavors in this simple organism.
SYNOPSIS
A deep characterization of the near‐minimal bacterium M. florum reveals important features of this emerging model organism for systems and synthetic biology.
Analysis of M. florum growth kinetics reveals four bacterial growth phases in rich medium, with a doubling time of ~ 32 min and an optimal growth temperature of 34°C.
The most probable cell volume and cell mass are estimated through the integration of M. florum dry mass, cell diameter, and cell buoyant density into mathematical equations.
Transcriptome profiling identifies a conserved promoter motif as well as a complex transcriptome architecture, with many intragenic promoters and overlapping transcription units.
The absolute molecular abundances of RNA and protein species, including protein complexes such as the ribosome and RNA polymerase, are estimated.
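The growth and cell-size quantities listed in the synopsis are linked by simple relations, sketched below. Only the ~ 32 min doubling time comes from the text. The spherical-cell geometry is an assumption made for illustration, and the diameter and buoyant density values are hypothetical placeholders; the authors' actual equations may differ.

```python
import math


def growth_rate_per_hour(doubling_time_min):
    """Specific growth rate mu = ln(2) / doubling time, expressed per hour."""
    return math.log(2) / (doubling_time_min / 60.0)


def sphere_volume_um3(diameter_um):
    """Volume of a sphere, V = pi * d^3 / 6, in cubic micrometres."""
    return math.pi * diameter_um ** 3 / 6.0


def cell_mass_fg(volume_um3, density_g_per_ml):
    """Wet cell mass in femtograms from volume and buoyant density.

    Unit conversion: 1 g/mL = 1 pg/um^3 = 1000 fg/um^3.
    """
    return volume_um3 * density_g_per_ml * 1000.0


# Doubling time of ~ 32 min as reported in the synopsis.
mu = growth_rate_per_hour(32)

# Hypothetical placeholder values, not taken from the article.
volume = sphere_volume_um3(1.0)
mass = cell_mass_fg(volume, 1.1)
```

A 32 min doubling time corresponds to a specific growth rate of roughly 1.3 per hour, which is the form usually needed when constraining a genome-scale model.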