Identification of small open reading frames (smORFs) encoding small proteins (≤ 100 amino acids; SEPs) is a challenge in the fields of genome annotation and protein discovery. Here, by combining a novel bioinformatics tool (RanSEPs) with “‐omics” approaches, we were able to describe 109 bacterial small ORFomes. Predictions were first validated by performing an exhaustive search of SEPs present in the Mycoplasma pneumoniae proteome via mass spectrometry, which illustrated the limitations of shotgun approaches. Then, RanSEPs predictions were validated and compared with other tools using proteomic datasets from different bacterial species and SEPs from the literature. We found that up to 16 ± 9% of proteins in an organism could be classified as SEPs. Integration of RanSEPs predictions with transcriptomics data showed that some annotated non‐coding RNAs could in fact encode SEPs. A functional study of SEPs highlighted an enrichment in the membrane, translation, metabolism, and nucleotide‐binding categories. Additionally, 9.7% of the SEPs included a predicted N‐terminal signal peptide. We envision RanSEPs as a tool to unmask the hidden universe of small bacterial proteins.
Synopsis
RanSEPs is a random forest‐based computational approach capable of predicting small encoded proteins in a species‐specific context. Running this tool in 109 bacterial genomes indicated that up to 16 ± 9.5% of the proteins in a genome could be SEPs.
Integration of transcriptomics and proteomics from 12 bacterial species showed that high‐throughput experimental characterization of small proteins (SEPs) presents multiple limitations and false positive detections.
RanSEPs is a computational approach that assigns coding potential scores to SEP candidates in a species‐specific manner based on sequence features.
After running RanSEPs in 109 bacterial genomes, we determined that between 6 and 25% of the proteins of a bacterial genome could be SEPs.
Function prediction of RanSEPs‐predicted SEPs revealed an enrichment in translation, metabolism and nucleotide‐binding proteins.
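As a rough illustration of the first step behind any SEP survey, the sketch below enumerates candidate smORFs (≤ 100 amino acids) on both strands of a DNA string. It is not the RanSEPs implementation: the function name `find_smorfs`, the start-codon set, the minimum length and the coordinate convention are all assumptions made for this example, and no coding-potential scoring is attempted. The stop-codon set is a parameter because Mycoplasma species read TGA as tryptophan rather than as a stop.

```python
# Minimal sketch (assumed conventions, not the RanSEPs code):
# enumerate candidate smORFs on both strands of a DNA sequence.
START_CODONS = {"ATG", "GTG", "TTG"}   # assumed bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}    # standard stops; drop TGA for Mycoplasma
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_smorfs(genome, min_aa=10, max_aa=100, stop_codons=STOP_CODONS):
    """Return (strand, start, end, n_aa) tuples for candidate smORFs.

    Coordinates are 0-based, end-exclusive, relative to the scanned strand,
    and include the stop codon. For each stop codon only the longest ORF
    (the first in-frame start codon) is reported.
    """
    hits = []
    forward = genome.upper()
    for strand, seq in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i                      # open a new candidate ORF
                elif codon in stop_codons:
                    if start is not None:
                        n_aa = (i - start) // 3    # residues, stop excluded
                        if min_aa <= n_aa <= max_aa:
                            hits.append((strand, start, i + 3, n_aa))
                    start = None                   # close the ORF at the stop
    return hits
```

Calling `find_smorfs(sequence, stop_codons={"TAA", "TAG"})` would approximate the Mycoplasma genetic code. The resulting candidates would still need a classifier, or experimental evidence, to separate true SEPs from spurious ORFs.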
Mycoplasmas are important model organisms for Systems and Synthetic Biology, and are pathogenic to a wide variety of species. Despite their relevance, many of the tools established for genome editing in other microorganisms are not available for Mycoplasmas. The Tn4001 transposon is the reference tool to work with these bacteria, but the transformation efficiencies (TEs) reported for the different species vary substantially. Here, we explore the mechanisms underlying these differences in four Mycoplasma species, Mycoplasma agalactiae, Mycoplasma feriruminatoris, Mycoplasma gallisepticum and Mycoplasma pneumoniae, selected for being representative members of each cluster of the Mycoplasma genus. We found that regulatory regions (RRs) driving the expression of the transposase and the antibiotic resistance marker have a major impact on the TEs. We then designed a synthetic RR termed SynMyco RR to control the expression of the key transposon vector elements. Using this synthetic RR, we were able to increase the TE for M. gallisepticum, M. feriruminatoris and M. agalactiae by 30-, 980- and 1036-fold, respectively. Finally, to illustrate the potential of this new transposon, we performed the first essentiality study in M. agalactiae, basing our study on more than 199,000 genome insertions.
•Genome-reduced bacteria have lost regulatory proteins acting at most regulatory levels.
•Minimal bacteria have retained sequence features to regulate transcription.
•Non-transcription factor regulation can occur at genome-wide, operon and transcript level.
Transcription is a core process of bacterial physiology, and as such it must be tightly controlled, so that bacterial cells maintain steady levels of each RNA molecule in homeostasis and modify them in response to perturbations. The major regulators of transcription in bacteria (and in eukaryotes) are transcription factors. However, in genome-reduced bacteria, the limited number of these proteins is insufficient to explain the variety of responses shown upon changes in their environment. Thus, alternative regulators may play a central role in orchestrating RNA levels in these microorganisms. These alternative mechanisms rely on intrinsic features within DNA and RNA molecules, suggesting they are ancestral mechanisms shared among bacteria that could have an increased relevance on transcriptional regulation in minimal cells. In this review, we summarize the alternative elements that can regulate transcript abundance in genome-reduced bacteria and how they contribute to the RNA homeostasis at different levels.
Independent Component Analysis (ICA) allows the dissection of omic datasets into modules that help to interpret global molecular signatures. The inherent randomness of this algorithm can be overcome by clustering many iterations of ICA together to obtain robust components. Existing algorithms for robust ICA are dependent on the choice of clustering method and on computing a potentially biased and large Pearson distance matrix.
We present robustica, a Python-based package to compute robust independent components with a fully customizable clustering algorithm and distance metric. Here, we exploited its customizability to revisit and optimize robust ICA systematically. Of the 6 popular clustering algorithms considered, DBSCAN performed the best at clustering independent components across ICA iterations. To enable using Euclidean distances, we created a subroutine that infers and corrects the components' signs across ICA iterations. Our subroutine increased the resolution, robustness, and computational efficiency of the algorithm. Finally, we show the applicability of robustica by dissecting over 500 tumor samples from low-grade glioma (LGG) patients, where we define two new gene expression modules with key modulators of tumor progression upon IDH1 and TP53 mutagenesis.
robustica brings precise, efficient, and customizable robust ICA into the Python toolbox. Through its customizability, we explored how different clustering algorithms and distance metrics can further optimize robust ICA. Then, we showcased how robustica can be used to discover gene modules associated with combinations of features of biological interest. Taken together, given the broad applicability of ICA for omic data analysis, we envision robustica will facilitate the seamless computation and integration of robust independent components in large pipelines.
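The sign-correction idea can be illustrated with a small sketch. ICA recovers each component only up to its sign, so the same component may appear as `w` in one run and `-w` in another, which makes raw Euclidean distances between runs misleading. The example below orients every component so that its largest-magnitude weight is positive before distances are computed. This orientation rule and the helper names `correct_sign` and `euclidean` are assumptions chosen for illustration, not a description of robustica's internals.

```python
import math


def correct_sign(component):
    """Flip a component so that its largest-magnitude weight is positive.

    Assumed convention for this sketch; any rule that orients equivalent
    components consistently would serve the same purpose.
    """
    extreme = max(component, key=abs)
    return [-w for w in component] if extreme < 0 else list(component)


def euclidean(a, b):
    """Euclidean distance between two equally long weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# The same component recovered with opposite signs in two ICA runs.
run1 = [0.1, -2.0, 0.3]
run2 = [-0.1, 2.0, -0.3]

raw_distance = euclidean(run1, run2)                               # large
fixed_distance = euclidean(correct_sign(run1), correct_sign(run2))  # zero
```

After correction, the two runs coincide, so a distance-based clustering algorithm can group them without first converting a correlation matrix into distances.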
Identifying open reading frames (ORFs) being translated is not a trivial task. ProTInSeq is a technique designed to characterize proteomes by sequencing transposon insertions engineered to express a selection marker when they occur in-frame within a protein-coding gene. In the bacterium Mycoplasma pneumoniae, ProTInSeq identifies 83% of its annotated proteins, along with 5 proteins and 153 small ORF-encoded proteins (SEPs; ≤100 aa) that were not previously annotated. Moreover, ProTInSeq can be utilized for detecting translational noise, as well as for relative quantification and transmembrane topology estimation of fitness and non-essential proteins. By integrating various identification approaches, the number of initially annotated SEPs in this bacterium increases from 27 to 329, with a quarter of them predicted to possess antimicrobial potential. Herein, we describe a methodology complementary to Ribo-Seq and mass spectrometry that can identify SEPs while providing other insights in a proteome with a flexible and cost-effective DNA ultra-deep sequencing approach.
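The in-frame criterion can be made concrete with a short sketch that assigns a codon frame to an insertion relative to an ORF. Everything here is an assumption for illustration: the function name `insertion_frame`, the 1-based inclusive coordinates, the requirement that the insertion share the gene's strand, and the choice of frame 0 as the productive one. Which frame actually yields marker expression depends on the transposon design, which the abstract does not specify.

```python
def insertion_frame(gene_start, gene_end, gene_strand, ins_pos, ins_strand):
    """Return the codon frame (0, 1 or 2) of an insertion inside a gene.

    Coordinates are 1-based and inclusive on the forward strand (assumed).
    Returns None when the insertion lies outside the gene or on the
    opposite strand, where no in-frame fusion would be expected.
    """
    if ins_strand != gene_strand:
        return None
    if not (gene_start <= ins_pos <= gene_end):
        return None
    if gene_strand == "+":
        offset = ins_pos - gene_start   # distance from the first base of the ORF
    else:
        offset = gene_end - ins_pos     # reverse-strand genes start at gene_end
    return offset % 3


def is_in_frame(gene_start, gene_end, gene_strand, ins_pos, ins_strand):
    """True when the insertion falls in frame 0 (assumed productive frame)."""
    return insertion_frame(gene_start, gene_end, gene_strand,
                           ins_pos, ins_strand) == 0
```

Counting the in-frame insertions per candidate ORF, and comparing them with the out-of-frame ones, would then give a simple signal of whether that ORF is translated.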
We utilised a novel genome deletion strategy involving the Cre/Lox system, coupled with random transposon mutagenesis, in the genome‐reduced bacterium Mycoplasma pneumoniae. We demonstrate that this approach can self‐selectively create a library of diverse and large genome deletions, and present a sequencing strategy capable of accurately identifying them.
Summary
The removal of unwanted genetic material is a key aspect in many synthetic biology efforts and often requires preliminary knowledge of which genomic regions are dispensable. Typically, these efforts are guided by transposon mutagenesis studies, coupled to deep sequencing (TnSeq) to identify insertion points and gene essentiality. However, epistatic interactions can cause unforeseen changes in essentiality after the deletion of a gene, leading to the redundancy of these essentiality maps. Here, we present LoxTnSeq, a new methodology to generate and catalogue libraries of genome reduction mutants. LoxTnSeq combines random integration of lox sites by transposon mutagenesis, and the generation of mutants via Cre recombinase, catalogued via deep sequencing. When LoxTnSeq was applied to the naturally genome‐reduced bacterium Mycoplasma pneumoniae, we obtained a mutant pool containing 285 unique deletions. These deletions spanned from > 50 bp to 28 kb, which represents 21% of the total genome. LoxTnSeq also highlighted large regions of non‐essential genes that could be removed simultaneously, and other non‐essential regions that could not, providing a guide for future genome reductions.
Mycoplasmas have exceptionally streamlined genomes and are strongly adapted to their many hosts, which provide them with essential nutrients. Owing to their relative genomic simplicity, Mycoplasmas have been used to develop chassis for biotechnological applications. However, the dearth of robust and precise toolkits for genomic manipulation and tight regulation has hindered any substantial advance. Herein we describe the construction of a robust genetic toolkit for M. pneumoniae, and its successful deployment to engineer synthetic gene switches that control and limit Mycoplasma growth, for biosafety containment applications. We found these synthetic gene circuits to be stable and robust in the long-term, in the context of a minimal cell. With this work, we lay a foundation to develop viable and robust biosafety systems to exploit a synthetic Mycoplasma chassis for live attenuated vectors for therapeutic applications.
Here, we propose an approach to identify active metabolic pathways by integrating gene essentiality analysis and protein abundance. We use two bacterial species (Mycoplasma pneumoniae and Mycoplasma agalactiae) that share a high gene content similarity yet show significant metabolic differences. First, we build detailed metabolic maps of their carbon metabolism, the most striking difference being the absence of two key enzymes for glucose metabolism in M. agalactiae. We then determine carbon sources that allow growth in M. agalactiae, and we introduce glucose-dependent growth to show the functionality of its remaining glycolytic enzymes. By analyzing gene essentiality and performing quantitative proteomics, we can predict the active metabolic pathways connected to carbon metabolism and show significant differences in use and direction of key pathways despite sharing the large majority of genes. Gene essentiality combined with quantitative proteomics and metabolic maps can be used to determine activity and directionality of metabolic pathways.
•Active metabolic bacterial pathways are identified
•Integration of gene essentiality and proteomics allows prediction of active pathways
•Glucose-dependent growth is restored in Mycoplasma agalactiae
•Two Mycoplasma species show different usage of metabolic pathways
Montero-Blay et al. identify active metabolic pathways in bacteria by integrating gene essentiality data and quantitative proteomics. Predictions agree with experimental information and show substantial differences in usage and directionality of metabolic pathways in bacteria with high degree of gene similarity.
Cancer is a rapidly evolving, multifactorial disease that accumulates numerous genetic and epigenetic alterations. This results in molecular and phenotypic heterogeneity within the tumor, the complexity of which is further amplified through specific interactions between cancer cells. We aimed to dissect the molecular mechanisms underlying the cooperation between different clones.
We produced clonal cell lines derived from the MDA-MB-231 breast cancer cell line, using the UbC-StarTrack system, which allowed tracking of multiple clones by color: GFP C3, mKO E10 and Sapphire D7. Characterization of these clones was performed by growth rate, cell metabolic activity, wound healing, invasion assays and genetic and epigenetic arrays. Tumorigenicity was tested by orthotopic and intravenous injections. Clonal cooperation was evaluated by medium complementation, co-culture and co-injection assays.
Characterization of these clones in vitro revealed clear genetic and epigenetic differences that affected growth rate, cell metabolic activity, morphology and cytokine expression among cell lines. In vivo, all clonal cell lines were able to form tumors; however, injection of an equal mix of the different clones led to tumors with very few mKO E10 cells. Additionally, the mKO E10 clonal cell line showed a significant inability to form lung metastases. These results confirm that even in stable cell lines heterogeneity is present. In vitro, the complementation of growth medium with medium or exosomes from parental or clonal cell lines increased the growth rate of the other clones. Complementation assays, co-growth and co-injection of mKO E10 and GFP C3 clonal cell lines increased the efficiency of invasion and migration.
These findings support a model where interplay between clones confers aggressiveness, and which may allow identification of the factors involved in cellular communication that could play a role in clonal cooperation and thus represent new targets for preventing tumor progression.