Abstract
Motivation
In this work we present REINDEER, a novel computational method that indexes sequences and records their abundances across a collection of datasets. To the best of our knowledge, other indexing methods have so far been unable to record abundances efficiently across large datasets.
Results
We used REINDEER to index the abundances of sequences within 2585 human RNA-seq experiments in 45 h using only 56 GB of RAM. This makes REINDEER the first method able to record abundances at the scale of ∼4 billion distinct k-mers across 2585 datasets. REINDEER also supports exact presence/absence queries of k-mers. Briefly, REINDEER constructs the compacted de Bruijn graph of each dataset, then conceptually merges those de Bruijn graphs into a single global one. Finally, REINDEER constructs and indexes monotigs, which in a nutshell are groups of k-mers of similar abundances.
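The monotig idea can be illustrated with a toy sketch. This is not REINDEER's code: it assumes the k-mers of one unitig of the merged graph are already available in order, uses one plain dict of k-mer counts per dataset in place of the per-dataset graphs, and uses a dict instead of the hashing structure a real index would use. The hypothetical `log_bin` option coarsens counts so that "similar" abundances compare equal.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical types: per-dataset k-mer counts, and a monotig = (k-mers, count vector).
Counts = Dict[str, int]
Monotig = Tuple[List[str], Tuple[int, ...]]


def abundance_vector(kmer: str, datasets: List[Counts], log_bin: bool = False) -> Tuple[int, ...]:
    """Abundance of one k-mer in every dataset (0 when absent)."""
    counts = [d.get(kmer, 0) for d in datasets]
    if log_bin:
        # Hypothetical coarsening: 0 stays 0, otherwise floor(log2(count)) + 1.
        counts = [0 if c == 0 else int(math.log2(c)) + 1 for c in counts]
    return tuple(counts)


def split_into_monotigs(unitig_kmers: List[str], datasets: List[Counts],
                        log_bin: bool = False) -> List[Monotig]:
    """Cut an ordered run of consecutive k-mers into maximal stretches
    whose abundance vectors are identical."""
    monotigs: List[Monotig] = []
    current = [unitig_kmers[0]]
    current_vec = abundance_vector(unitig_kmers[0], datasets, log_bin)
    for kmer in unitig_kmers[1:]:
        vec = abundance_vector(kmer, datasets, log_bin)
        if vec == current_vec:
            current.append(kmer)
        else:
            monotigs.append((current, current_vec))
            current, current_vec = [kmer], vec
    monotigs.append((current, current_vec))
    return monotigs


def spell(kmers: List[str]) -> str:
    """Sequence spelled by consecutive k-mers overlapping by k-1 bases."""
    return kmers[0] + "".join(km[-1] for km in kmers[1:])


class MonotigIndex:
    """Stores each count vector once per monotig, not once per k-mer."""

    def __init__(self, monotigs: List[Monotig]):
        self.vectors = [vec for _, vec in monotigs]
        self.kmer_to_id = {km: i for i, (kms, _) in enumerate(monotigs) for km in kms}

    def query(self, kmer: str) -> Optional[Tuple[int, ...]]:
        """Abundance vector of a k-mer, or None if it is absent from every dataset."""
        i = self.kmer_to_id.get(kmer)
        return None if i is None else self.vectors[i]
```

The space saving comes from storing one vector per monotig: on real data, long runs of adjacent k-mers tend to share (binned) abundances, so many k-mers point to the same vector.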
Availability and implementation
https://github.com/kamimrcht/REINDEER.
Supplementary information
Supplementary data are available at Bioinformatics online.
The structure and function of conserved motifs constituting the apex of Stem I in T-box mRNA leaders are investigated. We point out that this apex shares striking similarities with the L1 stalk (helices 76-78) of the ribosome. A sequence and structure analysis of both elements shows that, as with the head of the L1 stalk, the function of the apex of Stem I lies in the docking of tRNA through a stacking interaction with the conserved G19:C56 base-pair platform. The inferred structure at the apex of Stem I consists of a module of two T-loops bound together head to tail, a module that is also present in the head of the L1 stalk but had gone unnoticed. Supporting this analysis, we show that a highly conserved structure in RNase P, formerly described as the J11/12-J12/11 module and specifically known to bind the elbow of tRNA, constitutes a third instance of this T-loop module. A structural analysis explains why the six nucleotides constituting the core of this module are highly invariant among all three types of RNA. Our finding that major RNA partners of tRNA bind the elbow with the same RNA structure suggests an explanation for the origin of the tRNA L-shape.
Bacterial transcription attenuation occurs through a variety of cis-regulatory elements that control gene expression in response to a wide range of signals. The signal-sensing structures in attenuators are so diverse and rapidly evolving that only a small fraction have been properly annotated and characterized to date. Here we apply a broad-spectrum detection tool in order to achieve a more complete view of the transcriptional attenuation complement of key bacterial species.
Our protocol seeks gene families with an unusual frequency of 5' terminators found across multiple species. Many of the detected attenuators are part of annotated elements, such as riboswitches or T-boxes, which often operate through transcriptional attenuation. However, a significant fraction of candidates were not previously characterized in spite of their unmistakable footprint. We further characterized some of these new elements using sequence and secondary structure analysis. We also present elements that may control the expression of several non-homologous genes, suggesting co-transcription and response to common signals. An important class of such elements, which we called mobile attenuators, is provided by 3' terminators of insertion sequences or prophages that may be exapted as 5' regulators when inserted directly upstream of a cellular gene.
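One way to make "an unusual frequency of 5' terminators across species" concrete is a one-sided binomial test per gene family. The sketch below is an assumption for illustration, not the authors' protocol: it supposes that, for each family, we know how many species carry it and in how many a terminator is predicted upstream, and that a genome-wide background rate of 5' terminators is known. Family names and the threshold `alpha` are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb
from typing import Dict, List, Tuple


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enriched_families(families: Dict[str, Tuple[int, int]], background_rate: float,
                      alpha: float = 0.01) -> List[Tuple[str, float]]:
    """families maps a (hypothetical) family name to
    (species with a predicted 5' terminator, species carrying the family).
    Returns families with an unusually high terminator frequency, smallest p first."""
    hits = []
    for name, (with_terminator, total) in families.items():
        pval = binomial_upper_tail(with_terminator, total, background_rate)
        if pval < alpha:
            hits.append((name, pval))
    return sorted(hits, key=lambda hit: hit[1])
```

With a background rate of 0.1, a family showing a 5' terminator in 9 of 10 species is flagged, whereas one showing it in 1 of 10 species is not.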
We show here that attenuators involve a complex landscape of signal-detection structures spanning the entire bacterial domain. We discuss possible scenarios through which these diverse 5' regulatory structures may arise or evolve.
The rising interest in the precise characterization of the tumour immune contexture has recently highlighted the high potential of RNA sequencing (RNA-seq) for identifying molecular mechanisms engaged in the response to immunotherapy. In this review, we provide an overview of the major principles of single-cell and conventional (bulk) RNA-seq applied to onco-immunology. We describe standard preprocessing and statistical analyses of data obtained from such techniques and highlight some computational challenges related to the sequencing of individual cells. We provide examples of gene expression analyses such as differential expression analysis, dimensionality reduction, clustering and enrichment analysis. Additionally, we use public data sets to illustrate how deconvolution algorithms can identify and quantify multiple immune subpopulations from either bulk or single-cell RNA-seq. We give examples of machine and deep learning models used to predict patient outcomes and treatment effects from high-dimensional data. Finally, we weigh the strengths and weaknesses of single-cell and bulk RNA-seq with regard to their applications in the clinic.
• Personalization of onco-immunotherapy treatment relies on both cancer and immune profiling.
• Single-cell RNA-seq delivers detailed but sparse information.
• Bulk RNA-seq requires extrapolation to quantitatively estimate the immune contexture.
• We reviewed standard methods that can boost clinical applications of tumour RNA-seq.
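Bulk deconvolution is often framed as a regression: a bulk expression profile b is approximated by a signature matrix S (genes × cell types) times a vector of non-negative cell-type fractions p. The minimal sketch below solves this non-negative least-squares problem by projected gradient descent and rescales p to sum to 1. It is one classical formulation chosen for illustration, not a specific algorithm from the review. It assumes the signature and bulk profiles are on comparable scales. The function name and iteration count are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows = genes, columns = cell types


def deconvolve(signature: Matrix, bulk: List[float], n_iter: int = 5000) -> List[float]:
    """Estimate cell-type fractions p >= 0 minimising ||S p - b||^2 by
    projected gradient descent, then rescale p so that it sums to 1."""
    n_genes, n_types = len(signature), len(signature[0])
    # Step size 1/L, where L = squared Frobenius norm of S bounds the largest
    # eigenvalue of S^T S; this keeps the gradient iteration stable.
    lipschitz = sum(v * v for row in signature for v in row)
    step = 1.0 / lipschitz
    p = [0.0] * n_types
    for _ in range(n_iter):
        # Residual r = S p - b, one entry per gene.
        r = [sum(row[j] * p[j] for j in range(n_types)) - b
             for row, b in zip(signature, bulk)]
        # Gradient of 0.5 * ||S p - b||^2 is S^T r.
        grad = [sum(signature[i][j] * r[i] for i in range(n_genes))
                for j in range(n_types)]
        # Gradient step followed by projection onto p >= 0.
        p = [max(0.0, pj - step * gj) for pj, gj in zip(p, grad)]
    total = sum(p)
    return [pj / total for pj in p] if total > 0 else p
```

For instance, with a three-gene signature [[10, 0], [0, 10], [5, 5]] and a bulk profile [3, 7, 5] built from 30% and 70% of the two cell types, the sketch recovers fractions close to 0.3 and 0.7. Dedicated NNLS solvers converge much faster than this plain gradient loop.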
Abstract
Motivation
KaMRaT is designed for processing large k-mer count tables derived from multi-sample RNA-seq data. Its primary objective is to identify condition-specific or differentially expressed sequences, regardless of gene or transcript annotation.
Results
KaMRaT is implemented in C++. Major functions include scoring k-mers based on count statistics, merging overlapping k-mers into contigs and selecting k-mers based on their occurrence across specific samples.
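Two of the operations listed above, selecting k-mers by their occurrence across samples and merging overlapping k-mers into contigs, can be sketched in a few lines. This is not KaMRaT's code (which is C++) and omits any count-based checks it may apply before merging. It assumes stranded sequences (no reverse complements), a DNA alphabet, and exact (k-1)-base overlaps. The thresholds `min_count` and `min_samples` are hypothetical parameter names.

```python
from typing import Dict, List


def select_recurrent(counts: Dict[str, List[int]], min_count: int, min_samples: int) -> List[str]:
    """Keep k-mers whose count reaches min_count in at least min_samples samples."""
    return [km for km, row in counts.items()
            if sum(c >= min_count for c in row) >= min_samples]


def merge_kmers(kmers: List[str]) -> List[str]:
    """Merge k-mers into contigs along unambiguous (k-1)-base overlaps."""
    kset = set(kmers)

    def succs(km: str) -> List[str]:
        return [km[1:] + c for c in "ACGT" if km[1:] + c in kset]

    def preds(km: str) -> List[str]:
        return [c + km[:-1] for c in "ACGT" if c + km[:-1] in kset]

    used = set()
    contigs = []
    for km in sorted(kset):
        if km in used:
            continue
        # Walk backwards to the first k-mer of the unambiguous path holding km
        # (the p[0] == km test stops the walk on a closed cycle).
        start = km
        while True:
            p = preds(start)
            if len(p) != 1 or len(succs(p[0])) != 1 or p[0] in used or p[0] == km:
                break
            start = p[0]
        # Walk forwards, appending one base per merged k-mer.
        contig, cur = start, start
        used.add(start)
        while True:
            s = succs(cur)
            if len(s) != 1 or len(preds(s[0])) != 1 or s[0] in used:
                break
            cur = s[0]
            used.add(cur)
            contig += cur[-1]
        contigs.append(contig)
    return contigs
```

The five 4-mers of ACGTTGCA merge back into that single sequence, while a k-mer with two possible successors ends its contig at the branch point.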
Availability and implementation
Source code and documentation are available via https://github.com/Transipedia/KaMRaT.
The three major mechanisms that regulate transcript formation involve the selection of alternative sites for transcription start (TS), splicing, and polyadenylation. Current efforts collect data and annotation separately for each of these variant types. It is important to take an integrated view of these data sets and to derive a data set of alternate transcripts with consolidated annotation. We have previously developed computational pipelines that generate value-added, genome-scale data on individual variant types; these include AltSplice for splicing and AltPAS for polyadenylation. We now extend these pipelines and integrate the resulting data sets to provide an integrated view of the contributions of splicing and polyadenylation to the formation of transcript variants.
The AltSplice pipeline examines gene-transcript alignments and delineates alternative splice events and splice patterns. This pipeline is extended as AltTrans to delineate isoform transcript patterns, each defined by both its introns/exons and its 'terminating' polyA site; EST/mRNA sequences that support a transcript pattern confirm both its underlying splicing and its polyadenylation. The AltPAS pipeline examines gene-transcript alignments and delineates all potential polyA sites irrespective of the underlying splicing patterns. The polyA sites resulting from AltTrans and AltPAS are then merged. The generated database reports data on alternative splicing, alternative polyadenylation and the resulting alternate transcript patterns; the basal data are annotated for various biological features. The data (named integrated AltTrans data), generated for both human and mouse, are available through the Alternate Transcript Diversity web site at http://www.ebi.ac.uk/atd/.
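The merging of polyA sites from two pipelines can be pictured as pooling their coordinates and collapsing nearby sites. The sketch below is an assumption, not the AltTrans/AltPAS procedure: it represents each site as a (chromosome, strand, position) tuple and applies single-linkage clustering with a hypothetical 20-nt window. Each merged cluster reports its span and the number of supporting sites.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Site = Tuple[str, str, int]                # (chromosome, strand, position)
Cluster = Tuple[str, str, int, int, int]   # (chromosome, strand, start, end, n_sites)


def merge_polya_sites(sites_a: List[Site], sites_b: List[Site], window: int = 20) -> List[Cluster]:
    """Pool sites from two sources and merge those lying within `window` nt of
    their nearest neighbour on the same chromosome and strand."""
    by_strand: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for chrom, strand, pos in sites_a + sites_b:
        by_strand[(chrom, strand)].append(pos)

    clusters: List[Cluster] = []
    for chrom, strand in sorted(by_strand):
        positions = sorted(by_strand[(chrom, strand)])
        start = end = positions[0]
        support = 1
        for pos in positions[1:]:
            if pos - end <= window:
                # Close enough to the previous site: extend the current cluster.
                end, support = pos, support + 1
            else:
                clusters.append((chrom, strand, start, end, support))
                start = end = pos
                support = 1
        clusters.append((chrom, strand, start, end, support))
    return clusters
```

A site reported by both sources at the same position falls into one cluster with a support of 2, which is one simple way to record agreement between the two pipelines.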
The reported data set presents alternate transcript patterns that are annotated for both alternative splicing and alternative polyadenylation. Results based on current transcriptome data indicate that the contribution of alternative splicing is larger than that of alternative polyadenylation.
An RT-PCR method has been developed to isolate RNAs aminoacylated at their 3' end from large pools of RNA. The method is being applied in two separate projects. In the first, we are interested in isolating a new class of ribozymes that could successively catalyze the two chemical reactions leading to their own 3' aminoacylation (ATP activation of an amino acid followed by 3' esterification of the RNA). Catalysis of each of the two reactions has been demonstrated independently for some RNAs isolated with the SELEX methodology 1-2. However, the coupling of both reactions on the same molecule has not yet been achieved. The identification of these still hypothetical ribozymes may help explain how the ancestral translation system started in the absence of the aminoacyl-tRNA synthetase, which catalyzes these two reactions on tRNA in modern cells. In another project, we would like to identify the whole repertoire of aminoacylated RNAs (the “aminoacylome”) in cells. There are strong indications that RNAs other than tRNA and tmRNA may be aminoacylated for biological purposes 3-4.
A structural and functional classification of H/ACA and H/ACA-like motifs is obtained from the analysis of the H/ACA guide RNAs previously identified in the genomes of Euryarchaea (Pyrococcus) and Crenarchaea (Pyrobaculum). A unified structure/function model is proposed based on the common structural determinants shared by H/ACA and H/ACA-like motifs in both Euryarchaea and Crenarchaea. Using a computational approach, structural and energetic rules for guide:target RNA-RNA interactions are derived from structural and functional data on H/ACA RNP particles. The H/ACA(-like) motifs found in Pyrococcus are evaluated using this classification, and their biological relevance is discussed. Extra-ribosomal targets found in both Pyrococcus and Pyrobaculum may support the hypothesis of gene regulation mediated by H/ACA(-like) guide RNAs in archaea.
Natural killer cell and T cell subsets express at their cell surface a repertoire of receptors for MHC class I molecules, the natural killer cell receptors (NKRs). NKRs are characterized by the existence of inhibitory and activating isoforms, which are encoded by highly homologous but separate genes present in the same locus. Inhibitory isoforms express an intracytoplasmic immunoreceptor tyrosine-based inhibition motif, whereas activating isoforms lack any immunoreceptor tyrosine-based inhibition motif but harbor a charged amino acid residue in their transmembrane domain. We previously characterized KARAP (killer cell activating receptor-associated protein), a novel disulfide-linked tyrosine-phosphorylated dimer that selectively associates with the activating NKR isoforms. We report here the identification of the mouse KARAP gene, its localization on chromosome 7 and its genomic organization in five exons. Point mutation and transfection studies revealed that KARAP is a novel signaling transmembrane subunit whose transduction function depends on the integrity of an intracytoplasmic immunoreceptor tyrosine-based activation motif. In contrast to previous members of the immunoreceptor tyrosine-based activation motif polypeptide family, KARAP is ubiquitously expressed on hematopoietic and nonhematopoietic cells, suggesting its association with a broad range of activating receptors in a variety of tissues.
Abstract
Background
RNA-seq data are increasingly used to derive prognostic signatures for cancer outcome prediction. A limitation of current predictors is their reliance on reference gene annotations, which amounts to ignoring large numbers of non-canonical RNAs produced in disease tissues. A recently introduced kind of transcriptome classifier operates entirely in a reference-free manner, relying on k-mers extracted from patient RNA-seq data.
Methods
In this paper, we set out to compare conventional and reference-free signatures in risk and relapse prediction of prostate cancer. To compare the two approaches as fairly as possible, we set up a common procedure that takes as input either a k-mer count matrix or a gene expression matrix, extracts a signature and evaluates this signature in an independent dataset.
Results
We find that gene-based and k-mer-based classifiers have similarly high performance for risk prediction and markedly lower performance for relapse prediction. Interestingly, the reference-free signatures included a set of sequences mapping to novel lncRNAs or variable regions of cancer driver genes that were not part of gene-based signatures.
Conclusions
Reference-free classifiers are thus a promising strategy for the identification of novel prognostic RNA biomarkers.
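The key property of the common procedure is that it never needs to know whether the matrix columns are genes or k-mers. The sketch below illustrates that idea under stated assumptions: binary labels (0/1, e.g. low/high risk), a log(1 + count) transform, a simple mean-difference feature ranking and a nearest-centroid classifier. These are hypothetical choices for illustration, not the feature selection or classifier used in the study.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple

Matrix = List[List[float]]  # rows = samples, columns = features (genes or k-mers)


def log_transform(matrix: Matrix) -> Matrix:
    """log(1 + count), applied identically to gene and k-mer matrices."""
    return [[math.log1p(v) for v in row] for row in matrix]


def select_signature(X: Matrix, y: List[int], n_features: int) -> List[int]:
    """Rank features by |mean(class 1) - mean(class 0)| and keep the top n."""
    scores = []
    for j in range(len(X[0])):
        m1 = mean(row[j] for row, label in zip(X, y) if label == 1)
        m0 = mean(row[j] for row, label in zip(X, y) if label == 0)
        scores.append((abs(m1 - m0), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:n_features]]


def fit_centroids(X: Matrix, y: List[int], features: List[int]) -> Dict[int, List[float]]:
    """Mean profile of each class restricted to the signature features."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(row[j] for row in rows) for j in features]
    return centroids


def predict(centroids: Dict[int, List[float]], X: Matrix, features: List[int]) -> List[int]:
    """Assign each sample to the class with the nearest centroid (squared Euclidean)."""
    preds = []
    for row in X:
        x = [row[j] for j in features]
        preds.append(min(centroids,
                         key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab]))))
    return preds


def evaluate(train_X: Matrix, train_y: List[int], test_X: Matrix, test_y: List[int],
             n_features: int = 10) -> Tuple[List[int], float]:
    """Extract a signature on one dataset and report accuracy on an independent one."""
    Xtr, Xte = log_transform(train_X), log_transform(test_X)
    features = select_signature(Xtr, train_y, n_features)
    centroids = fit_centroids(Xtr, train_y, features)
    preds = predict(centroids, Xte, features)
    accuracy = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
    return features, accuracy
```

Because `evaluate` only sees a numeric matrix and labels, the same call can be made once with a k-mer count matrix and once with a gene expression matrix, which is what makes the comparison between the two representations like-for-like.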