We present here a new approach to the problem of defining RNA signatures and finding their occurrences in sequence databases. The proposed method is based on "secondary structure profiles". An RNA sequence alignment with secondary structure information is used as input. Two types of weight matrices/profiles are constructed from this alignment: single strands are represented by a classical lod-score profile, while helical regions are represented by an extended "helical profile" comprising 16 lod-scores per position, one for each of the 16 possible base pairs. Database searches are then conducted by simultaneously searching for helical profiles and aligning single-strand profiles by dynamic programming. The algorithm has been implemented in a new program, ERPIN, that performs both profile construction and database search. Applications are presented for several RNA motifs. The automated use of sequence information in both single-stranded and helical regions yields better sensitivity/specificity ratios than descriptor-based programs. Furthermore, since the translation of alignments into profiles is straightforward with ERPIN, iterative searches can easily be conducted to enrich collections of homologous RNAs.
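The two profile types described above can be illustrated with a minimal sketch. It is not ERPIN's implementation: the uniform background frequencies (1/4 per base, 1/16 per base pair), the pseudocount of 1, the base-2 logarithm, the ungapped input, and all function names are assumptions made for illustration only.

```python
import math
from itertools import product

BASES = "ACGU"
# The 16 possible base pairs, written as 5'-strand base + 3'-strand base.
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]

def strand_profile(seqs, pseudocount=1.0):
    """Lod-score profile for an ungapped single-strand alignment.

    Returns one dict per column, mapping each base to
    log2(observed frequency / assumed uniform background of 0.25).
    """
    n = len(seqs)
    profile = []
    for column in zip(*seqs):
        scores = {}
        for base in BASES:
            freq = (column.count(base) + pseudocount) / (n + pseudocount * len(BASES))
            scores[base] = math.log2(freq / 0.25)
        profile.append(scores)
    return profile

def helix_profile(five_prime, three_prime, pseudocount=1.0):
    """Helical profile: 16 lod-scores per helix position.

    Both strands are written 5'->3', so position i of the 5' strand
    pairs with position (length - 1 - i) of the 3' strand.
    """
    n = len(five_prime)
    length = len(five_prime[0])
    profile = []
    for i in range(length):
        observed = [s5[i] + s3[length - 1 - i]
                    for s5, s3 in zip(five_prime, three_prime)]
        scores = {}
        for pair in PAIRS:
            freq = (observed.count(pair) + pseudocount) / (n + pseudocount * len(PAIRS))
            scores[pair] = math.log2(freq / (1.0 / 16.0))
        profile.append(scores)
    return profile

def score_helix(profile, s5, s3):
    """Sum the lod-scores of the base pairs formed by a candidate helix."""
    length = len(profile)
    return sum(profile[i][s5[i] + s3[length - 1 - i]] for i in range(length))
```

Because each helix position is scored on the pair rather than on the two bases separately, a compensatory change such as G-C to A-U seen in the alignment raises the score of A-U at that position without rewarding the mismatched combinations G-U or A-C to the same extent.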
In conventional RNA high-throughput sequencing, modified bases prevent a large fraction of tRNA transcripts from being converted into cDNA libraries. Recent proposals aiming to resolve this issue take advantage of the interference of base modifications with RT enzymes to detect and identify them by establishing signals from aborted cDNA transcripts. Because some modifications, such as methyl groups, almost completely block RT bypass, demethylation and highly processive RT enzymes have been used to overcome these obstacles. Working with
as a model system, we show that with a conventional (albeit still engineered) RT enzyme and key optimizations in library preparation, all RT-impairing modifications can be highlighted along the entire tRNA length without a demethylation procedure. This is achieved by combining deep-sequencing samples, which allows aborted transcription signals to be established with higher accuracy and reproducibility, with the potential to resolve small differences in the modification state of all cellular tRNAs. In addition, our protocol provides estimates of relative tRNA abundance.
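One way to picture an aborted-transcription signal pooled over several samples is the sketch below. It is an illustration under stated assumptions, not the protocol's actual analysis: the per-position "stop fraction" metric, the assumption that every cDNA starts at the tRNA 3' end, and the function name are all hypothetical.

```python
from collections import Counter

def pooled_stop_fractions(stop_positions_per_sample, length):
    """Per-position RT stop fraction, pooled over samples.

    Each sample is a list of 1-based tRNA positions at which a cDNA
    ended (the 5'-most position reached by the RT). Assuming every cDNA
    starts at the tRNA 3' end (position `length`), a read that stops at
    position q covers positions q..length.

    Counts are summed across samples before taking the ratio, so the
    pooled signal rests on more reads than any single sample.
    """
    stops = Counter()
    for sample in stop_positions_per_sample:
        stops.update(sample)

    fractions = []
    reaching = 0  # number of pooled reads that cover the current position
    for position in range(1, length + 1):
        reaching += stops[position]
        if reaching == 0:
            fractions.append(0.0)
        else:
            fractions.append(stops[position] / reaching)
    return fractions
```

Under this definition the first position always has a fraction of 1 when any read reaches it, since full-length cDNAs necessarily end there; only internal positions are informative about modifications.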
Highlights • A review of current computational programs for prioritizing or scoring mutations in the coding and non-coding genome. • A summary of non-coding elements and their mutations involved in cancer. • A precise categorization of software for calling cancer mutations, scoring mutations, and identifying potential cancer driving mutations. • Perspectives for the development of computational tools for non-coding mutation assessment.
Antisense transcription can regulate sense gene expression. However, previous annotations of antisense transcription units have been based on detection of mature antisense long noncoding (aslnc)RNAs by RNA-seq and/or microarrays, giving only a partial view of the antisense transcription landscape and an incomplete molecular basis for antisense-mediated regulation. Here, we used native elongating transcript sequencing to map genome-wide nascent antisense transcription in fission yeast. Strikingly, antisense transcription was detected for most protein-coding genes, correlating with low sense transcription, especially when overlapping the mRNA start site. RNA profiling revealed that the resulting aslncRNAs mainly correspond to cryptic Xrn1/Exo2-sensitive transcripts (XUTs). ChIP-seq analyses showed that antisense XUT (asXUT) expression is associated with specific histone modification patterns. Finally, we showed that asXUTs are controlled by the histone chaperone Spt6 and respond to meiosis induction, in both cases anti-correlating with levels of the paired-sense mRNAs, supporting a physiological significance for antisense-mediated gene attenuation. Our work highlights that antisense transcription is much more extensive than anticipated and might constitute an additional nonpromoter determinant of gene regulation complexity.
Alternate polyadenylation is an important post-transcriptional regulatory process now open to large-scale analysis by use of cDNA databases. We clustered 164,000 expressed sequence tags (ESTs) into approximately 15,000 groups and aligned each group to a putative mRNA 3' end. By use of stringent criteria to discard artifactual mRNA extremities, clear evidence for alternate polyadenylation was obtained in 189 of the 1000 EST clusters studied. A number of previously unreported polyadenylation sites were identified, together with possible instances of tissue-specific differential polyadenylation. This study demonstrates that, besides quantitative aspects of gene expression, the distribution of alternate mRNA forms can be analyzed through EST sampling.
Metastatic relapse after treatment is the leading cause of cancer mortality, and known resistance mechanisms are missing for most treatments administered to patients. To bridge this gap, we analyze a pan-cancer cohort (META-PRISM) of 1,031 refractory metastatic tumors profiled via whole-exome and transcriptome sequencing. META-PRISM tumors, particularly prostate, bladder, and pancreatic types, displayed the most transformed genomes compared with primary untreated tumors. Standard-of-care resistance biomarkers were identified only in lung and colon cancers (9.6% of META-PRISM tumors), indicating that too few resistance mechanisms have received clinical validation. In contrast, we verified the enrichment of multiple investigational and hypothetical resistance mechanisms in treated compared with nontreated patients, thereby confirming their putative role in treatment resistance. Additionally, we demonstrated that molecular markers improve 6-month survival prediction, particularly in patients with advanced breast cancer. Our analysis establishes the utility of the META-PRISM cohort for investigating resistance mechanisms and performing predictive analyses in cancer.
This study highlights the paucity of standard-of-care markers that explain treatment resistance and the promise of investigational and hypothetical markers awaiting further validation. It also demonstrates the utility of molecular profiling in advanced-stage cancers, particularly breast cancer, to improve survival prediction and assess eligibility for phase I clinical trials. This article is highlighted in the In This Issue feature, p. 1027.
The advent of large-scale gene expression technologies has helped to reveal the existence, in eukaryotic cells, of thousands of non-coding transcripts whose function and significance remain mostly poorly understood. Among these non-coding transcripts, long non-coding RNAs (lncRNAs) are the least well-studied but are emerging as key regulators of diverse cellular processes. In the present study, we performed a survey in bovine Longissimus thoraci of lincRNAs (long intergenic non-coding RNAs not overlapping protein-coding transcripts). To our knowledge, this represents the first such study in bovine muscle.
To identify lincRNAs, we used paired-end RNA sequencing (RNA-Seq) to explore the transcriptomes of Longissimus thoraci from nine Limousin bull calves. Approximately 14-45 million paired-end reads were obtained per library. A total of 30,548 different transcripts were identified. Using a computational pipeline, we defined a stringent set of 584 different lincRNAs, of which 418 were found in all nine muscle samples. Bovine lincRNAs share characteristics seen in their mammalian counterparts: relatively short transcript and gene lengths, low exon number and significantly lower expression, compared to protein-encoding genes. Because our study identified, for the first time, lincRNAs from nine different samples of the same tissue, it is possible to analyse the inter-individual variability of the expression level of the identified lincRNAs. Interestingly, there was a significant difference when we compared the expression variation of the 418 lincRNAs with that of the 10,775 known selected protein-encoding genes found in all muscle samples. In addition, we found 2,083 pairs of lincRNA/protein-encoding genes showing highly significantly correlated expression. Fourteen lincRNAs were selected and 13 were validated by RT-PCR. Some of the lincRNAs expressed in muscle are located within quantitative trait loci for meat quality traits.
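The two comparisons mentioned above, inter-individual expression variability and correlated expression between gene pairs, can be sketched as follows. The abstract does not name the statistics used, so the coefficient of variation and the Pearson correlation shown here are assumptions chosen for illustration, as are the function names.

```python
import math

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean.

    Used here as a hypothetical measure of how much one gene's
    expression varies across the sampled animals.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance) / mean

def pearson(x, y):
    """Pearson correlation between two expression vectors of equal length."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)
```

With nine samples per gene, each vector passed to these functions would hold nine expression values, one per animal; a lincRNA/protein-encoding gene pair would be reported as correlated only after a significance test that this sketch does not include.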
Our study provides a glimpse into the lincRNA content of bovine muscle and will facilitate future experimental studies to unravel the function of these molecules. It may prove useful to elucidate their effect on mechanisms underlying the genetic variability of meat quality traits. This catalog will complement the list of lincRNAs already discovered in cattle and therefore will help to better annotate the bovine genome.
Of the 1.1 million Alu retroposons in the human genome, about 10,000 are inserted in the 3′ untranslated regions (UTRs) of protein-coding genes and 1% of these (107 events) are active as polyadenylation sites (PASs). Strikingly, although Alus in 3′ UTRs are inserted in the forward or reverse direction without preference, 99% of polyadenylation-active Alu sequences are forward oriented. Consensus Alu+ sequences contain sites that can give rise to polyadenylation signals and enhancers through a few point mutations. We found that the strand bias of polyadenylation-active Alus reflects a radical difference in the fitness of sense and antisense Alus toward cleavage/polyadenylation activity. In contrast to previous beliefs, Alu inserts do not necessarily represent weak or cryptic PASs; instead, they often constitute the major or the unique PAS in a gene, adding to the growing list of Alu exaptations. Finally, some Alu-borne PASs are intronic and produce truncated transcripts that may impact gene function and/or contribute to gene remodeling.