Recent genomic studies suggest that novel long non-coding RNAs (lncRNAs) are specifically expressed and far outnumber annotated lncRNA sequences. To identify and characterize novel lncRNAs in RNA sequencing data from new samples, we have developed COME, a coding potential calculation tool based on multiple features. It integrates multiple sequence-derived and experiment-based features using a decompose-compose method, which makes it more accurate and robust than other well-known tools. We also showed that COME was able to substantially improve the consistency of prediction results from other coding potential calculators. Moreover, COME annotates and characterizes each predicted lncRNA transcript with multiple lines of supporting evidence, which are not provided by other tools. Remarkably, we found that one subgroup of lncRNAs classified by such supporting features (i.e. conserved local RNA secondary structure) was highly enriched in a well-validated database (lncRNAdb). We further found that the conserved structural domains on lncRNAs had a better chance than other RNA regions of interacting with RNA binding proteins, based on recent eCLIP-seq data in human, indicating their potential regulatory roles. Overall, we present COME as an accurate, robust and multiple-feature supported method for the identification and characterization of novel lncRNAs. The software implementation is available at https://github.com/lulab/COME.
Abstract
RNA-binding proteins (RBPs) play key roles in post-transcriptional regulation. Accurate identification of RBP binding sites in multiple cell lines and tissue types from diverse species is a fundamental endeavor towards understanding the regulatory mechanisms of RBPs under both physiological and pathological conditions. Our POSTAR annotation processes make use of publicly available large-scale CLIP-seq datasets and external functional genomic annotations to generate a comprehensive map of RBP binding sites and their association with other regulatory events as well as functional variants. Here, we present POSTAR3, an updated database with improvements in data collection, annotation infrastructure, and analysis that support the annotation of post-transcriptional regulation in multiple species. Specifically, we (i) comprehensively updated the CLIP-seq and Ribo-seq datasets, which now cover more biological conditions, technologies, and species; (ii) added RNA secondary structure profiling for RBP binding sites; (iii) provided miRNA-mediated degradation events validated by degradome-seq; (iv) included RBP binding sites at circRNA junction regions; and (v) expanded the annotation of RBP binding sites, particularly using updated genomic variants and mutations associated with diseases. POSTAR3 is freely available at http://postar.ncrnalab.org.
Abstract
We present RISE (http://rise.zhanglab.net), a database of RNA Interactome from Sequencing Experiments. RNA-RNA interactions (RRIs) are essential for RNA regulation and function. RISE provides a comprehensive collection of RRIs that mainly come from recent transcriptome-wide sequencing-based experiments like PARIS, SPLASH, LIGR-seq, and MARIO, as well as targeted studies like RIA-seq, RAP-RNA and CLASH. It also includes interactions aggregated from other primary databases and publications. The RISE database currently contains 328,811 RNA-RNA interactions, mainly in human, mouse and yeast. While most existing RNA databases mainly contain miRNA-targeting interactions, more than half of the RRIs in RISE are among mRNAs and long non-coding RNAs. We compared different RRI datasets in RISE and found limited overlaps in interactions resolved by different techniques and in different cell lines. This may reflect technology-specific preferences as well as the dynamic nature of RRIs. We also analyzed the basic features of the human and mouse RRI networks and found that they tend to be scale-free, small-world, hierarchical and modular. The analysis may nominate important RNAs or RRIs for further investigation. Finally, RISE provides a Circos plot and several table views for integrative visualization, with extensive molecular and functional annotations to facilitate exploration of biological functions for any RRI of interest.
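The network properties mentioned above (degree distribution for scale-free behavior, clustering for small-world behavior) can be illustrated with a minimal sketch. This is not the RISE analysis pipeline; the function names and the toy edge list are hypothetical, and a real analysis would also compare against randomized networks and path lengths.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (rna_a, rna_b) interaction pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions in this sketch
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree_histogram(adj):
    """Count how many nodes have each degree (input to a scale-free check)."""
    return Counter(len(neighbors) for neighbors in adj.values())


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    if not adj:
        return 0.0
    total = 0.0
    for neighbors in adj.values():
        k = len(neighbors)
        if k < 2:
            continue
        # Number of edges that exist among the neighbors of this node.
        links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


# Hypothetical toy interactions, not taken from RISE.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
toy_graph = build_graph(toy_edges)
toy_degrees = degree_histogram(toy_graph)
toy_clustering = average_clustering(toy_graph)
```

A heavy-tailed degree histogram together with a clustering coefficient well above that of a random graph of the same size would be consistent with the scale-free, small-world description.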
Abstract
Post-transcriptional regulation of RNAs is critical to a diverse range of cellular processes. The volume of functional genomic data focusing on post-transcriptional regulation logics has continued to grow in recent years. In the current database version, POSTAR2 (http://lulab.life.tsinghua.edu.cn/postar), we included the following new features and data: updated ∼500 CLIP-seq datasets (∼1200 CLIP-seq datasets in total) from six species, including human, mouse, fly, worm, Arabidopsis and yeast; added a new module ‘Translatome’, which is derived from Ribo-seq datasets and contains ∼36 million open reading frames (ORFs) in the genomes from the six species; updated and unified post-transcriptional regulation and variation data. Finally, we improved web interfaces for searching and visualizing protein–RNA interactions with multi-layer information. We also merged our CLIPdb database into POSTAR2. POSTAR2 will help researchers investigate the post-transcriptional regulatory logics coordinated by RNA-binding proteins and the translational landscape of cellular RNAs.
Abstract
Circular RNAs (circRNAs) have emerged in recent years as a new class of endogenous regulatory noncoding RNAs. With the widespread application of RNA sequencing (RNA-seq) technology and bioinformatics prediction, large numbers of circRNAs have been identified. However, at present, we lack a comprehensive characterization of all these circRNAs in samples of interest. In this study, we integrated 87 935 circRNA sequences from circBase, covering most of the circRNAs identified to date, and designed microarray probes targeting the back-splice site of each circRNA to profile their expression. By comparing the circRNA detection efficiency of RNA-seq with that of this circRNA microarray, we found that the microarray is more efficient than RNA-seq for circRNA profiling. We then found that ∼80 000 circRNAs were expressed in cervical tumors and matched normal tissues, and ∼25 000 of them were differentially expressed. Notably, many of the circRNAs detected by this microarray can be validated by quantitative reverse transcription polymerase chain reaction (RT-qPCR) or RNA-seq. Strikingly, as many as ∼18 000 circRNAs could be robustly detected in cell-free plasma samples, and the expression of ∼2700 of them differed after surgery for tumor removal. Our findings provide a comprehensive, genome-wide characterization of circRNAs in paired normal tissues and tumors and in plasma samples from multiple individuals. In addition, we provide a rich resource of 41 microarray data sets and 10 RNA-seq data sets, as well as strong evidence for circRNA expression in cervical cancer. In conclusion, circRNAs can be efficiently profiled in samples of interest by a circRNA microarray that targets their reported back-splice sites.
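To make the idea of a back-splice site target concrete, here is a minimal sketch of how the junction-spanning sequence of a circRNA can be derived from its linearized sequence. The function name, the flank length and the toy sequence are assumptions for illustration; the abstract does not describe the actual probe design parameters.

```python
def backsplice_junction(linear_seq: str, flank: int = 20) -> str:
    """Return the sequence spanning the back-splice junction.

    In a circRNA the 3' end of the linearized sequence is joined to its
    5' end, so the junction reads: last `flank` bases + first `flank` bases.
    `flank` is a hypothetical parameter, not a value from the study.
    """
    if flank <= 0:
        raise ValueError("flank must be positive")
    if len(linear_seq) < 2 * flank:
        raise ValueError("sequence too short for the requested flank")
    return linear_seq[-flank:] + linear_seq[:flank]


# Hypothetical toy sequence, not a real circRNA.
toy_circ = "GATTACAGGCCTTAAC"
toy_junction = backsplice_junction(toy_circ, flank=5)
# The junction sequence is absent from the linear form in this toy case,
# which is what lets a junction probe distinguish circular from linear RNA.
junction_is_circ_specific = toy_junction not in toy_circ
```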
Recently, in addition to poly(A)+ long non‐coding RNAs (lncRNAs), many lncRNAs without poly(A) tails have been characterized in mammals. However, the non‐polyA lncRNAs and their conserved motifs, especially those associated with environmental stresses, have not been fully investigated in plant genomes. We performed poly(A)− RNA‐seq for seedlings of Arabidopsis thaliana under four stress conditions, and predicted lncRNA transcripts. We classified the lncRNAs into three confidence levels according to their expression patterns, epigenetic signatures and RNA secondary structures. Then, we further classified the lncRNAs into poly(A)+ and poly(A)− transcripts. Compared with poly(A)+ lncRNAs and coding genes, we found that poly(A)− lncRNAs tend to have shorter transcripts and lower expression levels, and they show significant expression specificity in response to stresses. In addition, their differential expression is significantly enriched in drought condition and depleted in heat condition. Overall, we identified 245 poly(A)+ and 58 poly(A)− lncRNAs that are differentially expressed under various stress stimuli. The differential expression was validated by qRT‐PCR, and the signaling pathways involved were supported by specific binding of transcription factors (TFs), phytochrome‐interacting factor 4 (PIF4) and PIF5. Moreover, we found many conserved sequence and structural motifs of lncRNAs from different functional groups (e.g. a UUC motif responding to salt and an AU‐rich stem‐loop responding to cold), indicating that the conserved elements might be responsible for the stress‐responsive functions of lncRNAs.
Long noncoding RNAs (lncRNAs), a recently discovered class of cellular RNAs, play important roles in the regulation of many cellular developmental processes. Although lncRNAs have been systematically identified in various systems, most of them have not been functionally characterized in vivo in animal models. In this study, we identified 128 testis-specific Drosophila lncRNAs and knocked out 105 of them using an optimized three-component CRISPR/Cas9 system. Among the lncRNA knockouts, 33 (31%) exhibited a partial or complete loss of male fertility, accompanied by visual developmental defects in late spermatogenesis. In addition, six knockouts were fully or partially rescued by transgenes in a trans configuration, indicating that those lncRNAs primarily work in trans. Furthermore, gene expression profiles for five lncRNA mutants revealed that testis-specific lncRNAs regulate global gene expression, orchestrating late male germ cell differentiation. Compared with coding genes, the testis-specific lncRNAs evolved much faster. Moreover, lncRNAs of greater functional importance exhibited higher sequence conservation, suggesting that they are under constant evolutionary selection. Collectively, our results reveal critical functions of rapidly evolving testis-specific lncRNAs in late Drosophila spermatogenesis.
Reliable noninvasive biomarkers for hepatocellular carcinoma (HCC) diagnosis and prognosis are urgently needed. We explored the potential of not only microRNAs (miRNAs) but also other types of noncoding RNAs (ncRNAs) as HCC biomarkers.
Peripheral blood samples were collected from 77 individuals; among them, 57 plasma cell-free RNA transcriptomes and 20 exosomal RNA transcriptomes were profiled. Significantly upregulated ncRNAs and published potential HCC biomarkers were validated with reverse transcription (RT)-qPCR in an independent validation cohort (60-150 samples). We particularly investigated the diagnostic and prognostic performance and the biological function of 1 ncRNA biomarker,
, and its S fragment.
We identified certain circulating ncRNAs escaping from RNase degradation, possibly through binding with RNA-binding proteins: 899 ncRNAs were highly upregulated in HCC patients. Among them, 337 genes were fragmented long noncoding RNAs, 252 genes were small nucleolar RNAs, and 134 genes were piwi-interacting RNAs. Forty-eight candidates were selected and validated with RT-qPCR, of which, 16 ncRNAs were verified to be significantly upregulated in HCC, including
,
,
, and
. Particularly, the abundance of
S fragment discriminated HCC samples from negative controls (area under the curve, 0.87; 95% CI, 0.817-0.920). HCC patients with higher concentrations of
S fragment had lower survival rates. Furthermore,
S fragment alone promoted cancer cell proliferation and clonogenic growth.
Our results show that various ncRNA species, not only miRNAs, identified in the small RNA sequencing of plasma are also able to serve as noninvasive biomarkers. Particularly, we identified a domain of srpRNA
with reliable clinical performance for HCC diagnosis and prognosis.
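The area under the curve reported above measures how well a single abundance value separates HCC samples from controls. As a minimal sketch (not the authors' analysis, and with made-up abundance values), the AUC can be computed as the probability that a randomly chosen case scores higher than a randomly chosen control, counting ties as one half:

```python
def roc_auc(case_values, control_values):
    """AUC via the Mann-Whitney formulation: P(case > control) + 0.5 * P(tie)."""
    if not case_values or not control_values:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical abundance values for illustration only.
toy_cases = [3.1, 2.4, 1.9, 2.8]
toy_controls = [1.2, 2.0, 1.5, 2.5]
toy_auc = roc_auc(toy_cases, toy_controls)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation; a confidence interval such as the one quoted would typically be estimated separately, for example by bootstrapping.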
We present POSTAR (http://POSTAR.ncrnalab.org), a resource of POST-trAnscriptional Regulation coordinated by RNA-binding proteins (RBPs). Precise characterization of post-transcriptional regulatory maps has accelerated dramatically in the past few years. Based on new studies and resources, POSTAR supplies the largest collection of experimentally probed (∼23 million) and computationally predicted (∼117 million) RBP binding sites in the human and mouse transcriptomes. POSTAR annotates every transcript and its RBP binding sites using extensive information regarding various molecular regulatory events (e.g., splicing, editing, and modification), RNA secondary structures, disease-associated variants, and gene expression and function. Moreover, POSTAR provides a friendly, multi-mode, integrated search interface, which helps users to connect multiple RBP binding sites with post-transcriptional regulatory events, phenotypes, and diseases. Based on our platform, we were able to obtain novel insights into post-transcriptional regulation, such as the putative association between CPSF6 binding, RNA structural domains, and Li-Fraumeni syndrome SNPs. In summary, POSTAR represents an early effort to systematically annotate post-transcriptional regulatory maps and explore the putative roles of RBPs in human diseases.
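Connecting binding sites with variants comes down to an interval lookup. The sketch below is not POSTAR's implementation; the tuple layouts, the 0-based half-open coordinate convention, and all names and coordinates are assumptions chosen for illustration.

```python
from bisect import bisect_left


def variants_in_sites(sites, variants):
    """Pair each binding site with the variants that fall inside it.

    sites:    iterable of (chrom, start, end, rbp), 0-based half-open (assumed)
    variants: iterable of (chrom, pos, variant_id), 0-based (assumed)
    Returns a list of (rbp, variant_id) pairs in site order.
    """
    by_chrom = {}
    for chrom, pos, variant_id in variants:
        by_chrom.setdefault(chrom, []).append((pos, variant_id))

    # Per chromosome: sorted positions plus ids in the same order.
    index = {}
    for chrom, entries in by_chrom.items():
        entries.sort()
        index[chrom] = ([p for p, _ in entries], [v for _, v in entries])

    hits = []
    for chrom, start, end, rbp in sites:
        positions, ids = index.get(chrom, ([], []))
        lo = bisect_left(positions, start)  # first variant with pos >= start
        hi = bisect_left(positions, end)    # first variant with pos >= end
        hits.extend((rbp, variant_id) for variant_id in ids[lo:hi])
    return hits


# Hypothetical records, not taken from POSTAR.
toy_sites = [("chr1", 100, 150, "RBP_A"), ("chr1", 140, 200, "RBP_B"),
             ("chr2", 10, 20, "RBP_A")]
toy_variants = [("chr1", 145, "var1"), ("chr1", 150, "var2"),
                ("chr2", 5, "var3")]
toy_hits = variants_in_sites(toy_sites, toy_variants)
```

Sorting variants once and using binary search keeps each site lookup logarithmic in the number of variants per chromosome, which matters when the site collection runs into the millions.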
Free energy minimization has been the most popular method for RNA secondary structure prediction for decades. It is based on a set of empirical free energy change parameters derived from experiments using a nearest-neighbor model. In this study, a program, MaxExpect, that predicts RNA secondary structure by maximizing the expected base-pair accuracy, is reported. This approach was pioneered in the program CONTRAfold, using pair probabilities predicted with a statistical learning method. Here, a partition function calculation that utilizes the free energy change nearest-neighbor parameters is used to predict base-pair probabilities as well as probabilities of nucleotides being single-stranded. MaxExpect predicts both the optimal structure (having highest expected pair accuracy) and suboptimal structures to serve as alternative hypotheses for the structure. Tested on a large database of different types of RNA, the maximum expected accuracy structures are, on average, of higher accuracy than minimum free energy structures. Accuracy is measured by sensitivity, the percentage of known base pairs correctly predicted, and positive predictive value (PPV), the percentage of predicted pairs that are in the known structure. By favoring double-strandedness or single-strandedness, a higher sensitivity or a higher PPV, respectively, can be obtained. Using MaxExpect, the average PPV of the optimal structure is improved from 66% to 68% at the same sensitivity level (73%) compared with free energy minimization.
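The idea of maximizing expected accuracy can be sketched with a small dynamic program over precomputed base-pair probabilities. This is an illustrative sketch, not the MaxExpect implementation: the scoring form (a weight `gamma` on twice the pair probability versus the single-stranded probability), the `min_loop` constraint, the function names and the toy probabilities are all assumptions, and in practice the probabilities would come from a partition function calculation.

```python
def mea_structure(n, pair_prob, gamma=1.0, min_loop=3):
    """Nested structure maximizing sum(2*gamma*P_pair) + sum(P_single_stranded).

    pair_prob maps (i, j) with i < j (0-based) to a pairing probability.
    """
    # Single-stranded probability = 1 - total pairing probability of the base.
    p_ss = [1.0] * n
    for (i, j), p in pair_prob.items():
        p_ss[i] -= p
        p_ss[j] -= p
    p_ss = [max(0.0, p) for p in p_ss]

    best = [[0.0] * n for _ in range(n)]     # best[i][j]: score for i..j
    choice = [[None] * n for _ in range(n)]  # partner of i, or None

    def score(i, j):
        return best[i][j] if i <= j else 0.0

    for span in range(n):
        for i in range(n - span):
            j = i + span
            top, pick = p_ss[i] + score(i + 1, j), None  # i unpaired
            for k in range(i + min_loop + 1, j + 1):     # i paired with k
                p = pair_prob.get((i, k), 0.0)
                if p <= 0.0:
                    continue
                cand = 2.0 * gamma * p + score(i + 1, k - 1) + score(k + 1, j)
                if cand > top:
                    top, pick = cand, k
            best[i][j], choice[i][j] = top, pick

    pairs, stack = set(), [(0, n - 1)]
    while stack:  # traceback
        i, j = stack.pop()
        if i > j:
            continue
        k = choice[i][j]
        if k is None:
            stack.append((i + 1, j))
        else:
            pairs.add((i, k))
            stack.extend([(i + 1, k - 1), (k + 1, j)])
    return pairs


def sensitivity_ppv(predicted, known):
    """Sensitivity = TP / known pairs; PPV = TP / predicted pairs."""
    tp = len(predicted & known)
    sens = tp / len(known) if known else 0.0
    ppv = tp / len(predicted) if predicted else 0.0
    return sens, ppv


# Hypothetical pair probabilities for a 10-nucleotide toy RNA.
toy_probs = {(0, 9): 0.9, (1, 8): 0.8, (2, 7): 0.3, (1, 9): 0.05}
toy_known = {(0, 9), (1, 8), (2, 7)}
default_pairs = mea_structure(10, toy_probs)          # gamma = 1
greedy_pairs = mea_structure(10, toy_probs, gamma=3)  # favors pairing
```

In this toy case, raising `gamma` adds the low-probability pair (2, 7), which increases sensitivity; on real data such extra pairs are more often wrong, which is the sensitivity versus PPV trade-off the abstract describes.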