RNA regulation depends significantly on its binding protein partners, known as RNA-binding proteins (RBPs). Unfortunately, the binding preferences of most RBPs are still not well characterized. Interdependencies between sequence and secondary structure specificities make it challenging both to predict RBP binding sites and to detect accurate sequence and structure motifs.
In this study, we propose a deep learning-based method, iDeepS, to simultaneously identify the binding sequence and structure motifs from RNA sequences using convolutional neural networks (CNNs) and a bidirectional long short term memory network (BLSTM). We first perform one-hot encoding for both the sequence and predicted secondary structure, to enable subsequent convolution operations. To reveal the hidden binding knowledge from the observed sequences, the CNNs are applied to learn the abstract features. Considering the close relationship between sequence and predicted structures, we use the BLSTM to capture possible long range dependencies between binding sequence and structure motifs identified by the CNNs. Finally, the learned weighted representations are fed into a classification layer to predict the RBP binding sites. We evaluated iDeepS on verified RBP binding sites derived from large-scale representative CLIP-seq datasets. The results demonstrate that iDeepS can reliably predict the RBP binding sites on RNAs, and outperforms the state-of-the-art methods. An important advantage compared to other methods is that iDeepS can automatically extract both binding sequence and structure motifs, which will improve our understanding of the mechanisms of binding specificities of RBPs.
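The one-hot encoding step described above can be illustrated with a minimal sketch. It is not taken from the iDeepS code base: the function name `one_hot`, the six-letter structure alphabet `FTIHMS`, and the choice to encode unknown symbols as all-zero rows are assumptions made for illustration only.

```python
# Minimal sketch of one-hot encoding for an RNA sequence and a
# per-nucleotide structure annotation. Names and alphabets are
# illustrative assumptions, not the iDeepS implementation.

RNA_ALPHABET = "ACGU"
# Assumed structure labels (one letter per nucleotide); hypothetical here.
STRUCT_ALPHABET = "FTIHMS"


def one_hot(symbols, alphabet):
    """Return a list of rows, one per symbol, each of length len(alphabet).

    Symbols that are not in the alphabet (e.g. 'N') become all-zero rows.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in symbols:
        row = [0.0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1.0
        rows.append(row)
    return rows


if __name__ == "__main__":
    seq_matrix = one_hot("ACGUN", RNA_ALPHABET)
    struct_matrix = one_hot("FSSHT", STRUCT_ALPHABET)
    print(len(seq_matrix), len(seq_matrix[0]))        # rows x 4 columns
    print(len(struct_matrix), len(struct_matrix[0]))  # rows x 6 columns
```

Each resulting matrix has one row per nucleotide, which is the shape a one-dimensional convolution over sequence positions would expect.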
Our study shows that the iDeepS method identifies the sequence and structure motifs to accurately predict RBP binding sites. iDeepS is available at https://github.com/xypan1232/iDeepS .
Abstract
Motivation
RNA-binding proteins (RBPs) account for 5–10% of the eukaryotic proteome and play key roles in many biological processes, e.g. gene regulation. Experimental detection of RBP binding sites is still time-intensive and costly. Computational prediction of RBP binding sites using patterns learned from existing annotations is a fast alternative. From the biological point of view, the local structure context derived from local sequences is recognized by specific RBPs. However, to the best of our knowledge, computational models based on deep learning employ only global representations of entire RNA sequences. So far, local sequence information has been ignored in the deep model construction process.
Results
In this study, we present a computational method, iDeepE, to predict RNA–protein binding sites from RNA sequences by combining global and local convolutional neural networks (CNNs). For the global CNN, we pad the RNA sequences to the same length. For the local CNN, we split an RNA sequence into multiple overlapping fixed-length subsequences, where each subsequence is a signal channel of the whole sequence. Next, we train deep CNNs on the multiple subsequences and on the padded sequences, respectively, to learn high-level features. Finally, the outputs from the local and global CNNs are combined to improve the prediction. iDeepE demonstrates better performance than state-of-the-art methods on two large-scale datasets derived from CLIP-seq. We also find that the local CNN runs 1.8 times faster than the global CNN with comparable performance when using GPUs. Our results show that iDeepE has captured experimentally verified binding motifs.
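The two input preparations described above (padding for the global CNN, overlapping fixed-length subsequences for the local CNN) can be sketched as follows. This is an illustrative sketch rather than the iDeepE code: the function names, the padding character `N`, and the window and overlap sizes in the usage lines are assumptions.

```python
# Minimal sketch of the two input preparations: padding to a common length
# and splitting into overlapping fixed-length windows. Function names,
# the pad character and the sizes used below are hypothetical.


def pad_to_length(seq, length, pad_char="N"):
    """Truncate or right-pad a sequence so it has exactly `length` symbols."""
    return seq[:length].ljust(length, pad_char)


def split_overlapping(seq, window, overlap, pad_char="N"):
    """Split a sequence into windows of size `window` that share `overlap`
    symbols with their neighbour; the last window is right-padded."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(seq[start:start + window].ljust(window, pad_char))
        if start + window >= len(seq):
            break
        start += step
    return chunks


if __name__ == "__main__":
    rna = "ACGUACGUAC"
    print(pad_to_length(rna, 12))
    print(split_overlapping(rna, window=4, overlap=2))
```

Each window can then be encoded separately and stacked as one channel of the input, in the spirit of the description above.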
Availability and implementation
https://github.com/xypan1232/iDeepE
Supplementary information
Supplementary data are available at Bioinformatics online.
RNAs play key roles in cells through interactions with proteins known as RNA-binding proteins (RBPs), and their binding motifs are crucial for understanding the post-transcriptional regulation of RNAs. How RBPs correctly recognize their target RNAs and why they bind specific positions is still far from clear. Machine learning-based algorithms are widely acknowledged to be capable of speeding up this process. Although many automatic tools have been developed to predict RNA-protein binding sites from the rapidly growing multi-resource data, e.g. sequence and structure, their domain-specific features and formats pose significant computational challenges. One current difficulty is that the common knowledge shared across sources lies at a higher abstraction level than the observed data, so direct integration of observed data across domains is inefficient. The other difficulty is how to interpret the prediction results. Existing approaches tend to terminate after outputting the potential discrete binding sites on the sequences, but how to assemble them into meaningful binding motifs is a topic worthy of further investigation.
In view of these challenges, we propose a deep learning-based framework (iDeep) that uses a novel hybrid convolutional neural network and deep belief network to predict the RBP interaction sites and motifs on RNAs. This new protocol transforms the original observed data into a high-level abstraction feature space using multiple layers of learning blocks, where the shared representations across different domains are integrated. To validate our iDeep method, we performed experiments on 31 large-scale CLIP-seq datasets, and our results show that by integrating multiple sources of data, the average AUC can be improved by 8% compared to the best single-source-based predictor; and through cross-domain knowledge integration at an abstraction level, it outperforms the state-of-the-art predictors by 6%. Besides the overall enhanced prediction performance, the convolutional neural network module embedded in iDeep is also able to automatically capture interpretable binding motifs for RBPs. Large-scale experiments demonstrate that these mined binding motifs agree well with the experimentally verified results, suggesting that iDeep is a promising approach for real-world applications.
The iDeep framework not only achieves better performance than the state-of-the-art predictors, but also easily captures interpretable binding motifs. iDeep is available at http://www.csbio.sjtu.edu.cn/bioinf/iDeep.
Abstract
Motivation
Long non-coding RNAs (lncRNAs) have been a hot topic in the field of RNA biology. Recent studies have shown that their subcellular localizations carry important information for understanding their complex biological functions. Because experiments for identifying the subcellular localization of lncRNAs are costly and time-consuming, computational methods are urgently needed. However, to the best of our knowledge, there are no computational tools for predicting lncRNA subcellular locations to date.
Results
In this study, we report an ensemble classifier-based predictor, lncLocator, for predicting the lncRNA subcellular localizations. To fully exploit lncRNA sequence information, we adopt both k-mer features and high-level abstraction features generated by unsupervised deep models, and construct four classifiers by feeding these two types of features to support vector machine (SVM) and random forest (RF), respectively. Then we use a stacked ensemble strategy to combine the four classifiers and get the final prediction results. The current lncLocator can predict five subcellular localizations of lncRNAs, including cytoplasm, nucleus, cytosol, ribosome and exosome, and yield an overall accuracy of 0.59 on the constructed benchmark dataset.
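The k-mer features mentioned above can be illustrated with a short sketch that turns a sequence into a normalized k-mer frequency vector. It is not taken from lncLocator: the function name, the DNA-style alphabet `ACGT`, and the normalization by the number of valid k-mers are assumptions.

```python
# Minimal sketch of k-mer frequency features for a nucleotide sequence.
# The function name, alphabet and normalization are illustrative
# assumptions, not the lncLocator implementation.
from itertools import product


def kmer_frequencies(seq, k, alphabet="ACGT"):
    """Return the frequency of every possible k-mer (in lexicographic order
    of the alphabet) among the valid k-mers observed in `seq`."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing symbols such as 'N'
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


if __name__ == "__main__":
    features = kmer_frequencies("ACGTACGTTTGA", k=3)
    print(len(features))  # 4**3 possible 3-mers
```

A fixed-length vector of this kind could be fed to a classifier such as an SVM or a random forest, as the abstract describes; the classifier stage itself is not sketched here.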
Availability and implementation
The lncLocator is available at www.csbio.sjtu.edu.cn/bioinf/lncLocator.
Supplementary information
Supplementary data are available at Bioinformatics online.
One of the fundamental goals in proteomics and cell biology is to identify the functions of proteins in various cellular organelles and pathways. Information on the subcellular locations of proteins can provide useful insights for revealing their functions and understanding how they interact with each other in cellular network systems. Most of the existing methods for predicting plant protein subcellular localization can only cover three or four location sites, and none of them can be used to deal with multiplex plant proteins that can simultaneously exist at, or move between, two or more different location sites. Such multiplex proteins might have special biological functions worthy of particular notice. The present study was devoted to improving the existing plant protein subcellular location predictors in these two respects. A new predictor called “Plant-mPLoc” is developed by integrating the gene ontology information, functional domain information, and sequential evolutionary information through three different modes of pseudo amino acid composition. It can be used to identify plant proteins among the following 12 location sites: (1) cell membrane, (2) cell wall, (3) chloroplast, (4) cytoplasm, (5) endoplasmic reticulum, (6) extracellular, (7) Golgi apparatus, (8) mitochondrion, (9) nucleus, (10) peroxisome, (11) plastid, and (12) vacuole. Compared with the existing methods for predicting plant protein subcellular localization, the new predictor is much more powerful and flexible. In particular, it can also deal with multiple-location proteins, which is beyond the reach of any existing predictor specialized for identifying plant protein subcellular localization. As a user-friendly web-server, Plant-mPLoc is freely accessible at http://www.csbio.sjtu.edu.cn/bioinf/plant-multi/ .
Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results. It is anticipated that the Plant-mPLoc predictor as presented in this paper will become a very useful tool in plant science as well as all the relevant areas.
We investigate transmission optimization for intelligent reflecting surface (IRS) assisted multi-antenna systems from the physical-layer security perspective. The design goal is to maximize the system secrecy rate subject to the source transmit power constraint and the unit modulus constraints imposed on phase shifts at the IRS. To solve this complicated non-convex problem, we develop an efficient alternating algorithm where the solutions to the transmit covariance of the source and the phase shift matrix of the IRS are obtained in closed form and semi-closed form, respectively. The convergence of the proposed algorithm is guaranteed theoretically. Simulation results validate the performance advantage of the proposed optimized design.
Chinese traditional liquor, a major type of global distilled spirits, offers a unique flavor system acquired across thousands of years of development. Owing to the various raw brewing materials, types of koji, fermentation vessels, and processes used during liquor production, significant differences can occur in the content of flavor chemical components, such as esters, alcohols, aromatics, ketones, nitrogen compounds, acids, and aldehydes in the resulting liquor. Therefore, the liquor can be characterized on the basis of four basic flavors: sauce‐, strong‐, light‐, and rice‐aroma, and eight derivative flavors: feng‐, sesame‐, chi‐, te‐, mixed‐, laobaigan‐, herbal‐, and fuyu‐aroma. In this review, we describe the production and development process of Chinese traditional liquor in detail; summarize the flavor types, flavor chemical composition characteristics, and research progress related to this liquor; and discuss the influence of trace chemical components on liquor flavor, with the aim of laying a theoretical foundation for stabilizing the quality and increasing the yield of traditional liquor.
Hepatocellular carcinoma (HCC) ranks as the most common primary liver malignancy and the third leading cause of tumor-related mortality worldwide. Unfortunately, despite advances in HCC treatment, less than 40% of HCC patients are eligible for potentially curative therapies. Recently, cancer immunotherapy has emerged as one of the most promising approaches for cancer treatment. It has proven therapeutically effective in many types of solid tumors, such as non-small cell lung cancer and melanoma. HCC is an inflammation-associated tumor, and there is good evidence that its immunosuppressive microenvironment can promote immune tolerance and evasion by various mechanisms. Triggering a more vigorous HCC-specific immune response represents a novel strategy for its management. Pre-clinical and clinical investigations have revealed that various immunotherapies might extend the current options for much-needed HCC treatment. In this review, we summarize recent progress in HCC immunology from both basic and clinical perspectives, and discuss potential advances and challenges of immunotherapy in HCC.
Reverse engineering of gene regulatory networks (GRNs) is a central task in systems biology. Most of the existing methods for GRN inference rely on gene co-expression analysis or TF-target binding information. However, determining co-expression merely from gene expression levels is often unreliable, and the TF-target binding data from high-throughput experiments may be noisy, leading to a high ratio of false links and missed links, especially for large-scale networks. In recent years, microscopy images recording spatial gene expression have become a new resource for GRN reconstruction, as the spatial and temporal expression patterns contain abundant gene interaction information. Until now, the spatial expression resources have been largely underexploited, and only a few traditional image processing methods have been employed in image-based GRN reconstruction. Moreover, co-expression analysis using conventional measurements based on image similarity may be inaccurate, because it is local-pattern consistency rather than global image similarity that determines gene-gene interactions. Here we present GripDL (Gene regulatory interaction prediction via Deep Learning), which incorporates high-confidence TF-gene regulation knowledge from previous studies, and constructs GRNs for Drosophila eye development based on Drosophila embryonic gene expression images. Benefitting from the powerful representation ability of deep neural networks and the supervision information of known interactions, the new method outperforms traditional methods by a large margin and reveals intriguing new knowledge about Drosophila eye development.
Abstract
Heterostructuring electrodes with multiple electroactive and inactive supporting components to simultaneously satisfy electrochemical and structural requirements has recently been identified as a viable pathway to achieve high‐capacity and durable sodium‐ion batteries (SIBs). Here, a new design of heterostructured SIB anode is reported consisting of double metal‐sulfide (SnCo)S₂ nanocubes interlaced with 2D sulfur‐doped graphene (SG) nanosheets. The heterostructured (SnCo)S₂/SG nanocubes exhibit an excellent rate capability (469 mAh g⁻¹ at 10.0 A g⁻¹) and durability (5000 cycles, 487 mAh g⁻¹ at 5.0 A g⁻¹, 92.6% capacity retention). In situ X‐ray diffraction reveals that the (SnCo)S₂/SG anode undergoes a six‐stage Na⁺ storage mechanism of combined intercalation, conversion, and alloying reactions. The first‐principles density functional theory calculations suggest that a high concentration of p–n heterojunctions at SnS₂/CoS₂ interfaces is responsible for the high rate performance, while in situ transmission electron microscopy unveils that the interlacing and elastic SG nanosheets play a key role in extending the cycle life.