Functional RNA molecules participate in numerous biological processes, ranging from gene regulation to protein synthesis. Analysis of functional RNA motifs and elements in RNA sequences can yield useful information for deciphering RNA regulatory mechanisms. Our previous work, RegRNA, is widely used for identifying regulatory motifs, and this work extends it by incorporating more comprehensive and up-to-date data sources and analytical approaches into a new platform.
An integrated web-based system, RegRNA 2.0, has been developed for comprehensively identifying the functional RNA motifs and sites in an input RNA sequence. Numerous data sources and analytical approaches are integrated, and several types of functional RNA motifs and sites can be identified by RegRNA 2.0: (i) splicing donor/acceptor sites; (ii) splicing regulatory motifs; (iii) polyadenylation sites; (iv) ribosome binding sites; (v) rho-independent terminators; (vi) motifs in the mRNA 5'-untranslated region (5'UTR) and 3'UTR; (vii) AU-rich elements; (viii) C-to-U editing sites; (ix) riboswitches; (x) RNA cis-regulatory elements; (xi) transcriptional regulatory motifs; (xii) user-defined motifs; (xiii) similar functional RNA sequences; (xiv) microRNA target sites; (xv) non-coding RNA hybridization sites; (xvi) long stems; (xvii) open reading frames; (xviii) information related to an RNA sequence. Users can submit an RNA sequence and obtain the predicted results through the RegRNA 2.0 web page.
RegRNA 2.0 is an easy-to-use web server for identifying regulatory RNA motifs and functional sites. Through its integrated, user-friendly interface, users can conveniently apply various analytical approaches and inspect the results with graphical visualization. RegRNA 2.0 is now available at http://regrna2.mbc.nctu.edu.tw.
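As an illustration of one of the simpler site types listed above (open reading frames), the following is a minimal sketch of a forward-strand ORF scan. It is not RegRNA 2.0's implementation; the function name `find_orfs`, the `min_codons` threshold, and the example sequence are assumptions made for this sketch.

```python
# Hypothetical sketch: scan an RNA sequence for AUG...stop open reading frames.
STOP_CODONS = {"UAA", "UAG", "UGA"}

def find_orfs(rna, min_codons=2):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start is the 0-based index of the AUG, end is exclusive and includes
    the stop codon, and min_codons counts the codons before the stop.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs

example = "CCAUGGCUUGAAA"  # made-up sequence: AUG GCU UGA in frame 2
print(find_orfs(example))
```

A real server would also consider the reverse complement and nested start codons; the sketch only shows the core frame-wise scan.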
Abstract
Protein post-translational modifications (PTMs) play an important role in different cellular processes. In view of the importance of PTMs in cellular functions and the massive data accumulated by the rapid development of mass spectrometry (MS)-based proteomics, this paper presents an update of dbPTM with over 2 777 000 PTM substrate sites obtained from existing databases and manual curation of literature, of which more than 2 235 000 entries are experimentally verified. This update has manually curated over 42 new modification types that were not included in the previous version. Owing to the increasing number of studies on the mechanisms of PTMs in the past few years, a large number of upstream regulatory proteins of PTM substrate sites have been revealed. The updated dbPTM thus collates regulatory information from databases and literature, and merges it into a protein-protein interaction network. To enhance the understanding of the association between PTMs and molecular functions/cellular processes, the functional annotations of PTMs are curated and integrated into the database. In addition, the existing PTM-related resources, including annotation databases and prediction tools, have also been updated. Overall, this update aims to provide users with the most abundant data and comprehensive annotations on PTMs of proteins. The updated dbPTM is now freely accessible at https://awi.cuhk.edu.cn/dbPTM/.
Succinylation is a type of protein post-translational modification (PTM) that plays important roles in a variety of cellular processes. Owing to the increasing number of site-specific succinylated peptides obtained from high-throughput mass spectrometry (MS), various tools have been developed for computationally identifying succinylated sites on proteins. However, most of these tools predict succinylation sites using traditional machine learning methods. Hence, this work aimed to predict succinylation sites with a deep learning model. The abundance of MS-verified succinylated peptides enabled the investigation of substrate site specificity of succinylation sites through sequence-based attributes, such as position-specific amino acid composition, the composition of k-spaced amino acid pairs (CKSAAP), and the position-specific scoring matrix (PSSM). Additionally, maximal dependence decomposition (MDD) was adopted to detect the substrate signatures of lysine succinylation sites by dividing all succinylated sequences into several groups with conserved substrate motifs. According to the results of ten-fold cross-validation, the deep learning model trained using PSSM and informative CKSAAP attributes reached the best predictive performance and also performed better than traditional machine learning methods. Moreover, an independent testing dataset that did not overlap with the training dataset was used to compare the proposed method with six existing prediction tools. The testing dataset comprised 218 positive and 2621 negative instances, and the proposed model yielded a promising performance with 84.40% sensitivity, 86.99% specificity, 86.79% accuracy, and an MCC value of 0.489. Finally, the proposed method has been implemented as a web-based prediction tool (CNN-SuccSite), which is now freely accessible at http://csb.cse.yzu.edu.tw/CNN-SuccSite/ .
Glutarylation, the addition of a glutaryl group (five carbons) to a lysine residue of a protein molecule, is an important post-translational modification and plays a regulatory role in a variety of physiological and biological processes. As the number of experimentally identified glutarylated peptides increases, it becomes imperative to investigate substrate motifs to enhance the study of protein glutarylation. We carried out a bioinformatics investigation of glutarylation sites based on amino acid composition using a public database containing information on 430 non-homologous glutarylation sites.
The TwoSampleLogo analysis indicates that positively charged and polar amino acids surrounding glutarylated sites may be associated with the substrate site specificity of protein glutarylation. Additionally, the chi-squared test was used to explore the intrinsic interdependence between two positions around glutarylation sites. Further, maximal dependence decomposition (MDD), which partitions a large-scale dataset into subgroups with statistically significant amino acid conservation, was used to capture motif signatures of glutarylation sites. We considered single features, such as amino acid composition (AAC), amino acid pair composition (AAPC), and composition of k-spaced amino acid pairs (CKSAAP), as well as the effectiveness of incorporating MDD-identified substrate motifs into an integrated prediction model. Evaluation by five-fold cross-validation showed that, with a support vector machine (SVM) classifier, AAC was the most effective feature for discriminating between glutarylation and non-glutarylation sites.
The SVM model integrating MDD-identified substrate motifs performed well, with a sensitivity of 0.677, a specificity of 0.619, an accuracy of 0.638, and a Matthews Correlation Coefficient (MCC) value of 0.28. Using an independent testing dataset (46 glutarylated and 92 non-glutarylated sites) obtained from the literature, we demonstrated that the integrated SVM model could improve the predictive performance effectively, yielding a balanced sensitivity and specificity of 0.652 and 0.739, respectively. This integrated SVM model has been implemented as a web-based system (MDDGlutar), which is now freely available at http://csb.cse.yzu.edu.tw/MDDGlutar/ .
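The composition features named above (AAC and CKSAAP) can be sketched as follows. This is a minimal illustration of the general encodings, not the MDDGlutar code; the 21-residue window, the `k_max` default, and the normalization by the number of valid pairs are assumptions of this sketch.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def aac(window):
    """Amino acid composition: frequency of each of the 20 standard residues."""
    residues = [r for r in window if r in AMINO_ACIDS]
    n = len(residues) or 1  # avoid division by zero on empty input
    return [residues.count(a) / n for a in AMINO_ACIDS]

def cksaap(window, k_max=2):
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    For each gap k, counts residue pairs (window[i], window[i + k + 1])
    and normalizes by the number of valid pairs, giving 400 values per k.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skips padding or non-standard symbols
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features

# Hypothetical 21-residue window centred on a lysine (index 10).
window = "MKTAYIAKQRKISFVKSHFSR"
print(len(aac(window)), len(cksaap(window)))
```

The resulting fixed-length vectors (20 values for AAC, 400 per gap for CKSAAP) are what a classifier such as an SVM would take as input.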
Endometrial cancer (EC) that is recurrent, poorly differentiated (grade 3 and above), or of atypical cell type has a poor prognosis. The mechanisms and characteristics of recurrence and distal metastasis of EC remain unclear. The extracellular matrix (ECM) of the female reproductive tract undergoes extensive structural remodelling every month. Altered ECMs surrounding cells are believed to play crucial roles in cancer progression. To decipher the associations between ECM and EC development, we generated a PAN-ECM data list of 1516 genes, including ECM molecules (ECMs), synthetic and degradation enzymes for ECMs, ECM receptors, and soluble molecules that regulate ECM, and analyzed RNA-Seq data from The Cancer Genome Atlas (TCGA). Alterations in PAN-ECM genes were identified by comparing the RNA-Seq expression profiles of EC samples, which were grouped into tumorigenesis and metastasis groups based on their pathological grading. Differential analyses, including functional enrichment, co-expression network, and molecular network analysis, were carried out to identify the specific PAN-ECM genes that may be involved in the progression of EC. In total, 831 and 241 PAN-ECM genes were significantly involved in tumorigenesis (p-value <1.571e-15) and metastasis (p-value <2.2e-16), respectively, whereas 140 genes were in the intersection of tumorigenesis and metastasis. Interestingly, 92 of the 140 intersecting PAN-ECM genes showed contrasting fold changes between the tumorigenesis and metastasis datasets. Enrichment analysis of these contrasting PAN-ECM genes indicated that pathways such as GP6 signaling, ILK signaling, and interleukin (IL)-8 signaling were activated in metastasis but inhibited in tumorigenesis. The significantly activated ECM and ECM-associated genes in the GP6, ILK, and IL-8 signaling pathways may play crucial roles in the metastasis of EC.
Our study provides a better understanding of the etiology and the progression of EC.
Recent studies have proposed several gene signatures as biomarkers for different grades of gliomas from various perspectives. However, most of these genes can only be used appropriately for patients with specific grades of gliomas.
In this study, we aimed to identify survival-relevant genes shared between glioblastoma multiforme (GBM) and lower-grade glioma (LGG), which could be used as potential biomarkers to classify patients into different risk groups. A Cox proportional hazards regression model (Cox model) was used to extract the relevant genes, and the effectiveness of the genes was estimated using random forest regression. Finally, risk models were constructed with logistic regression.
We identified 104 key genes that were shared between GBM and LGG and were significantly correlated with patients' survival, based on next-generation sequencing data obtained from The Cancer Genome Atlas for gene expression analysis. The effectiveness of these genes in the survival prediction of GBM and LGG was evaluated, and the average area under the receiver operating characteristic (ROC) curve ranged from 0.7 to 0.8. Gene set enrichment analysis revealed that these genes were involved in eight significant pathways and 23 molecular functions. Moreover, the expression of ten of these genes (CTSZ, EFEMP2, ITGA5, KDELR2, MDK, MICALL2, MAP2K3, PLAUR, SERPINE1, and SOCS3) was significantly higher in GBM than in LGG, and comparing their expression levels to those of the proposed control genes (TBP, IPO8, and SDHA) could potentially classify patients into high- and low-risk groups, which differ significantly in overall survival. Signatures of the candidate genes were validated with multiple microarray datasets from the Gene Expression Omnibus to increase the robustness of using these potential prognostic factors. In both the GBM and LGG cohorts, most of the patients in the high-risk group had the IDH1 wild-type gene, and those in the low-risk group had IDH1 mutations. Moreover, most of the high-risk patients with LGG possessed a 1p/19q non-codeletion.
In this study, we identified survival-relevant genes shared between GBM and LGG, which enabled patients to be classified into high- and low-risk groups based on expression level analysis. Both risk groups could be correlated with well-known genetic variants, suggesting their potential prognostic value in clinical application.
This study aimed to identify potential multiomics biomarkers for the early detection of prognostic recurrence in prostate cancer (PC) patients. A total of 494 prostate adenocarcinoma (PRAD) patients (60 with recurrence included) from The Cancer Genome Atlas (TCGA) portal were analyzed using the autoencoder model and similarity network fusion. Then, multiomics panels were constructed according to the intersected omics biomarkers identified from the two models. Six intersected omics biomarkers, TELO2, ZMYND19, miR-143, miR-378a, cg00687383 (MED4), and cg02318866 (JMJD6; METTL23), were collected for multiomics panel construction. The difference between the Kaplan–Meier curves of the high and low recurrence-risk groups generated from the multiomics panel achieved p-value = 5.33 × 10⁻⁹, which is better than that of the former study (p-value = 5 × 10⁻⁷). Additionally, when evaluating the selected multiomics biomarkers with clinical information (Gleason score, age, and cancer stage), a high-performance prediction model was generated with C-index = 0.713, p-value = 2.97 × 10⁻¹⁵, and AUC = 0.789. The risk score generated from the selected multiomics biomarkers worked as an effective indicator for the prediction of PRAD recurrence. This study helps to clarify the etiology and pathways of PRAD and may further benefit both patients and physicians by providing potential prognostic biomarkers for clinical decisions after surgical treatment.
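The C-index reported above measures how well a risk score ranks patients by time to recurrence. The following is a minimal sketch of Harrell's concordance index for right-censored data; the function name and the toy follow-up data are assumptions for illustration and do not come from the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times:       follow-up time per patient
    events:      1 if recurrence was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:  # patient i recurred before j was last seen
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # higher risk recurred first
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data (made up): five patients, two of them censored.
times = [5, 8, 12, 20, 25]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.6, 0.3, 0.1]
print(concordance_index(times, events, risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading a reported C-index such as 0.713.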
Protein phosphorylation catalyzed by kinases plays crucial regulatory roles in intracellular signal transduction. The increasing number of experimental phosphorylation sites identified by mass spectrometry-based proteomics has motivated the exploration of networks of protein kinases and substrates. Manning et al. have identified 518 human kinase genes, which provide a starting point for comprehensive analysis of protein phosphorylation networks. In this study, a knowledgebase is developed to integrate experimentally verified protein phosphorylation data and protein-protein interaction data for constructing the protein kinase-substrate phosphorylation networks in human. A total of 21 110 experimentally verified phosphorylation sites within 5092 human proteins are collected. However, only 4138 phosphorylation sites (~20%) are annotated with catalytic kinases in the public domain. In order to fully investigate how protein kinases regulate intracellular processes, a published kinase-specific phosphorylation site prediction tool, KinasePhos, is incorporated to assign potential kinases. The web-based system, RegPhos, lets users input a group of human proteins and then explore the phosphorylation network associated with protein subcellular localization. Additionally, time-course microarray expression data are used to represent the degree of similarity in the expression profiles of network members. A case study demonstrates that the proposed scheme not only identifies the correct network of insulin signaling but also detects a novel signaling pathway that may cross-talk with the insulin signaling network. This effective system is now freely available at http://RegPhos.mbc.nctu.edu.tw.
Background and Purpose
Benzimidazoles have attracted much attention over the last few decades due to their broad-spectrum pharmacological properties. Increasing evidence is showing the potential use of benzimidazoles as anti-angiogenic agents, although the mechanisms that impact angiogenesis remain to be fully defined. In this study, we aim to investigate the anti-angiogenic mechanisms of MFB, a novel 2-aminobenzimidazole derivative, to develop a novel angiogenesis inhibitor.
Experimental Approach
MTT, BrdU, migration and invasion assays, and immunoblotting were employed to examine MFB’s effects on vascular endothelial growth factor (VEGF)-induced endothelial cell proliferation, migration, and invasion, as well as the activation of signaling molecules. The anti-angiogenic effects of MFB were analyzed by tube formation, aorta ring sprouting, and matrigel plug assays. We also used a mouse model of lung metastasis to determine MFB’s anti-metastatic effects.
Key Results
MFB suppressed cell proliferation, migration, invasion, and endothelial tube formation of VEGF-A-stimulated human umbilical vascular endothelial cells (HUVECs) or VEGF-C-stimulated lymphatic endothelial cells (LECs). MFB suppressed VEGF-A and VEGF-C signaling in HUVECs or LECs. In addition, MFB reduced VEGF-A- or tumor cell-induced neovascularization in vivo. MFB also diminished B16F10 melanoma lung metastasis. The molecular docking results further showed that MFB may bind to VEGFR-2 rather than VEGF-A with high affinity.
Conclusions and Implications
These observations indicate that MFB may target VEGF/VEGFR signaling to suppress angiogenesis and lymphangiogenesis. They also support the role of MFB as a potential lead in developing novel agents for the treatment of angiogenesis- or lymphangiogenesis-associated diseases and cancer.
Studies over the last few years have identified protein methylation on histones and other proteins that are involved in the regulation of gene transcription. Several works have developed approaches to computationally identify potential methylation sites on lysine and arginine. Studies of protein tertiary structure have demonstrated that the sites of protein methylation are preferentially in regions that are easily accessible. However, previous studies have not taken into account the solvent-accessible surface area (ASA) that surrounds the methylation sites. This work presents a method named MASA that combines the support vector machine with the sequence and structural characteristics of proteins to identify methylation sites on lysine, arginine, glutamate, and asparagine. Since most experimental methylation sites are not associated with corresponding protein tertiary structures in the Protein Data Bank, effective solvent accessibility prediction tools have been adopted to determine the potential ASA values of amino acids in proteins. Evaluation of predictive performance by cross-validation indicates that the ASA values around the methylation sites can improve the accuracy of prediction. Additionally, an independent test reveals that the prediction accuracies for methylated lysine and arginine are 80.8% and 85.0%, respectively. Finally, the proposed method is implemented as an effective system for identifying protein methylation sites. The developed web server is freely available at http://MASA.mbc.nctu.edu.tw/.