Lysine succinylation is one of the dominant post-translational modifications of proteins and contributes to many biological processes, including the cell cycle, growth and signal transduction pathways. Identification of succinylation sites is an important step toward understanding the function of proteins. The complicated sequence patterns of protein succinylation revealed by proteomic studies highlight the necessity of developing effective species-specific in silico strategies for global prediction of succinylation sites. Here we have developed generic and nine species-specific succinylation site classifiers by aggregating multiple complementary features. We optimized the consecutive features using the Wilcoxon-rank feature selection scheme. The final feature vectors were used to train a random forest (RF) classifier. With an integration of RF scores via logistic regression, the resulting predictor, termed GPSuc, achieved better performance than other existing generic and species-specific succinylation site predictors. Our predictor serves as a valuable resource for revealing the mechanism of succinylation and assisting hypothesis-driven experimental design. To support large-scale datasets, a web application was developed at http://kurata14.bio.kyutech.ac.jp/GPSuc/.
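The integration of RF scores via logistic regression can be illustrated with a minimal sketch. It assumes that each feature-specific RF model returns one score per candidate lysine site and that these scores are combined by a fitted logistic model. The function name, the three scores, the coefficients and the bias are all hypothetical and are not taken from GPSuc.

```python
import math

def integrate_scores(rf_scores, weights, bias):
    """Combine per-feature RF scores into one probability with a logistic model."""
    z = bias + sum(w * s for w, s in zip(weights, rf_scores))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical RF scores from three feature-specific models for one lysine site.
rf_scores = [0.82, 0.64, 0.71]
# Hypothetical coefficients; in practice they would be fitted on training data.
weights = [2.0, 1.5, 1.0]
bias = -2.0

probability = integrate_scores(rf_scores, weights, bias)
print(f"integrated succinylation probability: {probability:.3f}")
```

A site would then be called succinylated when the integrated probability exceeds a chosen cutoff.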
Umami, the taste of monosodium glutamate, is one of the major attractive taste modalities in humans. Therefore, knowledge about the biophysical and biochemical properties of the umami taste is important for both scientific research and the food industry. Experimental approaches for identifying umami peptides are labor intensive, time consuming, and expensive. To date, no computational model has been developed for the prediction and analysis of umami peptides as a function of sequence information. In this study, we propose the first sequence-based predictor, named iUmami-SCM, which uses primary sequence information for the identification and characterization of umami peptides. iUmami-SCM utilizes a newly developed scoring card method (SCM) in conjunction with the propensity scores of amino acids and dipeptides. Our predictor demonstrated excellent performance in predicting umami peptides and outperformed other commonly used machine learning classifiers. In particular, iUmami-SCM afforded the highest accuracy and Matthews correlation coefficient of 0.865 and 0.679, respectively, on an independent data set. Furthermore, the SCM-derived propensity scores were analyzed to provide a more in-depth understanding of the biophysical and biochemical properties that underlie the umami intensities of peptides. As a convenient bioinformatics tool, the best model is deployed as a web server that is publicly available at http://camt.pythonanywhere.com/iUmami-SCM. The iUmami-SCM, as presented herein, serves as a powerful computational technique for large-scale umami peptide identification and facilitates the interpretation of umami peptides.
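As an illustration of how a scoring card method can classify a peptide, the sketch below assumes that the peptide score is the composition-weighted sum of dipeptide propensity scores and that the peptide is called positive when the score exceeds a threshold. The propensity values, the threshold and the function names are hypothetical; the optimized scores used by iUmami-SCM are not reproduced here.

```python
from collections import Counter

def dipeptide_composition(peptide):
    """Return the relative frequency of each overlapping dipeptide."""
    pairs = [peptide[i:i + 2] for i in range(len(peptide) - 1)]
    if not pairs:
        return {}
    counts = Counter(pairs)
    return {pair: n / len(pairs) for pair, n in counts.items()}

def scm_score(peptide, propensity, default=0.0):
    """Composition-weighted sum of dipeptide propensity scores."""
    composition = dipeptide_composition(peptide)
    return sum(freq * propensity.get(pair, default)
               for pair, freq in composition.items())

def predict(peptide, propensity, threshold):
    """Call the peptide positive when its score exceeds the threshold."""
    return scm_score(peptide, propensity) > threshold

# Hypothetical propensity scores on a 0-1000 scale and a hypothetical threshold.
propensity = {"EE": 900.0, "ED": 800.0, "DE": 750.0, "GL": 200.0, "LL": 100.0}
threshold = 500.0

for peptide in ("EEDE", "GLL"):
    print(peptide, round(scm_score(peptide, propensity), 1),
          predict(peptide, propensity, threshold))
```

Because the score is a plain weighted sum, each dipeptide's contribution to a prediction can be read directly from the table, which is the interpretability argument made for SCM-based predictors.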
The inhibition of dipeptidyl peptidase IV (DPP-IV, E.C.3.4.14.5) is well recognized as a new avenue for the treatment of Type 2 diabetes (T2D). Until now, peptide-like DPP-IV inhibitors have been shown to normalize the blood glucose concentration in T2D subjects. To the best of our knowledge, there is as yet no computational model for predicting and analyzing DPP-IV inhibitory peptides using sequence information. In this study, we present for the first time a simple and easily interpretable sequence-based predictor that uses the scoring card method (SCM) for modeling the bioactivity of DPP-IV inhibitory peptides (iDPPIV-SCM). In particular, iDPPIV-SCM was developed by employing the SCM together with the propensity scores of amino acids. Rigorous independent test results demonstrated that the proposed iDPPIV-SCM was superior to well-known machine learning (ML) classifiers (e.g., k-nearest neighbor, logistic regression, and decision tree), with improvements of 2–11, 4–22, and 7–10% for accuracy, MCC, and AUC, respectively, while also achieving results comparable to those of the support vector machine. Furthermore, the estimated propensity scores of amino acids derived from iDPPIV-SCM were analyzed to provide a more in-depth understanding of the molecular basis for enhancing DPP-IV inhibitory potency. Taken together, these results revealed that iDPPIV-SCM was superior to other well-known ML classifiers owing to its simplicity, interpretability, and validity. For the convenience of biologists, the predictive model is deployed as a publicly accessible web server at http://camt.pythonanywhere.com/iDPPIV-SCM. It is anticipated that iDPPIV-SCM can serve as an important tool for the rapid screening of promising DPP-IV inhibitory peptides prior to their synthesis.
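The accuracy and MCC figures quoted in these abstracts are derived from a binary confusion matrix. The following sketch shows the standard definitions; the counts in the example are hypothetical and do not correspond to any reported result.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical confusion-matrix counts for an independent test set.
tp, tn, fp, fn = 40, 45, 5, 10
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

MCC uses all four cells of the confusion matrix, so it is less easily inflated than accuracy when the classes are imbalanced.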
Abstract
Motivation
The failure of therapeutic peptides in clinical trials can be attributed to their toxicity profiles, such as hemolytic activity, which hamper further progress of peptides as drug candidates. The accurate prediction of hemolytic peptides (HLPs) and their activity from given peptides is one of the challenging tasks in immunoinformatics and is essential for drug development and basic research. Although a few computational methods have been proposed for this purpose, none of them can identify HLPs and their activities simultaneously.
Results
In this study, we propose a two-layer prediction framework, called HLPpred-Fuse, that can accurately and automatically predict both hemolytic peptides (HLPs or non-HLPs) and HLP activity (high or low). More specifically, a feature representation learning scheme was utilized to generate 54 probabilistic features by integrating six different machine learning classifiers and nine different sequence-based encodings. The 54 probabilistic features were then fused to provide sufficiently converged sequence information, which was used as input to an extremely randomized tree for the development of two final prediction models that independently identify HLPs and their activity. Performance comparisons based on cross-validation analysis, an independent test and a case study against state-of-the-art methods demonstrate that HLPpred-Fuse consistently outperformed these methods in the identification of hemolytic activity.
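The count of 54 probabilistic features follows from pairing each of the six classifiers with each of the nine encodings. The sketch below shows one way such a fused vector could be assembled. The classifier and encoding names are placeholders, because the abstract gives only their counts, and the probabilities are dummy values standing in for the outputs of trained baseline models.

```python
from itertools import product

# Placeholder names; the abstract states only that there are six classifiers
# and nine sequence-based encodings.
CLASSIFIERS = [f"clf{i}" for i in range(1, 7)]
ENCODINGS = [f"enc{j}" for j in range(1, 10)]

def fuse_probabilities(baseline_probs):
    """Order baseline probabilities into one fixed-length feature vector."""
    return [baseline_probs[(c, e)] for c, e in product(CLASSIFIERS, ENCODINGS)]

# Dummy probabilities for a single peptide, one per (classifier, encoding) pair.
baseline_probs = {
    (c, e): round(0.5 + 0.01 * (ci - ej), 2)
    for ci, c in enumerate(CLASSIFIERS)
    for ej, e in enumerate(ENCODINGS)
}

feature_vector = fuse_probabilities(baseline_probs)
print(len(feature_vector))
```

The fused vector would then be passed to the second-layer model, which the abstract identifies as an extremely randomized tree.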
Availability and implementation
For the convenience of experimental scientists, a web-based tool has been established at http://thegleelab.org/HLPpred-Fuse.
Contact
glee@ajou.ac.kr or watshara.sho@mahidol.ac.th or bala@ajou.ac.kr
Supplementary information
Supplementary data are available at Bioinformatics online.
DNA N6-methyladenine (6mA) is one of the most vital epigenetic modifications and is involved in controlling various gene expression levels. With the avalanche of DNA sequences generated in numerous databases, the accurate identification of 6mA plays an essential role in understanding molecular mechanisms. Because the experimental approaches are time-consuming and costly, it is desirable to develop a computational model for rapidly and accurately identifying 6mA. To the best of our knowledge, we are the first to propose a computational model, named i6mA-Fuse, to predict 6mA sites from the Rosaceae genomes, especially in Rosa chinensis and Fragaria vesca. We implemented five encoding schemes, i.e., mononucleotide binary, dinucleotide binary, k-space spectral nucleotide, k-mer, and electron–ion interaction pseudo potential compositions, to build five single-encoding random forest (RF) models. The i6mA-Fuse uses a linear regression model to combine the predicted probability scores of the five single-encoding RF models. The resultant species-specific i6mA-Fuse achieved remarkably high performance, with AUCs of 0.982 and 0.978 and MCCs of 0.869 and 0.858 on the independent datasets of Rosa chinensis and Fragaria vesca, respectively. In the F. vesca-specific i6mA-Fuse, the MBE and EIIP contributed 75% and 25% of the total prediction; in the R. chinensis-specific i6mA-Fuse, Kmer, MBE, and EIIP contributed 15%, 65%, and 20% of the total prediction. To assist high-throughput prediction for DNA 6mA identification, the i6mA-Fuse is publicly accessible at https://kurata14.bio.kyutech.ac.jp/i6mA-Fuse/.
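Two steps of the pipeline described above lend themselves to a short sketch: a mononucleotide binary encoding of a DNA window, and a weighted linear combination of single-encoding RF probabilities. The sketch assumes a one-hot layout in A, C, G, T order and treats the reported 75% and 25% contributions for F. vesca as linear weights. Both choices are assumptions, and the RF probabilities are hypothetical.

```python
# Assumed one-hot layout in A, C, G, T order; unknown bases map to all zeros.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}

def mononucleotide_binary(seq):
    """Encode a DNA sequence as a flat vector of four bits per nucleotide."""
    vector = []
    for base in seq.upper():
        vector.extend(ONE_HOT.get(base, [0, 0, 0, 0]))
    return vector

def fuse(scores, weights):
    """Weighted linear combination of single-encoding RF probabilities."""
    return sum(weights[name] * scores[name] for name in weights)

# Contributions reported for the F. vesca model, read here as linear weights.
weights_f_vesca = {"MBE": 0.75, "EIIP": 0.25}
# Hypothetical RF probabilities for one candidate adenine site.
scores = {"MBE": 0.90, "EIIP": 0.60}

fused = fuse(scores, weights_f_vesca)
print(mononucleotide_binary("AC"))
print(f"fused score: {fused:.3f}")
```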
Key message
The existing prediction models are not suitable for identifying 6mA in the Rosaceae genome because the existing algorithms are species-specific. Thus, a novel predictor is needed to identify 6mA sites in the Rosaceae genome. To the best of our knowledge, we are the first to propose a computational model, named i6mA-Fuse (Identification of N6-MethylAdenine sites by Fusing multiple feature representation), to predict 6mA sites from the Rosaceae genomes, especially in Rosa chinensis and Fragaria vesca.
As Bengali is the seventh most-spoken language and the fifth most-spoken native language in the world, the domain of Bengali handwritten character recognition has fascinated researchers for decades. Although other popular languages, e.g., English, Chinese, Hindi, and Spanish, have received many contributions in the area of handwritten character recognition, Bengali has not received many noteworthy contributions in this domain because of the complex curvatures and similar writing fashions of Bengali characters. Previous studies used different approaches based on traditional learning and deep learning. In this research, we propose a low-cost novel convolutional neural network architecture for the recognition of Bengali characters with only 2.24 to 2.43 million parameters, depending on the number of output classes. We considered 8 different formations of the CMATERdb datasets based on previous studies for the training phase. With experimental analysis, we showed that our proposed system outperformed previous works by a noteworthy margin for all 8 datasets. Moreover, we tested our trained models on other available Bengali character datasets such as the Ekush, BanglaLekha, and NumtaDB datasets. Our proposed architecture achieved 96–99% overall accuracies for these datasets as well. We believe our contributions will be beneficial for developing an automated high-performance recognition tool for Bengali handwritten characters.
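The dependence of the parameter count on the number of output classes can be made concrete with a small calculation. In a network with a fixed convolutional backbone, only the final fully connected layer grows with the class count. The feature width of 1024 and the class counts of 10 and 200 below are hypothetical and do not describe the proposed architecture.

```python
def dense_params(in_features, out_classes):
    """Weights plus biases of one fully connected output layer."""
    return (in_features + 1) * out_classes

# Hypothetical width of the feature vector entering the output layer.
IN_FEATURES = 1024

for n_classes in (10, 200):
    print(n_classes, dense_params(IN_FEATURES, n_classes))

# Extra parameters needed when moving from 10 to 200 output classes.
extra = dense_params(IN_FEATURES, 200) - dense_params(IN_FEATURES, 10)
print("extra parameters:", extra)
```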
In general, hydrolyzed proteins, plant-derived alkaloids and toxins display an unpleasant bitter taste. Thus, the perception of bitter taste plays a crucial role in protecting animals from poisonous plants and environmental toxins. Therapeutic peptides have attracted great attention as a new drug class. The successful identification and characterization of bitter peptides are essential for drug development and nutritional research. Owing to the large volume of peptides generated in the post-genomic era, there is an urgent need to develop computational methods for rapidly and effectively discriminating bitter peptides from non-bitter peptides. To the best of our knowledge, there is as yet no computational model for predicting and analyzing bitter peptides using sequence information. In this study, we present for the first time a computational model, called iBitter-SCM, that can predict the bitterness of peptides directly from their amino acid sequence without any dependence on their functional domain or structural information. iBitter-SCM is a simple and effective method that was built using the scoring card method (SCM) with estimated propensity scores of amino acids and dipeptides. Our benchmarking results demonstrated that iBitter-SCM achieved an accuracy and Matthews correlation coefficient of 84.38% and 0.688, respectively, on the independent dataset. A rigorous independent test indicated that iBitter-SCM was superior to other widely used machine-learning classifiers (e.g. k-nearest neighbor, naive Bayes, decision tree and random forest) owing to its simplicity, interpretability and implementation. Furthermore, the estimated propensity scores of amino acids and dipeptides were analyzed to provide a better understanding of the biophysical and biochemical properties of bitter peptides. For the convenience of experimental scientists, a web server is provided publicly at http://camt.pythonanywhere.com/iBitter-SCM. It is anticipated that iBitter-SCM can serve as an important tool to facilitate the high-throughput prediction and de novo design of bitter peptides.
• This study presents for the first time a computational model that can predict peptide sequences with or without bitter taste.
• iBitter-SCM is a simple yet effective method built by using the SCM with estimated propensity scores of dipeptides.
• iBitter-SCM was superior to widely used classifiers, considering its simplicity, interpretability, and implementation.
• The propensity scores of amino acids provide a better understanding of the physicochemical properties of bitter peptides.
• The iBitter-SCM web server was established and made freely available online at http://camt.pythonanywhere.com/iBitter-SCM.
One of the most important epigenetic modifications is N4-methylcytosine, which regulates many biological processes including DNA replication and chromosome stability. Identification of N4-methylcytosine sites is pivotal for understanding specific biological functions. Herein, we developed the first bioinformatics tool, called i4mC-ROSE, for identifying N4-methylcytosine sites in the genomes of Fragaria vesca and Rosa chinensis in the Rosaceae. The tool uses a random forest classifier with six encoding methods that cover various aspects of DNA sequence information. The i4mC-ROSE predictor achieves area under the curve scores of 0.883 and 0.889 for the two genomes during cross-validation. Moreover, the i4mC-ROSE outperforms the other classifiers tested in this study when objectively evaluated on the independent datasets. The proposed i4mC-ROSE tool can meet users' demand for the prediction of 4mC sites in the Rosaceae genome. The i4mC-ROSE predictor and the datasets used are publicly accessible at http://kurata14.bio.kyutech.ac.jp/i4mC-ROSE/.
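The area under the curve reported above can be understood as the probability that a randomly chosen true site receives a higher score than a randomly chosen non-site. The sketch below computes it by direct pairwise comparison, with ties counted as one half. The labels and scores are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = 4mC site) and classifier scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4]
print(f"AUC = {auc(labels, scores):.3f}")
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations; a rank-based formulation is preferable for large datasets.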
Prokaryotic proteins are regulated by pupylation, a type of post-translational modification that contributes to cellular function in bacterial organisms. In the pupylation process, prokaryotic ubiquitin-like protein (Pup) tagging is functionally analogous to ubiquitination in that it tags target proteins for proteasomal degradation. To date, several experimental methods have been developed to identify pupylated proteins and their pupylation sites, but these experimental methods are generally laborious and costly. Therefore, computational methods that can accurately predict potential pupylation sites based on protein sequence information are highly desirable. In this paper, a novel predictor termed pbPUP has been developed for accurate prediction of pupylation sites. In particular, a sophisticated sequence encoding scheme, i.e. the profile-based composition of k-spaced amino acid pairs (pbCKSAAP), is used to represent the sequence patterns and evolutionary information of the sequence fragments surrounding pupylation sites. Then, a Support Vector Machine (SVM) classifier is trained using the pbCKSAAP encoding scheme. The final pbPUP predictor achieves an AUC value of 0.849 in 10-fold cross-validation tests and outperforms other existing predictors on a comprehensive independent test dataset. The proposed method is anticipated to be a helpful computational resource for the prediction of pupylation sites. The web server and curated datasets in this study are freely available at http://protein.cau.edu.cn/pbPUP/.
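The sequence-only form of the k-spaced amino acid pair composition can be sketched as follows. For each gap k, the encoding counts the residue pairs separated by k positions in the fragment and normalizes by the number of such pairs. The profile-based variant used by pbPUP additionally draws on evolutionary profiles and is not reproduced here. The default maximum gap and the example fragment are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def cksaap(fragment, k_max=2):
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    Returns 400 * (k_max + 1) frequencies, ordered by k and then by pair.
    """
    features = []
    for k in range(k_max + 1):
        counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
        n_pairs = len(fragment) - k - 1
        for i in range(max(n_pairs, 0)):
            pair = fragment[i] + fragment[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard residues
                counts[pair] += 1
        denom = max(n_pairs, 1)
        features.extend(counts[pair] / denom for pair in counts)
    return features

# Hypothetical short fragment centered on a lysine.
features = cksaap("AKA", k_max=1)
print(len(features))
```

Each block of 400 values corresponds to one gap size, so the feature vector length grows linearly with the maximum gap.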
As anticancer peptides (ACPs) have attracted great interest for cancer treatment, several approaches based on machine learning have been proposed for ACP identification. Although existing methods have afforded high prediction accuracies, such models use a large number of descriptors together with complex ensemble approaches, which leads to low interpretability and thus poses a challenge for biologists and biochemists. Therefore, it is desirable to develop a simple, interpretable and efficient predictor for accurate ACP identification that also provides the means for the rational design of new anticancer peptides with promising potential for clinical application. Herein, we propose a novel flexible scoring card method (FSCM) that makes use of propensity scores of local and global sequential information for the development of a sequence-based ACP predictor (named iACP-FSCM) with improved prediction accuracy and model interpretability. To the best of our knowledge, iACP-FSCM is the first sequence-based ACP predictor to provide an in-depth understanding of the molecular basis for the enhancement of anticancer activities of peptides via the use of FSCM-derived propensity scores. The independent testing results showed that iACP-FSCM provided accuracies of 0.825 and 0.910 on the main and alternative datasets, respectively. Results from comparative benchmarking demonstrated that iACP-FSCM outperformed seven other existing ACP predictors, with marked improvements of 7% and 17% for accuracy and MCC, respectively, on the main dataset. Furthermore, iACP-FSCM (0.910) achieved results comparable to those of the state-of-the-art ensemble model AntiCP2.0 (0.920) on the alternative dataset. Comparative results demonstrated that iACP-FSCM was the most suitable choice for ACP identification and characterization considering its simplicity, interpretability and generalizability. It is highly anticipated that iACP-FSCM may be a robust tool for the rapid screening and identification of promising ACPs for clinical use.