A virtual metabolic human model is a valuable complement to experimental biology and clinical studies, because in vivo research involves serious ethical and technical problems. I have proposed a multi-organ and multi-scale kinetic model that formulates the reactions of enzymes and transporters with the regulation of hormonal actions at postprandial and postabsorptive states. The computational model consists of 202 ordinary differential equations for metabolites with 217 reaction rates and 1,140 kinetic parameter constants. It is the most comprehensive, largest, and highly predictive model of whole-body metabolism. Use of the model revealed the mechanisms by which individual disorders, such as steatosis, β cell dysfunction, and insulin resistance, were combined to cause diabetes. The model predicted a glycerol kinase inhibitor to be an effective medicine for type 2 diabetes, which not only decreased hepatic triglyceride but also reduced plasma glucose. The model also enabled us to rationally design combination therapy.
• A standard of virtual metabolic human dynamic models is proposed
• It integrates the three scales of molecules, organs, and whole body
• It gets insight into pathological mechanisms of type 1 and type 2 diabetes
• It enables the computer-aided design of medication treatment for diabetes
Biological Sciences; Endocrinology; Mathematical Biosciences; Metabolomics; Systems Biology
Drug-target protein interaction (DTI) identification is fundamental for drug discovery and drug repositioning, because therapeutic drugs act on disease-causing proteins. However, the DTI identification process often requires expensive and time-consuming tasks, including biological experiments involving large numbers of candidate compounds. Thus, a variety of computational approaches have been developed. Of the many approaches available, chemo-genomics feature-based methods have attracted considerable attention. These methods compute the feature descriptors of drugs and proteins as the input data to train machine and deep learning models to enable accurate prediction of unknown DTIs. In addition, attention-based learning methods have been proposed to identify and interpret DTI mechanisms. However, improvements are needed in both prediction performance and DTI mechanism elucidation. To address these problems, we developed an attention-based method designated the interpretable cross-attention network (ICAN), which predicts DTIs using the Simplified Molecular Input Line Entry System of drugs and amino acid sequences of target proteins. We optimized the attention mechanism architecture by exploring the choice of cross-attention or self-attention, the attention layer depth, and the selection of the context matrices from the attention mechanism. We found that a plain attention mechanism that decodes drug-related protein context features without any protein-related drug context features effectively achieved high performance. The ICAN outperformed state-of-the-art methods in several metrics on the DAVIS dataset and was the first to reveal with statistical significance that some weighted sites in the cross-attention weight matrix represent experimental binding sites, thus demonstrating the high interpretability of the results. The program is freely available at
Tuberculosis (TB) is a leading killer caused by Mycobacterium tuberculosis. Recently, anti‐TB peptides have provided an alternative approach to combat antibiotic tolerance. We have developed an effective computational predictor, identification of antitubercular peptides (iAntiTB), by the integration of multiple feature vectors derived from the amino acid sequences via random forest (RF) and support vector machine (SVM) classifiers. The iAntiTB combines the RF and SVM scores via linear regression to enhance the prediction accuracy. To make a robust and accurate predictor, we prepared two datasets with different types of negative samples. The iAntiTB achieved area under the ROC curve values of 0.896 and 0.946 on the training sets of the first and second datasets, respectively. The iAntiTB outperformed the other existing predictors.
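The score-combination step described above can be illustrated with a minimal sketch. It assumes that "linear regression" means an ordinary least-squares fit of the class label on the two classifier scores plus an intercept; the abstract does not give the exact formulation. The function names and the toy scores and labels are hypothetical and are not iAntiTB data.

```python
# Toy illustration only: scores and labels are invented, not iAntiTB data.
# Assumption: the combiner is an ordinary least-squares fit
# label ~ w0 + w1 * rf_score + w2 * svm_score.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_score_combiner(rf_scores, svm_scores, labels):
    """Fit [w0, w1, w2] by solving the normal equations (X^T X) w = X^T y."""
    rows = [[1.0, r, s] for r, s in zip(rf_scores, svm_scores)]
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(rows, labels)) for i in range(3)]
    return solve_linear_system(xtx, xty)

def combine(weights, rf_score, svm_score):
    """Combined score for one peptide from its RF and SVM scores."""
    w0, w1, w2 = weights
    return w0 + w1 * rf_score + w2 * svm_score

# Hypothetical per-peptide scores from two already-trained classifiers.
rf_scores = [0.9, 0.8, 0.3, 0.2, 0.6, 0.4]
svm_scores = [0.7, 0.9, 0.4, 0.1, 0.3, 0.6]
labels = [1, 1, 0, 0, 1, 0]  # 1 = anti-TB peptide, 0 = negative sample

weights = fit_score_combiner(rf_scores, svm_scores, labels)
combined = [combine(weights, r, s) for r, s in zip(rf_scores, svm_scores)]
```

In practice the weights would be fitted on held-out or cross-validated scores so that the combiner does not simply reward whichever classifier overfits the training set.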
Kinetic modeling is an essential tool in systems biology research, enabling the quantitative analysis of biological systems and predicting their behavior. However, the development of kinetic models is a complex and time-consuming process. In this article, we propose a novel approach called KinModGPT, which generates kinetic models directly from natural language text. KinModGPT employs GPT as a natural language interpreter and Tellurium as an SBML generator. We demonstrate the effectiveness of KinModGPT in creating SBML kinetic models from complex natural language descriptions of biochemical reactions. KinModGPT successfully generates valid SBML models from a range of natural language model descriptions of metabolic pathways, protein-protein interaction networks, and heat shock response. This article demonstrates the potential of KinModGPT in kinetic modeling automation.
DNA N6-methyladenine (6mA) is one of the most vital epigenetic modifications and is involved in controlling various gene expression levels. With the avalanche of DNA sequences generated in numerous databases, the accurate identification of 6mA plays an essential role in understanding molecular mechanisms. Because the experimental approaches are time-consuming and costly, it is desirable to develop a computational model for rapidly and accurately identifying 6mA. To the best of our knowledge, we are the first to propose a computational model, named i6mA-Fuse, to predict 6mA sites from the Rosaceae genomes, especially in Rosa chinensis and Fragaria vesca. We implemented five encoding schemes, i.e., mononucleotide binary (MBE), dinucleotide binary, k-space spectral nucleotide, k-mer (Kmer), and electron–ion interaction pseudo potential (EIIP) compositions, to build five single-encoding random forest (RF) models. The i6mA-Fuse uses a linear regression model to combine the predicted probability scores of the five single-encoding RF models. The resultant species-specific i6mA-Fuse achieved remarkably high performance, with AUCs of 0.982 and 0.978 and MCCs of 0.869 and 0.858 on the independent datasets of Rosa chinensis and Fragaria vesca, respectively. In the F. vesca-specific i6mA-Fuse, MBE and EIIP contributed 75% and 25% of the total prediction; in the R. chinensis-specific i6mA-Fuse, Kmer, MBE, and EIIP contributed 15%, 65%, and 20% of the total prediction. To assist high-throughput prediction for DNA 6mA identification, the i6mA-Fuse is publicly accessible at https://kurata14.bio.kyutech.ac.jp/i6mA-Fuse/.
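Two of the encoding schemes named above, mononucleotide binary and k-mer composition, can be sketched as follows. This is an illustration under stated assumptions rather than the i6mA-Fuse implementation: the one-hot bit assignment per base, the normalization of k-mer counts to frequencies, and the function names are all hypothetical choices.

```python
from itertools import product

# Assumed bit assignment; the abstract does not specify which bits map to which base.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

def mononucleotide_binary(seq):
    """Encode each base as a 4-bit one-hot vector and concatenate the bits."""
    vec = []
    for base in seq.upper():
        vec.extend(ONE_HOT[base])
    return vec

def kmer_composition(seq, k=2):
    """Return the frequency of every possible k-mer over A, C, G, T.

    The output has 4**k entries in lexicographic k-mer order and sums to 1.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of overlapping windows
    for i in range(total):
        counts[seq[i:i + k]] += 1
    return [counts[km] / total for km in kmers]
```

Each encoding yields a fixed-length numeric vector for a fixed-length sequence window, which is the form a random forest model would take as input.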
Key message
The existing prediction models are not suitable for identifying 6mA in the Rosaceae genome because the existing algorithms are species-specific. Thus, a novel predictor is needed to identify 6mA sites in the Rosaceae genome. To the best of our knowledge, we are the first to propose a computational model, named i6mA-Fuse (Identification of N6-MethylAdenine sites by Fusing multiple feature representation), to predict 6mA sites from the Rosaceae genomes, especially in Rosa chinensis and Fragaria vesca.
Lysine succinylation is one of the dominant post-translational modifications of proteins and contributes to many biological processes, including cell cycle, growth, and signal transduction pathways. Identification of succinylation sites is an important step for understanding the function of proteins. The complicated sequence patterns of protein succinylation revealed by proteomic studies highlight the necessity of developing effective species-specific in silico strategies for global prediction of succinylation sites. Here we have developed generic and nine species-specific succinylation site classifiers by aggregating multiple complementary features. We optimized the consecutive features using the Wilcoxon-rank feature selection scheme. The final feature vectors were trained by a random forest (RF) classifier. With an integration of RF scores via logistic regression, the resulting predictor, termed GPSuc, achieved better performance than other existing generic and species-specific succinylation site predictors. Our predictor serves as a valuable resource for revealing the mechanism of succinylation and assisting hypothesis-driven experimental design. To support large-scale datasets, a web application was developed at http://kurata14.bio.kyutech.ac.jp/GPSuc/.
Abstract
The COVID-19 pandemic caused several million deaths worldwide. Development of anti-coronavirus drugs is thus urgent. Unlike conventional non-peptide drugs, antiviral peptide drugs are highly specific, easy to synthesize and modify, and not highly susceptible to drug resistance. To reduce the time and expense involved in screening thousands of peptides and assaying their antiviral activity, computational predictors for identifying anti-coronavirus peptides (ACVPs) are needed. However, few experimentally verified ACVP samples are available, even though a relatively large number of antiviral peptides (AVPs) have been discovered. In this study, we attempted to predict ACVPs using an AVP dataset and a small collection of ACVPs. Using conventional features, a binary profile and a word-embedding word2vec (W2V), we systematically explored five different machine learning methods: Transformer, Convolutional Neural Network, bidirectional Long Short-Term Memory, Random Forest (RF) and Support Vector Machine. Via exhaustive searches, we found that the RF classifier with W2V consistently achieved better performance on different datasets. The two main controlling factors were: (i) the dataset-specific W2V dictionary was generated from the training and independent test datasets instead of the widely used general UniProt proteome and (ii) a systematic search was conducted and determined the optimal k-mer value in W2V, which provides greater discrimination between positive and negative samples. Therefore, our proposed method, named iACVP, consistently provides better prediction performance compared with existing state-of-the-art methods. To assist experimentalists in identifying putative ACVPs, we implemented our model as a web server accessible via the following link: http://kurata35.bio.kyutech.ac.jp/iACVP.
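The role of the k-mer value in building a dataset-specific W2V dictionary can be illustrated with a small sketch. It assumes that each peptide is split into overlapping k-mers that act as the "words" of a sentence; whether iACVP uses overlapping or non-overlapping k-mers is not stated in the abstract. The function names and example peptides are hypothetical, and the word2vec training itself is omitted.

```python
def peptide_to_kmers(peptide, k):
    """Split a peptide into overlapping k-mers (assumed tokenization)."""
    peptide = peptide.upper()
    return [peptide[i:i + k] for i in range(len(peptide) - k + 1)]

def build_vocabulary(peptides, k):
    """Map each distinct k-mer in the given peptides to an integer index.

    Building this from the dataset's own peptides, rather than from a general
    proteome, is what makes the dictionary dataset-specific.
    """
    vocab = {}
    for peptide in peptides:
        for word in peptide_to_kmers(peptide, k):
            vocab.setdefault(word, len(vocab))
    return vocab

# Hypothetical peptides; a search over k would repeat this for each candidate k.
example_peptides = ["GIGA", "IGAV"]
vocab_by_k = {k: build_vocabulary(example_peptides, k) for k in (1, 2, 3)}
```

A systematic search over k would then train an embedding for each candidate value and keep the one that best separates positive and negative samples.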
One of the most important epigenetic modifications is N4-methylcytosine, which regulates many biological processes including DNA replication and chromosome stability. Identification of N4-methylcytosine sites is pivotal to understanding specific biological functions. Herein, we developed the first bioinformatics tool, called i4mC-ROSE, for identifying N4-methylcytosine sites in the genomes of Fragaria vesca and Rosa chinensis in the Rosaceae, which utilizes a random forest classifier with six encoding methods that cover various aspects of DNA sequence information. The i4mC-ROSE predictor achieves area under the curve scores of 0.883 and 0.889 for the two genomes during cross-validation. Moreover, the i4mC-ROSE outperforms other classifiers tested in this study when objectively evaluated on the independent datasets. The proposed i4mC-ROSE tool can serve users' demand for the prediction of 4mC sites in the Rosaceae genome. The i4mC-ROSE predictor and utilized datasets are publicly accessible at http://kurata14.bio.kyutech.ac.jp/i4mC-ROSE/.
A proinflammatory peptide (PIP) is a type of signaling molecule secreted from immune cells that contributes to the first line of defense against invading pathogens. Numerous experiments have shown that PIPs play an important role in human physiology and in applications such as vaccines and immunotherapeutic drugs. Because high-throughput laboratory methods are time-consuming and costly, effective computational methods are in great demand to identify PIPs in a timely and accurate manner. Thus, in this study, we proposed a computational model in conjunction with a multiple feature representation, called ProIn-Fuse, to improve the performance of PIP identification. Specifically, a feature representation learning model was utilized to generate probabilistic scores by using random forest models employing eight sequence encoding schemes. Finally, the ProIn-Fuse was constructed by linearly combining the resultant eight probabilistic scores. Evaluated through an independent test, the ProIn-Fuse yielded an accuracy of 0.746, which was 10% higher than those obtained by the state-of-the-art PIP predictors. The proposed ProIn-Fuse can facilitate faster and broader applications of PIPs in drug design and development. The web server, datasets and online instruction are freely accessible at http://kurata14.bio.kyutech.ac.jp/ProIn-Fuse.
Many kinetic models of Escherichia coli central metabolism have been built, but few models accurately reproduced the dynamic behaviors of wild type and multiple genetic mutants. In 2016, our latest kinetic model resolved problems of existing models and reproduced the cell growth and glucose uptake of wild type, ΔpykA:pykF and Δpgi in a batch culture, but it overestimated the glucose uptake and cell growth rates of Δppc and hardly captured the typical characteristics of the glyoxylate and TCA cycle fluxes for Δpgi and Δppc. Such discrepancies between the simulated and experimental data suggested biological complexity. In this study, we overcame these problems by assuming critical mechanisms regarding the OAA-regulated isocitrate dehydrogenase activity, aceBAK gene regulation and growth suppression. The present model accurately predicts the extracellular and intracellular dynamics of wild type and many gene knockout mutants in batch and continuous cultures. It is now the most accurate, detailed kinetic model of E. coli central carbon metabolism and will contribute to advances in mathematical modeling of cell factories.