Protein-protein interactions (PPIs) are critical for many biological processes. It is therefore important to develop accurate high-throughput methods for identifying PPIs to better understand protein function, disease occurrence, and therapy design. Although various computational methods for predicting PPIs have been developed, their robustness when predicting on external datasets is unknown. Deep-learning algorithms have been successful in diverse areas, but their effectiveness for PPI prediction has not been tested.
We used a stacked autoencoder, a type of deep-learning algorithm, to study sequence-based PPI prediction. The best model achieved an average accuracy of 97.19% in 10-fold cross-validation. Prediction accuracies on various external datasets ranged from 87.99% to 99.21%, higher than those achieved with previous methods.
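The 97.19% figure is an average over ten held-out folds. Below is a minimal sketch of how such an estimate is usually computed; `train_and_predict` is a hypothetical callback standing in for training the stacked autoencoder and classifying the held-out protein pairs, and `majority_baseline` is only a placeholder model, not the authors' classifier.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_accuracy(
    X: Sequence,
    y: Sequence[int],
    train_and_predict: Callable[[list, list, list], list],
    k: int = 10,
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Train on k-1 folds, test on the held-out fold, and average the k accuracies."""
    folds = k_fold_indices(len(X), k, seed)
    accs = []
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        preds = train_and_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        accs.append(correct / len(test_idx))
    return mean(accs), accs


def majority_baseline(X_train: list, y_train: list, X_test: list) -> list:
    """Placeholder model: always predict the most common training label."""
    label = max(set(y_train), key=y_train.count)
    return [label] * len(X_test)
```

The sketch assumes at least `k` samples so that no fold is empty; in practice the labels would be interacting (1) or non-interacting (0) protein pairs and the callback would fit a real model.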
To our knowledge, this research is the first to apply a deep-learning algorithm to sequence-based PPI prediction, and the results demonstrate its potential in this field.
Retrosynthetic route planning can be considered a rule-based reasoning procedure. The possibilities for each transformation are generated from collected reaction rules, and potential reaction routes are then recommended by various optimization algorithms. Although there has been much progress in computer-assisted retrosynthetic route planning and reaction prediction, fully data-driven automatic retrosynthetic route planning remains challenging. Here we present a template-free approach to automatic retrosynthetic route planning that is independent of reaction templates, rules, or atom mapping. We treated each reaction prediction task as a data-driven sequence-to-sequence problem using the multi-head attention-based Transformer architecture, which has proven powerful in machine translation tasks. Using reactions from the United States patent literature, our end-to-end models naturally incorporate the global chemical environments of molecules and achieve remarkable performance in one-step retrosynthetic tasks: a top-1 predictive accuracy of 63.0% (with the reaction class provided) and a top-1 molecular validity of 99.6%. Encouraged by the success of one-step reaction prediction, we further carried out iterative, multi-step retrosynthetic route planning for four case products, which succeeded. We then constructed an automatic data-driven end-to-end retrosynthetic route planning system (AutoSynRoute) using Monte Carlo tree search with a heuristic scoring function. AutoSynRoute successfully reproduced published synthesis routes for the four case products. The end-to-end model for reaction prediction can easily be extended to larger or customer-requested reaction databases. Our study represents an important step toward realizing automatic retrosynthetic route planning.
Retrosynthetic pathway planning using a template-free model coupled with heuristic Monte Carlo tree search.
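The abstract says AutoSynRoute couples the one-step model with Monte Carlo tree search and a heuristic scoring function, but does not give the selection rule. A common choice in MCTS is UCT (UCB1 applied to tree search); the minimal sketch below shows that rule plus value backpropagation. The `Node` class and its fields are hypothetical, and the value fed to `backpropagate` stands in for whatever heuristic score the real system computes.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison only; avoids recursive __eq__ on parent links
class Node:
    """Hypothetical search node: one set of precursor molecules (e.g. SMILES)."""
    molecules: List[str]
    visits: int = 0
    total_value: float = 0.0  # sum of heuristic scores backed up through this node
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None


def uct_score(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """UCB1 score: mean value (exploitation) plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # try every unvisited expansion at least once
    exploit = child.total_value / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(node: Node, c: float = 1.4) -> Node:
    """Pick the child with the highest UCT score."""
    return max(node.children, key=lambda ch: uct_score(ch, node.visits, c))


def backpropagate(node: Optional[Node], value: float) -> None:
    """Add one visit and the rollout's heuristic value to every ancestor."""
    while node is not None:
        node.visits += 1
        node.total_value += value
        node = node.parent
```

The constant `c` trades off revisiting high-scoring disconnections against exploring rarely tried ones; its value here is an arbitrary default, not one reported for AutoSynRoute.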
D-3-phosphoglycerate dehydrogenase (PGDH) from Escherichia coli catalyzes the first critical step in serine biosynthesis and can be allosterically inhibited by serine. In a previous study, we developed a computational method for allosteric site prediction using a coarse-grained two-state Gō model and perturbation. Two potential allosteric sites were predicted for E. coli PGDH: one close to the active site and the nucleotide binding site (Site I) and the other near the regulatory domain (Site II). In the present study, we discovered allosteric inhibitors and activators targeting Site I by high-throughput virtual screening, followed by surface plasmon resonance (SPR) to eliminate false positives. Compounds 1 and 2 showed activation at low concentrations and inhibition at high concentrations, with IC50 values of 34.8 and 58.0 µM, respectively, in enzymatic bioassays, comparable to that of the endogenous allosteric effector L-serine. As an activator, compound 2 exhibited an AC50 value of 34.7 nM. The novel allosteric site discovered in PGDH was independent of L-serine and substrate. Enzyme kinetics studies showed that these compounds influenced Km, kcat, and kcat/Km. We also performed structure-activity relationship studies to discover high-potency allosteric effectors. Compound 2-2, an analog of compound 2, showed the best in vitro activity, with an IC50 of 22.3 µM. Compounds targeting this site can be used as new chemical probes to study metabolic regulation in E. coli. Our study not only identified a novel allosteric site and effectors for PGDH but also provided a general strategy for designing new regulators of metabolic enzymes.
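To give the reported IC50 values a concrete meaning, the sketch below uses the standard Hill relationship for fractional inhibition, f = 1 / (1 + (IC50/[I])^n). It assumes a Hill coefficient of 1 and complete inhibition at saturating concentrations, neither of which the abstract reports, and a single Hill curve cannot capture the biphasic (activation at low, inhibition at high concentration) behavior of compounds 1 and 2.

```python
def fraction_inhibited(conc_uM: float, ic50_uM: float, hill: float = 1.0) -> float:
    """Fractional inhibition from the Hill equation: 1 / (1 + (IC50/[I])^n)."""
    if conc_uM <= 0:
        return 0.0  # no inhibitor, no inhibition
    return 1.0 / (1.0 + (ic50_uM / conc_uM) ** hill)


def residual_activity(conc_uM: float, ic50_uM: float, hill: float = 1.0) -> float:
    """Remaining enzyme activity as a fraction of the uninhibited rate."""
    return 1.0 - fraction_inhibited(conc_uM, ic50_uM, hill)


# IC50 values (µM) taken from the abstract; the 50 µM test concentration is arbitrary.
IC50_UM = {"compound 1": 34.8, "compound 2": 58.0, "compound 2-2": 22.3}

for name, ic50 in IC50_UM.items():
    print(name, round(fraction_inhibited(50.0, ic50), 3))
```

By construction, inhibition is exactly 50% when the inhibitor concentration equals the IC50, and a lower IC50 (as for compound 2-2) gives more inhibition at any fixed concentration.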
Adverse side effects of drug–drug interactions induced by human cytochrome P450 (CYP450) inhibition are an important consideration in drug discovery. It is highly desirable to develop computational models that can predict the inhibitory effect of a compound against a specific CYP450 isoform. In this study, we developed a multitask model for concurrent inhibition prediction of five major CYP450 isoforms, namely, 1A2, 2C9, 2C19, 2D6, and 3A4. The model was built by training a multitask autoencoder deep neural network (DNN) on a large dataset of more than 13 000 compounds extracted from the PubChem BioAssay Database. We demonstrate that, averaged over the five prediction tasks, the multitask model gave better predictions than single-task models, previously reported classifiers, and traditional machine learning methods. Our multitask DNN model gave average prediction accuracies of 86.4% in 10-fold cross-validation and 88.7% on the external test datasets. In addition, we built linear regression models to quantify how the other tasks contributed to the difference in prediction performance on a given task between the single-task and multitask models, and we explained under what conditions the multitask model outperforms the single-task model, which suggests how to use multitask DNN models more effectively. We applied sensitivity analysis to extract useful knowledge about CYP450 inhibition, which may shed light on the structural features of these isoforms and give hints about how to avoid side effects during drug development. Our models are freely available at http://repharma.pku.edu.cn/deepcyp/home.php or http://www.pkumdl.cn/deepcyp/home.php.
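In a multitask setting, a compound in a public bioassay collection is not necessarily labeled for all five isoforms. The abstract does not say how the authors handled this; one common approach, assumed here, is a per-task binary cross-entropy that simply skips missing labels. The sketch omits the shared network itself and takes predicted probabilities as input.

```python
import math
from typing import Optional, Sequence

ISOFORMS = ("1A2", "2C9", "2C19", "2D6", "3A4")  # one output head per isoform


def masked_multitask_bce(
    probs: Sequence[Sequence[float]],
    labels: Sequence[Sequence[Optional[int]]],
    eps: float = 1e-7,
) -> float:
    """Mean binary cross-entropy over all (compound, isoform) pairs with a known label.

    probs[i][t]  -- predicted probability that compound i inhibits isoform t
    labels[i][t] -- 1 (inhibitor), 0 (non-inhibitor) or None (untested, ignored)
    """
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            if y is None:
                continue  # missing label: contributes nothing to the loss
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0
```

Because every head shares the same hidden layers, gradients from well-populated isoforms can still shape the representation used by sparsely labeled ones, which is one intuition for why a multitask model can beat single-task models.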
• Intrinsically disordered proteins (IDPs) are abundant and involved in many diseases.
• Advances and challenges in drug design targeting IDPs are discussed.
• Known IDP inhibitors are generally more hydrophobic and aromatic, and contain more rings.
• Strategies for IDP drug design are proposed.
• IDPs offer enormous potential as druggable targets.
Intrinsically disordered proteins or intrinsically disordered regions (IDPs or IDRs) are those that do not fold into defined tertiary structures under physiological conditions. Given their prevalence in various diseases, IDPs are attractive therapeutic targets. However, because of the dynamic nature of the IDP structure, conventional structure-based drug design methods cannot be directly applied. Thanks to recent progress in understanding the mechanisms underlying IDP and ligand interactions, computational strategies for IDP-targeted rational drug discovery are emerging. Here, we summarize recent developments in computational IDP drug design strategies and their successful applications, analyze the typical properties of reported IDP-binding compounds (iIDPs), and discuss the major challenges ahead as well as possible solutions.
The liquid-liquid phase separation (LLPS) of biomolecules in cells underpins the formation of membraneless organelles, which are condensates of protein, nucleic acid, or both, and play critical roles in cellular function. Dysregulation of LLPS is implicated in a number of diseases. Although the LLPS of biomolecules has been investigated intensively in recent years, knowledge of the prevalence and distribution of phase separation proteins (PSPs) still lags behind. Developing computational methods to predict PSPs is therefore of great importance for a comprehensive understanding of the biological function of LLPS.
Based on the PSPs collected in LLPSDB, we developed a sequence-based prediction tool for LLPS proteins (PSPredictor), an attempt at general-purpose PSP prediction that does not depend on specific protein types. Our method combines compositional and sequential information during the protein embedding stage and adopts a machine learning algorithm for the final prediction. The proposed method achieves a tenfold cross-validation accuracy of 94.71% and outperforms previously reported PSP prediction tools. For further applications, we built a user-friendly PSPredictor web server (http://www.pkumdl.cn/PSPredictor), which can be used to predict potential PSPs.
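The abstract states that PSPredictor combines compositional and sequential information when embedding a protein, without detailing how. As a minimal illustration of the general idea only, the sketch below concatenates amino acid composition (20 values, order-free) with dipeptide composition (400 values, capturing adjacent-residue order). This generic encoding is a stand-in chosen for clarity and is not PSPredictor's actual embedding.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def aa_composition(seq: str) -> List[float]:
    """Fraction of each standard residue; nonstandard letters are ignored."""
    residues = [a for a in seq.upper() if a in AMINO_ACIDS]
    n = len(residues) or 1
    return [residues.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> List[float]:
    """Fraction of each adjacent residue pair (a simple order-aware feature)."""
    s = seq.upper()
    # Skip pairs that contain a nonstandard residue rather than joining across it.
    pairs = [s[i:i + 2] for i in range(len(s) - 1)
             if s[i] in AMINO_ACIDS and s[i + 1] in AMINO_ACIDS]
    n = len(pairs) or 1
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for pair in pairs:
        counts[pair] += 1
    return [counts[d] / n for d in DIPEPTIDES]


def embed(seq: str) -> List[float]:
    """420-dimensional vector: composition followed by dipeptide composition."""
    return aa_composition(seq) + dipeptide_composition(seq)
```

The resulting fixed-length vectors could then be passed to any standard classifier; the choice of features and classifier here is illustrative.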
PSPredictor could identify novel scaffold proteins for stress granules and predict PSP candidates in the human genome for further study.
Bacteria use chemotaxis signaling pathways to sense environmental changes. The Escherichia coli chemotaxis system is an ideal model for illustrating fundamental principles of biological signaling processes. Chemoreceptors are crucial signaling proteins that mediate taxis toward a wide range of chemoeffectors. Recent in-depth studies of the biochemical and structural features of chemoreceptors, the organization of higher-order clusters in native cells, and the signal transduction mechanisms underlying the on–off signal output have provided general insights into how chemotaxis achieves high sensitivity, precise adaptation, signal amplification, and a wide dynamic range. With this growing knowledge, bacterial chemoreceptors can be engineered to sense novel chemoeffectors, which has extensive applications in therapeutics and industry. Here we mainly review recent advances in the E. coli chemotaxis system, covering the structure and organization of chemoreceptors; the discovery, design, and characterization of chemoeffectors; and signal recognition and transduction mechanisms. Possible strategies for changing the specificity of bacterial chemoreceptors to sense novel chemoeffectors are also discussed.
COVID-19 has become a global pandemic, and there is an urgent need to develop drugs against the virus (SARS-CoV-2). The 3C-like protease (3CLpro) of SARS-CoV-2 is a preferred target for broad-spectrum anti-coronavirus drug discovery. We studied the anti-SARS-CoV-2 activity of S. baicalensis and its ingredients. We found that the ethanol extract of S. baicalensis and its major component, baicalein, inhibit SARS-CoV-2 3CLpro activity in vitro with IC50 values of 8.52 µg/ml and 0.39 µM, respectively. Both inhibit the replication of SARS-CoV-2 in Vero cells, with EC50 values of 0.74 µg/ml and 2.9 µM, respectively. While baicalein is mainly active at the viral post-entry stage, the ethanol extract also inhibits viral entry. We further identified four baicalein analogues from other herbs that inhibit SARS-CoV-2 3CLpro activity at µM concentrations. All the active compounds and the S. baicalensis extract also inhibit SARS-CoV 3CLpro, demonstrating their potential as broad-spectrum anti-coronavirus drugs.
Assessing whether a protein structure is a suitable target before performing structure-based drug design on it is an important step in speeding up ligand discovery. This is known as the "druggability" or "ligandability" assessment problem, which has attracted increasing interest in recent years. The assessment typically includes detecting ligand-binding sites on the protein surface and predicting their ability to bind drug-like small molecules. A brief summary of established methods for binding-site detection and druggability (ligandability) prediction is given, together with a detailed description of the CAVITY approach developed in the authors' group. CAVITY showed good performance in ligand-binding site detection and was successfully used to predict both the ligandability and druggability of the detected binding sites.
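Binding-site detection on the protein surface is often explained with a simple geometric idea: an empty point that is enclosed by protein in most directions probably lies in a pocket. The sketch below implements that generic grid-and-ray "buriedness" idea in the spirit of classic grid-based pocket detectors. It is not the CAVITY algorithm, which the abstract does not describe, and all parameter values (`radius`, `spacing`, `min_hits`, `max_steps`) are arbitrary assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def occupied(p: Point, atoms: Sequence[Point], radius: float) -> bool:
    """True if p lies within `radius` of any atom centre (i.e. inside the protein)."""
    return any(math.dist(p, a) <= radius for a in atoms)


def buriedness(p: Point, atoms: Sequence[Point], radius: float = 1.5,
               step: float = 1.0, max_steps: int = 8) -> int:
    """Count how many of the six axis directions from p run into protein."""
    hits = 0
    for dx, dy, dz in DIRECTIONS:
        for k in range(1, max_steps + 1):
            q = (p[0] + dx * k * step, p[1] + dy * k * step, p[2] + dz * k * step)
            if occupied(q, atoms, radius):
                hits += 1
                break  # this direction is blocked; move to the next one
    return hits


def pocket_points(atoms: Sequence[Point], spacing: float = 1.0, radius: float = 1.5,
                  min_hits: int = 5, margin: float = 4.0) -> List[Point]:
    """Empty grid points around the protein that are blocked in >= min_hits directions."""
    xs, ys, zs = zip(*atoms)
    lo = [min(c) - margin for c in (xs, ys, zs)]
    hi = [max(c) + margin for c in (xs, ys, zs)]

    def axis(start: float, stop: float) -> List[float]:
        n = int((stop - start) / spacing) + 1
        return [start + i * spacing for i in range(n)]

    pts = []
    for x in axis(lo[0], hi[0]):
        for y in axis(lo[1], hi[1]):
            for z in axis(lo[2], hi[2]):
                p = (x, y, z)
                if not occupied(p, atoms, radius) and \
                        buriedness(p, atoms, radius, spacing) >= min_hits:
                    pts.append(p)
    return pts
```

Real tools go further, clustering buried points into discrete sites and scoring each site's ligandability or druggability from its size, shape, and physicochemical environment.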
Inflammation and other common disorders, including diabetes, cardiovascular disease, and cancer, are often the result of several molecular abnormalities and are not likely to be resolved by a traditional single-target drug discovery approach. Although inflammation is a normal bodily reaction, uncontrolled and misdirected inflammation can cause inflammatory diseases such as rheumatoid arthritis and asthma. Nonsteroidal anti-inflammatory drugs such as aspirin, ibuprofen, naproxen, or celecoxib are commonly used to relieve aches and pains, but these drugs often have undesirable and sometimes even fatal side effects. To facilitate safer and more effective anti-inflammatory drug discovery, a balanced treatment strategy should be developed at the biological network level. In this Account, we focus on our recent progress in modeling the inflammation-related arachidonic acid (AA) metabolic network and subsequent multiple drug design. We first constructed a mathematical model of inflammation based on experimental data and then applied the model to simulate the effects of commonly used anti-inflammatory drugs. Our results indicated that the model correctly reproduced the established bleeding and cardiovascular side effects. Multitarget optimal intervention (MTOI), a Monte Carlo simulated annealing-based computational scheme, was then developed to identify key targets and optimal solutions for controlling inflammation. A number of optimal multitarget strategies were discovered that were both effective and safe, with minimal associated side effects. Experimental studies were performed to further evaluate these multitarget control solutions, using different combinations of inhibitors to perturb the network. Consequently, simultaneous control of cyclooxygenase-1 and -2 and leukotriene A4 hydrolase, as well as of 5-lipoxygenase and prostaglandin E2 synthase, were found to be among the best solutions.
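MTOI is described as a Monte Carlo simulated annealing scheme for finding optimal multitarget interventions. The minimal sketch below shows that search pattern: each target's inhibition level is a number in [0, 1], random single-target moves are accepted by the Metropolis rule under a geometrically cooling temperature, and the best state seen is returned. The cost function `toy_cost` is entirely hypothetical: its two "branches" are a crude illustration standing in for the published AA network simulation, and its weights were chosen only so that strong inhibition, branch imbalance, and heavy dosing all carry a penalty.

```python
import math
import random
from typing import Callable, List, Tuple

TARGETS = ["COX-1", "COX-2", "5-LOX", "LTA4H", "PGES"]  # labels only


def toy_cost(x: List[float]) -> float:
    """HYPOTHETICAL stand-in for a network simulation (not the AA model)."""
    cox1, cox2, lox5, lta4h, pges = x
    pg_branch = (1 - cox1) * (1 - cox2) * (1 - pges)  # toy prostaglandin output
    lt_branch = (1 - lox5) * (1 - lta4h)              # toy leukotriene output
    inflammation = pg_branch + lt_branch
    imbalance = (pg_branch - lt_branch) ** 2  # penalize shunting to one branch
    dose = 0.2 * sum(v * v for v in x)        # prefer mild interventions
    return inflammation + imbalance + dose


def simulated_annealing(
    cost: Callable[[List[float]], float],
    n: int,
    steps: int = 5000,
    t0: float = 1.0,
    t_end: float = 1e-3,
    step_size: float = 0.1,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize cost over [0, 1]^n; returns the best state found and its cost."""
    rng = random.Random(seed)
    x = [0.0] * n  # start from "no intervention"
    fx = cost(x)
    best, f_best = list(x), fx
    for k in range(steps):
        t = t0 * (t_end / t0) ** (k / (steps - 1))  # geometric cooling schedule
        i = rng.randrange(n)
        y = list(x)
        y[i] = min(1.0, max(0.0, y[i] + rng.uniform(-step_size, step_size)))
        fy = cost(y)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if fy <= fx or rng.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < f_best:
                best, f_best = list(x), fx
    return best, f_best


best, best_cost = simulated_annealing(toy_cost, len(TARGETS))
print({t: round(v, 2) for t, v in zip(TARGETS, best)}, round(best_cost, 3))
```

Accepting occasional uphill moves early on lets the search escape local minima of a rugged objective; with a real network model, each `cost` call would be a full simulation, so the number of steps becomes the main computational expense.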
A single compound that can bind multiple targets offers advantages, including a low risk of drug–drug interactions and robustness to concentration fluctuations. We therefore developed strategies for multiple-target drug design and successfully discovered several series of multiple-target inhibitors. Optimal solutions for a disease network often involve mild but simultaneous interventions on multiple targets, which is in accord with the philosophy of traditional Chinese medicine (TCM). Accordingly, our AA network model can explain TCM anti-inflammatory herbs and formulas at the molecular level. We also aimed to identify activators for several enzymes that appeared to have increased activity based on MTOI outcomes. Strategies were then developed to predict potential allosteric sites and to discover enzyme activators, based on our hypothesis that combined treatment with the projected activators and inhibitors could balance the different AA network pathways, control inflammation, and reduce associated adverse effects. Our work demonstrates that integrating network modeling and drug discovery can provide novel solutions for disease control, which also calls for new developments in drug design concepts and methodologies. With the rapid accumulation of quantitative data and knowledge of the molecular networks of disease, we can expect an increase in the development and use of quantitative disease models to facilitate efficient and safe drug discovery.