Abstract
Motivation
Accurately predicting drug–target interactions (DTIs) in silico can guide the drug discovery process and thus facilitate drug development. Computational approaches for DTI ...prediction that adopt the systems biology perspective generally exploit the rationale that the properties of drugs and targets can be characterized by their functional roles in biological networks.
Results
Inspired by recent advance of information passing and aggregation techniques that generalize the convolution neural networks to mine large-scale graph data and greatly improve the performance of many network-related prediction tasks, we develop a new nonlinear end-to-end learning model, called NeoDTI, that integrates diverse information from heterogeneous network data and automatically learns topology-preserving representations of drugs and targets to facilitate DTI prediction. The substantial prediction performance improvement over other state-of-the-art DTI prediction methods as well as several novel predicted DTIs with evidence supports from previous studies have demonstrated the superior predictive power of NeoDTI. In addition, NeoDTI is robust against a wide range of choices of hyperparameters and is ready to integrate more drug and target related information (e.g. compound–protein binding affinity data). All these results suggest that NeoDTI can offer a powerful and robust tool for drug development and drug repositioning.
Availability and implementation
The source code and data used in NeoDTI are available at: https://github.com/FangpingWan/NeoDTI.
Supplementary information
Supplementary data are available at Bioinformatics online.
Peptide-protein interactions are involved in various fundamental cellular functions and their identification is crucial for designing efficacious peptide therapeutics. Recently, a number of ...computational methods have been developed to predict peptide-protein interactions. However, most of the existing prediction approaches heavily depend on high-resolution structure data. Here, we present a deep learning framework for multi-level peptide-protein interaction prediction, called CAMP, including binary peptide-protein interaction prediction and corresponding peptide binding residue identification. Comprehensive evaluation demonstrated that CAMP can successfully capture the binary interactions between peptides and proteins and identify the binding residues along the peptides involved in the interactions. In addition, CAMP outperformed other state-of-the-art methods on binary peptide-protein interaction prediction. CAMP can serve as a useful tool in peptide-protein interaction prediction and identification of important binding residues in the peptides, which can thus facilitate the peptide drug discovery process.
Computational approaches for understanding compound-protein interactions (CPIs) can greatly facilitate drug development. Recently, a number of deep-learning-based methods have been proposed to ...predict binding affinities and attempt to capture local interaction sites in compounds and proteins through neural attentions (i.e., neural network architectures that enable the interpretation of feature importance). Here, we compiled a benchmark dataset containing the inter-molecular non-covalent interactions for more than 10,000 compound-protein pairs and systematically evaluated the interpretability of neural attentions in existing models. We also developed a multi-objective neural network, called MONN, to predict both non-covalent interactions and binding affinities between compounds and proteins. Comprehensive evaluation demonstrated that MONN can successfully predict the non-covalent interactions between compounds and proteins that cannot be effectively captured by neural attentions in previous prediction methods. Moreover, MONN outperforms other state-of-the-art methods in predicting binding affinities. Source code for MONN is freely available for download at https://github.com/lishuya17/MONN.
Display omitted
•MONN models compound-protein interactions from structure-free information•MONN predicts both inter-molecular non-covalent interactions and binding affinities•MONN outperforms other methods, on large datasets with or without atomic structures•Predictions of MONN can be validated by known chemical rules
Identifying compound-protein interactions is one of the essential challenges in drug discovery. We developed MONN, a multi-objective neural network, which not only accurately predicts the binding affinities but also successfully captures the non-covalent interactions between compounds and proteins. MONN can prove to be a useful tool in exploring compound-protein interactions.
The global spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) requires an urgent need to find effective therapeutics for the treatment of coronavirus disease 2019 (COVID-19). In ...this study, we developed an integrative drug repositioning framework, which fully takes advantage of machine learning and statistical analysis approaches to systematically integrate and mine large-scale knowledge graph, literature and transcriptome data to discover the potential drug candidates against SARS-CoV-2. Our in silico screening followed by wet-lab validation indicated that a poly-ADP-ribose polymerase 1 (PARP1) inhibitor, CVL218, currently in Phase I clinical trial, may be repurposed to treat COVID-19. Our in vitro assays revealed that CVL218 can exhibit effective inhibitory activity against SARS-CoV-2 replication without obvious cytopathic effect. In addition, we showed that CVL218 can interact with the nucleocapsid (N) protein of SARS-CoV-2 and is able to suppress the LPS-induced production of several inflammatory cytokines that are highly relevant to the prevention of immunopathology induced by SARS-CoV-2 infection.
Knowledge about the relations between biomedical entities (such as drugs and targets) is widely distributed in more than 30 million research articles and consistently plays an important role in the ...development of biomedical science. In this work, we propose a novel machine learning framework, named BERE, for automatically extracting biomedical relations from large-scale literature repositories. BERE uses a hybrid encoding network to better represent each sentence from both semantic and syntactic aspects, and employs a feature aggregation network to make predictions after considering all relevant statements. More importantly, BERE can also be trained without any human annotation via a distant supervision technique. Through extensive tests, BERE has demonstrated promising performance in extracting biomedical relations, and can also find meaningful relations that were not reported in existing databases, thus providing useful hints to guide wet-lab experiments and advance the biological knowledge discovery process.A lot of scientific literature is unstructured, which makes extracting information for biomedical databases difficult. Hong and colleagues show that a distant supervision approach, using latent tree learning and recurrent units, can extract drug–target interactions from literature that were previously unknown.
Accurate identification of compound–protein interactions (CPIs) in silico may deepen our understanding of the underlying mechanisms of drug action and thus remarkably facilitate drug discovery and ...development. Conventional similarity- or docking-based computational methods for predicting CPIs rarely exploit latent features from currently available large-scale unlabeled compound and protein data and often limit their usage to relatively small-scale datasets. In the present study, we propose DeepCPI, a novel general and scalable computational framework that combines effective feature embedding (a technique of representation learning) with powerful deep learning methods to accurately predict CPIs at a large scale. DeepCPI automatically learns the implicit yet expressive low-dimensional features of compounds and proteins from a massive amount of unlabeled data. Evaluations of the measured CPIs in large-scale databases, such as ChEMBL and BindingDB, as well as of the known drug–target interactions from DrugBank, demonstrated the superior predictive performance of DeepCPI. Furthermore, several interactions among small-molecule compounds and three G protein-coupled receptor targets (glucagon-like peptide-1 receptor, glucagon receptor, and vasoactive intestinal peptide receptor) predicted using DeepCPI were experimentally validated. The present study suggests that DeepCPI is a useful and powerful tool for drug discovery and repositioning. The source code of DeepCPI can be downloaded from https://github.com/FangpingWan/DeepCPI.
Deeply understanding the properties (e.g., chemical or biological characteristics) of small molecules plays an essential role in drug development. A large number of molecular property datasets have ...been rapidly accumulated in recent years. However, most of these datasets contain only a limited amount of data, which hinders deep learning methods from making accurate predictions of the corresponding molecular properties. In this work, we propose a transfer learning strategy to alleviate such a data scarcity problem by exploiting the similarity between molecular property prediction tasks. We introduce an effective and interpretable computational framework, named MoTSE (Molecular Tasks Similarity Estimator), to provide an accurate estimation of task similarity. Comprehensive tests demonstrated that the task similarity derived from MoTSE can serve as useful guidance to improve the prediction performance of transfer learning on molecular properties. We also showed that MoTSE can capture the intrinsic relationships between molecular properties and provide meaningful interpretability for the derived similarity.
Display omitted
•MoTSE accurately measures similarity between molecular property prediction tasks•A novel transfer learning strategy to accurately predict molecular properties•An interpretable method to help understand relations between molecular properties
Drugs; Computational chemistry; Bioinformatics; Artificial intelligence
Synthetic lethality (SL), an important type of genetic interaction, can provide useful insight into the target identification process for the development of anticancer therapeutics. Although several ...well-established SL gene pairs have been verified to be conserved in humans, most SL interactions remain cell-line specific. Here, we demonstrated that the cell-line-specific gene expression profiles derived from the shRNA perturbation experiments performed in the LINCS L1000 project can provide useful features for predicting SL interactions in human. In this paper, we developed a semi-supervised neural network-based method called EXP2SL to accurately identify SL interactions from the L1000 gene expression profiles. Through a systematic evaluation on the SL datasets of three different cell lines, we demonstrated that our model achieved better performance than the baseline methods and verified the effectiveness of using the L1000 gene expression features and the semi-supervise training technique in SL prediction.
The recent boom in single-cell sequencing technologies provides valuable insights into the transcriptomes of individual cells. Through single-cell data analyses, a number of biological discoveries, ...such as novel cell types, developmental cell lineage trajectories, and gene regulatory networks, have been uncovered. However, the massive and increasingly accumulated single-cell datasets have also posed a seriously computational and analytical challenge for researchers. To address this issue, one typically applies dimensionality reduction approaches to reduce the large-scale datasets. However, these approaches are generally computationally infeasible for tall matrices. In addition, the downstream data analysis tasks such as clustering still take a large time complexity even on the dimension-reduced datasets. We present single-cell Coreset (scCoreset), a data summarization framework that extracts a small weighted subset of cells from a huge sparse single-cell RNA-seq data to facilitate the downstream data analysis tasks. Single-cell data analyses run on the extracted subset yield similar results to those derived from the original uncompressed data. Tests on various single-cell datasets show that scCoreset outperforms the existing data summarization approaches for common downstream tasks such as visualization and clustering. We believe that scCoreset can serve as a useful plug-in tool to improve the efficiency of current single-cell RNA-seq data analyses.
Abstract
Motivation
Quantitative structure–activity relationship (QSAR) and drug–target interaction (DTI) prediction are both commonly used in drug discovery. Collaboration among pharmaceutical ...institutions can lead to better performance in both QSAR and DTI prediction. However, the drug-related data privacy and intellectual property issues have become a noticeable hindrance for inter-institutional collaboration in drug discovery.
Results
We have developed two novel algorithms under secure multiparty computation (MPC), including QSARMPC and DTIMPC, which enable pharmaceutical institutions to achieve high-quality collaboration to advance drug discovery without divulging private drug-related information. QSARMPC, a neural network model under MPC, displays good scalability and performance and is feasible for privacy-preserving collaboration on large-scale QSAR prediction. DTIMPC integrates drug-related heterogeneous network data and accurately predicts novel DTIs, while keeping the drug information confidential. Under several experimental settings that reflect the situations in real drug discovery scenarios, we have demonstrated that DTIMPC possesses significant performance improvement over the baseline methods, generates novel DTI predictions with supporting evidence from the literature and shows the feasible scalability to handle growing DTI data. All these results indicate that QSARMPC and DTIMPC can provide practically useful tools for advancing privacy-preserving drug discovery.
Availability and implementation
The source codes of QSARMPC and DTIMPC are available on the GitHub: https://github.com/rongma6/QSARMPC_DTIMPC.git.
Supplementary information
Supplementary data are available at Bioinformatics online.