Trypanosoma brucei causes African trypanosomiasis in humans (HAT or African sleeping sickness) and Nagana in cattle. The disease threatens over 60 million people and uncounted numbers of cattle in 36 countries of sub-Saharan Africa and has a devastating impact on human health and the economy. Trypanosoma cruzi, in turn, is responsible in South America for Chagas disease, which can cause acute illness and death, especially in young children. In this context, the discovery of novel drug targets in the Trypanosome proteome is a major focus for the scientific community. Recently, many researchers have devoted considerable effort to the study of protein−protein interactions (PPIs) in pathogenic Trypanosome species, concluding that the low sequence identities between some parasite proteins and their human host counterparts make these PPIs highly promising drug targets. To the best of our knowledge, there are no general models to predict unique PPIs in Trypanosome (TPPIs). At the same time, the 3D structures of an increasing number of Trypanosome proteins are reported in databases. A new model to predict TPPIs from the 3D structure of the proteins involved in a PPI is therefore very important. For this purpose, we introduced new protein−protein complex invariants based on the Markov average electrostatic potential ξk(Ri) for amino acids located in different regions (Ri) of the i-th protein and placed at a distance k from each other. We calculated more than 30 different types of parameters for 7866 pairs of proteins (1023 TPPIs and 6823 non-TPPIs) from more than 20 organisms, including parasites and human or cattle hosts. We found a very simple linear model that correctly predicts more than 90% of TPPIs and non-TPPIs in both training and independent test subsets using only two parameters. The parameters were dξk(s) = |ξk(s1) − ξk(s2)|, the absolute difference between the ξk(si) values on the surface of the two proteins in each pair.
We also tested nonlinear ANN models for comparison purposes, but the linear model gives the best results. We implemented this predictor in a web server named TrypanoPPI, freely available to the public at http://miaja.tic.udc.es/Bio-AIMS/TrypanoPPI.php. This is the first model that predicts how unique a protein−protein complex in the Trypanosome proteome is with respect to other parasites and hosts, opening new opportunities for antitrypanosome drug target discovery.
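As a minimal sketch of how the two surface parameters could feed a linear classifier, the Python below computes dξk(s) = |ξk(s1) − ξk(s2)| for two distances k and combines them in a linear score. The potential values, the labels `k_a` and `k_b`, the weights and the bias are all hypothetical placeholders; the abstract does not report which k values or coefficients the published model uses.

```python
def surface_potential_difference(xi_s1: float, xi_s2: float) -> float:
    """Return d_xi_k(s) = |xi_k(s1) - xi_k(s2)| for one value of k."""
    return abs(xi_s1 - xi_s2)


def linear_score(features, weights, bias):
    """Linear discriminant score: bias + sum of weight * feature."""
    return bias + sum(w * x for w, x in zip(weights, features))


# Hypothetical surface potentials of two proteins at two distances k.
xi_protein_1 = {"k_a": 0.42, "k_b": 0.37}
xi_protein_2 = {"k_a": 0.30, "k_b": 0.41}

# One absolute difference per distance k gives the two model inputs.
features = [
    surface_potential_difference(xi_protein_1[k], xi_protein_2[k])
    for k in ("k_a", "k_b")
]

# Placeholder coefficients, NOT the fitted values of the published model.
weights = [1.0, -1.0]
bias = 0.0

score = linear_score(features, weights, bias)
is_tppi = score > 0  # assumed decision rule: positive score means TPPI
```

With fitted coefficients in place of the placeholders, the same two-feature score would classify any pair for which the surface potentials are available.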
The study of selective toxicity of carbon nanotubes (CNTs) on mitochondria (CNT-mitotoxicity) is of major interest for future biomedical applications. In the current work, the mitochondrial oxygen consumption (E3) is measured under three experimental conditions by exposure to pristine and oxidized CNTs (hydroxylated and carboxylated). Respiratory functional assays showed that the information on the CNT Raman spectroscopy could be useful to predict structural parameters of mitotoxicity induced by CNTs. The in vitro functional assays show that the mitochondrial oxidative phosphorylation by ATP-synthase (or state V3 of respiration) was not perturbed in isolated rat-liver mitochondria. For the first time, a star graph (SG) transform of the CNT Raman spectra is proposed in order to obtain the raw information for a nano-QSPR model. Box–Jenkins and perturbation theory operators are used for the SG Shannon entropies. A modified RRegrs methodology is employed to test four regression methods: multiple linear regression (LM), partial least squares regression (PLS), neural networks regression (NN), and random forest (RF). RF provides the best models to predict the mitochondrial oxygen consumption in the presence of specific CNTs, with R² of 0.998–0.999 and RMSE of 0.0068–0.0133 (training and test subsets). This work is aimed at demonstrating that the SG transform of Raman spectra is useful to encode CNT information, similarly to the SG transform of the blood proteome spectra in cancer or electroencephalograms in epilepsy, and also as a prospective chemoinformatics tool for nanorisk assessment. All data files and R object models are available at https://dx.doi.org/10.6084/m9.figshare.3472349.
ChEMBL biological activities prediction for 1–5-bromofur-2-il-2-bromo-2-nitroethene (G1) is a difficult task for cytokine immunotoxicity. The current study presents experimental results for G1 interaction with mouse Th1/Th2 and pro-inflammatory cytokines using a cytometry bead array (CBA). In the in vitro CBA test, the results show no significant differences between the mean values of the Th1/Th2 cytokines for the samples treated with G1 and the negative control, but there are moderate differences in cytokine values between different periods (24/48 h). The experiments show no significant differences between the mean values of the pro-inflammatory cytokines for the samples treated with G1 and the negative control, except for the values of tumor necrosis factor (TNF) and interleukin 6 (IL6) at 48 h. Differences also occur for these cytokines between the periods (24/48 h). The study confirmed that the antimicrobial G1 did not alter the Th1/Th2 cytokine concentrations in vitro in different periods, but it can alter TNF and IL6. G1 promotes free radical production and activates damage processes in macrophage culture. In order to predict all ChEMBL activities for drugs in other experimental conditions, a ChEMBL data set was constructed using 25 biological activities, 1366 assays, 2 assay types, 4 assay organisms, 2 organisms, and 12 cytokine targets. Molecular descriptors calculated with Rcpi and 15 machine learning methods were used to find the best model able to predict whether a drug could be active or not against a specific cytokine in specific experimental conditions. The best model is based on 120 selected molecular descriptors and a deep neural network, with an area under the receiver operating characteristic curve of 0.904 and an accuracy of 0.832. This model predicted 1384 G1 biological activities against cytokines in all ChEMBL data set experimental conditions.
Background
Predictive regression models can be created with many different modelling approaches. Choices need to be made for data set splitting, cross-validation methods, specific regression parameters and best model criteria, as they all affect the accuracy and efficiency of the produced predictive models and therefore raise model reproducibility and comparison issues. Cheminformatics and bioinformatics use predictive modelling extensively and exhibit a need for standardization of these methodologies in order to assist model selection and speed up the process of predictive model development. A tool accessible to all users, irrespective of their statistical knowledge, would be valuable if it tests several simple and complex regression models and validation schemes, produces unified reports, and offers the option to be integrated into more extensive studies. Additionally, such a methodology should be implemented as a free programming package, so that it can be continuously adapted and redistributed by others.
Results
We propose an integrated framework for creating multiple regression models, called RRegrs. The tool offers the option of ten simple and complex regression methods combined with repeated 10-fold and leave-one-out cross-validation. Methods include Multiple Linear regression, Generalized Linear Model with Stepwise Feature Selection, Partial Least Squares regression, Lasso regression, and Support Vector Machines Recursive Feature Elimination. The new framework is an automated, fully validated procedure which produces standardized reports to quickly survey the impact of choices in modelling algorithms and assess the model and cross-validation results. The methodology was implemented as an open source R package, available at https://www.github.com/enanomapper/RRegrs, by reusing and extending the caret package.
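The two validation schemes named above can be illustrated with a short, library-free Python sketch. It is not the RRegrs or caret API (both are R packages); the function names, the number of repeats and the seed are assumptions made for illustration only.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=3, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        # Reshuffle before every repeat so each repeat uses new folds.
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal the shuffled indices into n_folds nearly equal folds.
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for fold, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            yield repeat, fold, train_idx, sorted(test_idx)


def leave_one_out(n_samples):
    """Yield (train_idx, test_idx) with exactly one held-out sample."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], [i]


# Hypothetical sizes chosen only to keep the example small.
splits = list(repeated_kfold(n_samples=25, n_folds=10, n_repeats=2))
loo_splits = list(leave_one_out(5))
```

Within each repeat every sample is held out exactly once, so a model fitted on each `train_idx` and scored on the matching `test_idx` gives one out-of-sample prediction per sample per repeat.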
Conclusion
The universality of the new methodology is demonstrated using five standard data sets from different scientific fields. Its efficiency in cheminformatics and QSAR modelling is shown with three use cases: proteomics data for surface-modified gold nanoparticles, nano-metal oxides descriptor data, and molecular descriptors for acute aquatic toxicity data. The results show that for all data sets RRegrs reports models with equal or better performance for both training and test sets than those reported in the original publications. Its good performance as well as its adaptability in terms of parameter optimization could make RRegrs a popular framework to assist the initial exploration of predictive models, and with that, the design of more comprehensive in silico screening applications.
Graphical abstract
RRegrs is a computer-aided model selection framework for R multiple regression models; this is a fully validated procedure with application to QSAR modelling
In developing countries, maternal undernutrition is the major intrauterine environmental factor contributing to fetal development and adverse pregnancy outcomes. Maternal nutrition restriction (MNR) in gestation has been shown to impact overall growth, bone development, and the proliferation and metabolism of mesenchymal stem cells in offspring. However, an efficient method for elucidating fetal bone development performance through maternal bone metabolic biochemical markers remains elusive.
We used goats as a model to elucidate the state of fetal bone development from maternal serum bone metabolic proteins under malnutrition conditions in the mid- and late-gestation stages. We used the experimental data to create 72 datasets by mixing different input features such as one-hot encoding of experimental conditions, original metabolic data, experimental-centered features and experimental condition probabilities. Seven Machine Learning methods were used to predict six fetal bone parameters (weight, length, and diameter of femur/humerus).
The results indicated that MNR influences fetal bone development (femur and humerus), fetal bone metabolic protein levels (C-terminal telopeptides of collagen I, CTx, in middle-gestation and N-terminal telopeptides of collagen I, NTx, in late-gestation), and maternal bone metabolites (low bone alkaline phosphatase, BALP, in middle-gestation and high BALP in late-gestation). The results show the importance of encoding the experimental conditions (ECs) by mixing this information with the serum metabolic data. The best classification models obtained for femur weight (Fw), femur length (Fl), and humerus weight (Hw) are Support Vector Machines classifiers with a leave-one-out cross-validation accuracy of 1. The remaining accuracies are 0.98, 0.946 and 0.696 for femur diameter (Fd), humerus diameter (Hd) and humerus length (Hl), respectively. In the feature importance analysis, the moving averages mixed with ECs are generally the more important features for the majority of the models. The moving average of parathyroid hormone (PTH) within nutritional conditions (MA-PTH-experim) is important for the Fd, Hd and Hl prediction models, whereas its removal enhances the performance of the Fw, Fl and Hw models. Further, one-feature models can be even more accurate than the models obtained from the feature importance analysis. In conclusion, machine learning is an efficient method to confirm the important role of PTH and BALP mixed with nutritional conditions for the fetal bone growth performance of goats. All the Python scripts, including results and comments, are available in an open repository at https://gitlab.com/muntisa/goat-bones-machine-learning.
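One plausible reading of a feature such as MA-PTH-experim is the mean of a serum variable within each experimental condition, with the experiment-centered feature being the deviation of each sample from that mean. The sketch below follows that reading; it is an assumption rather than the repository's actual code, and the PTH values and condition labels are invented.

```python
from collections import defaultdict


def condition_means(values, conditions):
    """Mean of `values` within each experimental condition."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, condition in zip(values, conditions):
        sums[condition] += value
        counts[condition] += 1
    return {c: sums[c] / counts[c] for c in sums}


def center_by_condition(values, conditions):
    """Deviation of each value from the mean of its own condition."""
    means = condition_means(values, conditions)
    return [v - means[c] for v, c in zip(values, conditions)]


# Hypothetical serum PTH values and nutritional conditions.
pth = [10.0, 14.0, 20.0, 24.0]
diet = ["control", "control", "restricted", "restricted"]

ma_pth = condition_means(pth, diet)         # one mean per condition
pth_centered = center_by_condition(pth, diet)  # one deviation per sample
```

Mixing the condition mean with the raw measurement in this way lets a single model see both the sample's value and the typical value for its experimental condition.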
Predicting drug–protein interactions (DPIs) for target proteins involved in dopamine pathways is a very important goal in medicinal chemistry. We can tackle this problem using Molecular Docking or Machine Learning (ML) models for one specific protein. Unfortunately, these models fail to account for the large and complex data sets of preclinical assays reported in public databases. These data sets include multiple assay conditions, such as different experimental parameters, biological assays, target proteins, cell lines, organism of the target, or organism of assay. On the other hand, perturbation theory (PT) models allow us to predict the properties of a query compound or molecular system in experimental assays with multiple boundary conditions based on a previously known case of reference. In this work, we report the first PTML (PT + ML) study of a large ChEMBL data set of preclinical assays of compounds targeting dopamine pathway proteins. The best PTML model found predicts 50000 cases with accuracy of 70–91% in training and external validation series. We also compared the linear PTML model with alternative PTML models trained with multiple nonlinear methods (artificial neural network (ANN), Random Forest, Deep Learning, etc.). Some of the nonlinear methods outperform the linear model, but at the cost of a notable increase in model complexity. We illustrated the practical use of the new model with a proof-of-concept theoretical–experimental study. We reported for the first time the organic synthesis, chemical characterization, and pharmacological assay of a new series of l-prolyl-l-leucyl-glycinamide (PLG) peptidomimetic compounds. In addition, we performed a molecular docking study for some of these compounds with the software AutoDock Vina. The work ends with a PTML model predictive study of the outcomes of the new compounds in a large number of assays.
Therefore, this study offers a new computational methodology for predicting the outcome for any compound in new assays. This PTML method focuses on the prediction with a simple linear model of multiple pharmacological parameters (IC50, EC50, Ki, etc.) for compounds in assays involving different cell lines used, organisms of the protein target, or organism of assay for proteins in the dopamine pathway.
• An early drug discovery workflow for the IL-17A inflammatory pathway was set up.
• Two novel small molecule ligands targeting IL-17A/IL-17RA complex were identified.
• Ligands inhibited IL-17A-induced IL-8 and CCL20 release in human keratinocytes.
• CBG060392 partially inhibited the IL-17A receptor intracellular signalling.
Interleukin 17 (IL-17) is a proinflammatory cytokine that acts as an immune checkpoint for several autoimmune diseases. Therapeutic neutralizing antibodies that target this cytokine have demonstrated clinical efficacy in psoriasis. However, biologics have limitations such as their high cost and their lack of oral bioavailability. Thus, it is necessary to expand the therapeutic options for this IL-17A/IL-17RA pathway, applying novel drug discovery methods to find effective small molecules. In this work, we combined biophysical and cell-based assays with structure-based docking to find novel ligands that target this pathway. First, a virtual screening of our chemical library of 60000 compounds was used to identify 67 potential ligands of IL-17A and IL-17RA. We developed a biophysical label-free binding assay to determine interactions with the extracellular domain of IL-17RA. Two molecules (CBG040591 and CBG060392) with quinazolinone and pyrrolidinedione chemical scaffolds, respectively, were confirmed as ligands of IL-17RA with micromolar affinity. The anti-inflammatory activity of these ligands as cytokine-release inhibitors was evaluated in human keratinocytes. Both ligands inhibited the release of chemokines mediated by IL-17A, with an IC50 of 20.9 ± 12.6 μM and 23.6 ± 11.8 μM for CCL20 and an IC50 of 26.7 ± 13.1 μM and 45.3 ± 13.0 μM for CXCL8. Hence, they blocked IL-17A proinflammatory activity, which is consistent with the inhibition of the signalling of the IL-17A receptor by ligand CBG060392. Therefore, we identified two novel immunopharmacological ligands targeting the IL-17A/IL-17RA pathway with antiinflammatory efficacy that can be promising tools for a drug discovery program for psoriasis.
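To make the reported IC50 values concrete, the sketch below evaluates a standard Hill-type inhibition curve. The abstract does not state which dose-response model was fitted, so the Hill form and the slope of 1 are assumptions; only the IC50 of 20.9 μM (the first value reported for CCL20) is taken from the text.

```python
def fraction_inhibited(concentration_um, ic50_um, hill_slope=1.0):
    """Fraction of the response inhibited under a Hill-type model.

    Assumes full inhibition at saturating ligand and no effect at zero.
    """
    if concentration_um < 0 or ic50_um <= 0:
        raise ValueError("concentration must be >= 0 and IC50 must be > 0")
    if concentration_um == 0:
        return 0.0
    ratio = (concentration_um / ic50_um) ** hill_slope
    return ratio / (1.0 + ratio)


# First reported IC50 for CCL20 release, in micromolar.
ic50_ccl20 = 20.9

at_ic50 = fraction_inhibited(20.9, ic50_ccl20)   # half-maximal by definition
at_10um = fraction_inhibited(10.0, ic50_ccl20)
at_100um = fraction_inhibited(100.0, ic50_ccl20)
```

By definition the curve passes through 50% inhibition at the IC50. The wide uncertainties reported (for example ± 12.6 μM) would shift the whole curve along the concentration axis.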
The current molecular docking study provided the Free Energy of Binding (FEB) for the interaction (nanotoxicity) between VDAC mitochondrial channels of three species (VDAC1-Mus musculus, VDAC1-Homo sapiens, VDAC2-Danio rerio) with SWCNT-H, SWCNT-OH, SWCNT-COOH carbon nanotubes. The general results showed that the FEB values were statistically more negative (p < 0.05) in the following order: (SWCNT-VDAC2-Danio rerio) > (SWCNT-VDAC1-Mus musculus) > (SWCNT-VDAC1-Homo sapiens) > (ATP-VDAC). More negative FEB values for SWCNT-COOH and OH were found in VDAC2-Danio rerio when compared with VDAC1-Mus musculus and VDAC1-Homo sapiens (p < 0.05). In addition, a significant correlation (0.66 < r² < 0.97) was observed between n-Hamada index and VDAC nanotoxicity (or FEB) for the zigzag topologies of SWCNT-COOH and SWCNT-OH. Predictive Nanoparticles-Quantitative-Structure Binding-Relationship models (nano-QSBR) for strong and weak SWCNT-VDAC docking interactions were performed using Perturbation Theory, regression and classification models. Thus, 405 SWCNT-VDAC interactions were predicted using a nano-PT-QSBR classification model with high accuracy, specificity, and sensitivity (73-98%) in training and validation series, and a maximum AUROC value of 0.978. In addition, the best regression model was obtained with Random Forest (R² of 0.833, RMSE of 0.0844), suggesting an excellent potential to predict SWCNT-VDAC channel nanotoxicity. All study data are available at https://doi.org/10.6084/m9.figshare.4802320.v2.
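The AUROC quoted above is the probability that a randomly chosen positive case (here, for example, a strong interaction) receives a higher classifier score than a randomly chosen negative case. A minimal pairwise-comparison sketch follows; the labels and scores are invented for illustration and are not data from the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the positive/negative pairs in which the positive case has
    the higher score; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = strong interaction) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
value = auroc(labels, scores)
```

The quadratic loop is adequate for a few hundred cases such as the 405 interactions mentioned above; larger sets are usually handled with a rank-based formula.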
The management of ruminant growth yield has economic importance. The current work presents a study of the spatiotemporal dynamic expression of Ghrelin and GHR at mRNA levels throughout the gastrointestinal tract (GIT) of kid goats under housing and grazing systems. The experiments show that the feeding system and age affected the expression of both Ghrelin and GHR through different mechanisms. Furthermore, the experimental data are used to build new Machine Learning models based on Perturbation Theory, which can predict the effects of perturbations of Ghrelin and GHR mRNA expression on the growth yield. The models consider eight longitudinal GIT segments (rumen, abomasum, duodenum, jejunum, ileum, cecum, colon and rectum), seven time points (0, 7, 14, 28, 42, 56 and 70 d) and two feeding systems (Supplemental and Grazing feeding) as perturbations from the expected values of the growth yield. The best regression model was obtained using Random Forest, with a coefficient of determination R² of 0.781 for the test subset. The current results indicate that the non-linear regression model can accurately predict the growth yield and the key nodes during gastrointestinal development, which is helpful for optimizing feeding management strategies in ruminant production systems.
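For reference, the two regression metrics quoted in these abstracts can be computed as in the sketch below. The coefficient of determination is taken as 1 − SS_res/SS_tot, which is an assumption since some studies report the squared correlation coefficient instead. The observed and predicted values are invented.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical test-subset values, not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

test_r2 = r_squared(observed, predicted)
test_rmse = rmse(observed, predicted)
```

Computing both metrics on the held-out test subset, as reported above, is what makes them an estimate of predictive rather than fitting performance.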
Signaling proteins are an important topic in drug development due to the increased importance of finding fast, accurate and cheap methods to evaluate new molecular targets involved in specific diseases. The complexity of the protein structure hinders the direct association of the signaling activity with the molecular structure. Therefore, the proposed solution uses protein star graphs to encode the peptide sequence information into specific topological indices calculated with the S2SNet tool. The Quantitative Structure–Activity Relationship classification model obtained with Machine Learning techniques is able to predict new signaling peptides. The best classification model, which is the first signaling prediction model, is based on eleven descriptors and was obtained using the Support Vector Machines-Recursive Feature Elimination (SVM-RFE) technique with the Laplacian kernel (RFE-LAP), reaching an AUROC of 0.961. The prediction performance of the model was assessed on a set of 3114 proteins of unknown function from the PDB database. Important signaling pathways are presented for three UniProt IDs (34 PDBs) with a signaling prediction greater than 98.0%.