Many studies now produce parallel data sets from different omics technologies; however, interpreting the acquired data in an integrated fashion is not trivial. This review covers those methods that have been used over the past decade to statistically integrate and interpret metabolomics and transcriptomics data sets. It defines four categories of approaches into which all existing statistical methods fit: correlation-based integration, concatenation-based integration, multivariate-based integration and pathway-based integration. It also explores the choices in study design for generating samples for analysis by these omics technologies and the impact that these technical decisions have on the subsequent data analysis options.
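As an illustration of the first category, correlation-based integration can be sketched as pairwise correlation between metabolite and transcript levels measured on the same samples. The sketch below is a minimal example under stated assumptions: the feature names (`met_A`, `gene_X`, and so on), the toy values and the 0.8 cut-off are hypothetical and are not taken from the review, and Pearson correlation is only one of several possible association measures.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant feature
    return sxy / sqrt(sxx * syy)


def correlate_features(metabolites, transcripts, threshold=0.8):
    """Return (metabolite, transcript, r) for pairs with |r| >= threshold.

    Both arguments map a feature name to its values across the SAME
    samples, in the same order. The threshold is an arbitrary choice.
    """
    pairs = []
    for m_name, m_vals in metabolites.items():
        for t_name, t_vals in transcripts.items():
            r = pearson(m_vals, t_vals)
            if abs(r) >= threshold:  # NaN compares False and is skipped
                pairs.append((m_name, t_name, r))
    return pairs


# Hypothetical toy data: one metabolite, three transcripts, four samples.
metabolites = {"met_A": [1.0, 2.0, 3.0, 4.0]}
transcripts = {
    "gene_X": [2.0, 4.0, 6.0, 8.0],   # rises with met_A
    "gene_Y": [4.0, 3.0, 2.0, 1.0],   # falls as met_A rises
    "gene_Z": [1.0, -1.0, 1.0, -1.0], # weakly related
}
pairs = correlate_features(metabolites, transcripts)
```

In a real study the number of feature pairs is large, so a fixed cut-off like this would normally be replaced by significance testing with multiple-testing correction.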
The mechanism of polycation cytotoxicity and its relationship to polymer molecular weight are poorly understood. To gain an insight into this important phenomenon, a range of newly synthesised uniform (near monodisperse) linear polyethylenimines, commercially available poly(l-lysine)s and two commonly used PEI-based transfectants (broad 22 kDa linear and 25 kDa branched) were tested for their cytotoxicity against the A549 human lung carcinoma cell line. Cell membrane damage assays (LDH release) and cell viability assays (MTT) showed a strong relationship to dose and polymer molecular weight, and increasing incubation times revealed that even supposedly “non-toxic” low molecular weight polymers still damage cell membranes. The newly proposed mechanism of cell membrane damage is acid-catalysed hydrolysis of lipidic phosphoester bonds, which was supported by observations of the hydrolysis of DOPC liposomes.
Protein mutations, especially those which occur in the binding site, play an important role in inter-individual drug response and may alter binding affinity and thus impact the drug’s efficacy and side effects. Unfortunately, large-scale experimental screening of ligand-binding against protein variants is still time-consuming and expensive. Alternatively, in silico approaches can play a role in guiding those experiments. Methods ranging from computationally cheaper machine learning (ML) to the more expensive molecular dynamics have been applied to accurately predict the mutation effects. However, these effects have been mostly studied on limited and small datasets, while ideally a large dataset of binding affinity changes due to binding site mutations is needed. In this work, we used the PSnpBind database with six hundred thousand docking experiments to train a machine learning model predicting protein-ligand binding affinity for both wild-type proteins and their variants with a single-point mutation in the binding site. A numerical representation of the protein, binding site, mutation, and ligand information was encoded using 256 features, half of which were manually selected based on domain knowledge. A machine learning approach composed of two regression models is proposed, the first predicting wild-type protein-ligand binding affinity and the second predicting the mutated protein-ligand binding affinity. The best performing models reported an RMSE value within 0.5-0.6 kcal mol⁻¹ on an independent test set with an R² value of 0.87-0.90. We report an improvement in the prediction performance compared to several reported models developed for protein-ligand binding affinity prediction. The obtained models can be used as a complementary method in early-stage drug discovery. They can be applied to rapidly obtain a better overview of the ligand binding affinity changes across protein variants carried by people in the population and narrow down the search space where more time-demanding methods can be used to identify potential leads that achieve a better affinity for all protein variants.
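For readers less familiar with the two metrics quoted above, the following minimal sketch shows how RMSE and R² are conventionally computed from observed and predicted values. The affinity values are hypothetical toy numbers chosen for easy arithmetic; they are not results from the study, and the function names are illustrative rather than part of any published code.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical binding affinities in kcal/mol (illustrative only).
observed = [-7.0, -8.0, -9.0, -10.0]
predicted = [-7.5, -8.0, -8.5, -10.0]

error = rmse(observed, predicted)      # sqrt(0.5 / 4), about 0.354
fit = r_squared(observed, predicted)   # 1 - 0.5 / 5 = 0.9
```

RMSE carries the units of the predicted quantity, whereas R² is dimensionless, which is why the two are usually reported together.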
Novel platelet and megakaryocyte transcriptome analysis allows prediction of the full or theoretical proteome of a representative human platelet. Here, we integrated the established platelet proteomes from six cohorts of healthy subjects, encompassing 5.2 k proteins, with two novel genome-wide transcriptomes (57.8 k mRNAs). For 14.8 k protein-coding transcripts, we assigned the proteins to 21 UniProt-based classes, based on their preferential intracellular localization and presumed function. This classified transcriptome-proteome profile of platelets revealed: (i) Absence of 37.2 k genome-wide transcripts. (ii) High quantitative similarity of platelet and megakaryocyte transcriptomes (R = 0.75) for 14.8 k protein-coding genes, but not for 3.8 k RNA genes or 1.9 k pseudogenes (R = 0.43-0.54), suggesting redistribution of mRNAs upon platelet shedding from megakaryocytes. (iii) Copy numbers of 3.5 k proteins that were restricted in size by the corresponding transcript levels. (iv) Near complete coverage of identified proteins in the relevant transcriptome (log2 FPKM > 0.20) except for plasma-derived secretory proteins, pointing to adhesion and uptake of such proteins. (v) Underrepresentation in the identified proteome of nuclear-related, membrane and signaling proteins, as well as proteins with low-level transcripts. We then constructed a prediction model, based on protein function, transcript level and (peri)nuclear localization, and calculated the achievable proteome at ~10 k proteins. Model validation identified 1.0 k additional proteins in the predicted classes. Network and database analysis revealed the presence of 2.4 k proteins with a possible role in thrombosis and hemostasis, and 138 proteins linked to platelet-related disorders. This genome-wide platelet transcriptome and (non)identified proteome database thus provides a scaffold for discovering the roles of unknown platelet proteins in health and disease.
Current clinical strategy for staging and prognostication of colorectal cancer (CRC) relies mainly upon the TNM or Dukes system. This clinicopathological stage is a crude prognostic guide because it reflects in part the delay in diagnosis in the case of an advanced cancer and gives little insight into the biological characteristics of the tumor. We hypothesized that global metabolic profiling (metabonomics/metabolomics) of colon mucosae would define metabolic signatures that not only discriminate malignant from normal mucosae, but also could distinguish the anatomical and clinicopathological characteristics of CRC. We applied both high-resolution magic angle spinning nuclear magnetic resonance (HR-MAS NMR) and gas chromatography mass spectrometry (GC/MS) to analyze metabolites in biopsied colorectal tumors and their matched normal mucosae obtained from 31 CRC patients. Orthogonal partial least-squares discriminant analysis (OPLS-DA) models generated from metabolic profiles obtained by both analytical approaches could robustly discriminate normal from malignant samples (Q² > 0.50, receiver operating characteristic (ROC) AUC > 0.95, using 7-fold cross validation). A total of 31 marker metabolites were identified using the two analytical platforms. The majority of these metabolites were associated with expected metabolic perturbations in CRC including elevated tissue hypoxia, glycolysis, nucleotide biosynthesis, lipid metabolism, inflammation and steroid metabolism. OPLS-DA models showed that the metabolite profiles obtained via HR-MAS NMR could further differentiate colon from rectal cancers (Q² > 0.60, ROC AUC = 1.00, using 7-fold cross validation). These data suggest that metabolic profiling of CRC mucosae could provide new phenotypic biomarkers for CRC management.
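The 7-fold cross validation mentioned above partitions the samples into seven folds and holds each fold out once while a model is fitted on the remaining six. The sketch below shows only that partitioning step, as a generic illustration: the abstract does not state how folds were assigned, the sample count of 62 (31 patients, two tissues each) is an assumption made for the example, and the function name is hypothetical. With matched tumor and normal samples, folds would usually be formed per patient rather than per sample, which this simple version does not do.

```python
def k_fold_indices(n_samples, k=7):
    """Yield (train, test) index lists for k-fold cross validation.

    Sample i is assigned to fold i % k, so fold sizes differ by at most one.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [idx
                 for fold_no, fold in enumerate(folds)
                 if fold_no != held_out
                 for idx in fold]
        yield train, test


# Assumed sample count for illustration: 31 patients x 2 tissue types.
splits = list(k_fold_indices(62, k=7))
fold_sizes = [len(test) for _, test in splits]
```

Each sample appears in exactly one test fold, so performance figures such as Q² or ROC AUC are computed from predictions made on data the model did not see during fitting.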
A key concept in drug design is how natural variants, especially the ones occurring in the binding site of drug targets, affect the inter-individual drug response and efficacy by altering binding affinity. These effects have been studied on very limited and small datasets while, ideally, a large dataset of binding affinity changes due to binding site single-nucleotide polymorphisms (SNPs) is needed for evaluation. However, to the best of our knowledge, such a dataset does not exist. Thus, a reference dataset of ligand binding affinities to proteins with all their reported binding site variants was constructed using a molecular docking approach. Having a large database of protein–ligand complexes covering a wide range of binding pocket mutations and a broad landscape of small molecules is of great importance for several types of studies. For example, developing machine learning algorithms to predict protein–ligand affinity or the effect of an SNP on it requires an extensive amount of data. In this work, we present PSnpBind, a large database of 0.6 million mutated binding site protein–ligand complexes constructed using a multithreaded virtual screening workflow. It provides a web interface to explore and visualize the protein–ligand complexes and a REST API to programmatically access the different aspects of the database contents. PSnpBind is open source and freely available at https://psnpbind.org.
In combination with microspotting, whole-blood microfluidics can provide high-throughput information on multiple platelet functions in thrombus formation. Based on assessment of the inter- and intra-subject variability in parameters of microspot-based thrombus formation, we aimed to determine the platelet factors contributing to this variation. Blood samples from 94 genotyped healthy subjects were analyzed for conventional platelet phenotyping: i.e. hematologic parameters, platelet glycoprotein (GP) expression levels and activation markers (24 parameters). Furthermore, platelets were activated by ADP, CRP-XL or TRAP. Parallel samples were investigated for whole-blood thrombus formation (6 microspots, providing 48 parameters of adhesion, aggregation and activation). Microspots triggered platelet activation through GP Ib-V-IX, GPVI, CLEC-2 and integrins. For most thrombus parameters, inter-subject variation was 2-4 times higher than the intra-subject variation. Principal component analyses indicated coherence between the majority of parameters for the GPVI-dependent microspots, partly linked to hematologic parameters, and glycoprotein expression levels. Prediction models identified parameters per microspot that were linked to variation in agonist-induced αIIbβ3 activation and secretion. Common sequence variation of [gene name missing] and [gene name missing], associated with GPVI-induced αIIbβ3 activation and secretion, affected parameters of GPVI- and CLEC-2-dependent thrombus formation. Subsequent analysis of blood samples from patients with Glanzmann thrombasthenia or storage pool disease revealed thrombus signatures of aggregation-dependent parameters that were subject-dependent, but not linked to GPVI activity. Taken together, this high-throughput elucidation of thrombus formation revealed patterns of inter-subject differences in platelet function, which were partly related to GPVI-induced activation and common genetic variance linked to GPVI, but also included a distinct platelet aggregation component.
Receptor diffusion plays an essential role in cellular signalling via the plasma membrane microenvironment and receptor interactions, but the regulation is not well understood. To aid in understanding of the key determinants of receptor diffusion and signalling, we developed agent-based models (ABMs) to explore the extent of dimerisation of the platelet- and megakaryocyte-specific receptor for collagen, glycoprotein VI (GPVI). This approach assessed the importance of glycolipid-enriched raft-like domains within the plasma membrane that lower receptor diffusivity. Our model simulations demonstrated that GPVI dimers preferentially concentrate in confined domains and, if diffusivity within domains is decreased relative to outside of domains, dimerisation rates are increased. While an increased number of confined domains resulted in further dimerisation, merging of domains, which may occur upon membrane rearrangements, was without effect. Modelling of the proportion of the cell membrane which constitutes lipid rafts indicated that dimerisation levels could not be explained by these alone. Crowding of receptors by other membrane proteins was also an important determinant of GPVI dimerisation. Together, these results demonstrate the value of ABM approaches in exploring the interactions on a cell surface, guiding the experimentation for new therapeutic avenues.
The liver is the primary site for the metabolism and detoxification of many compounds, including pharmaceuticals. Consequently, it is also the primary location for many adverse reactions. As the liver is not readily accessible for sampling in humans, rodent or cell line models are often used to evaluate potential toxic effects of a novel compound or candidate drug. However, relating the results of animal and in vitro studies to relevant clinical outcomes for the human in vivo situation still proves challenging. In this study, we incorporate principles of transfer learning within a deep artificial neural network, allowing us to leverage the relative abundance of rat in vitro and in vivo exposure data from the Open TG-GATEs data set to train a model that predicts the expected pattern of human in vivo gene expression following an exposure, given measured human in vitro gene expression. We show that domain adaptation has been successfully achieved, with the rat and human in vitro data no longer being separable in the common latent space generated by the network. The network produces physiologically plausible predictions of the human in vivo gene expression pattern following an exposure to a previously unseen compound. Moreover, we show that the integration of the human in vitro data in the training of the domain adaptation network significantly improves the temporal accuracy of the predicted rat in vivo gene expression pattern following an exposure to a previously unseen compound. In this way, we demonstrate the improvements in prediction accuracy that can be achieved by combining data from distinct domains.