Proteins are effector molecules that mediate the functions of genes and modulate comorbidities, behaviors and drug treatments. They represent an enormous potential resource for personalized, systemic and data-driven diagnosis, prevention, monitoring and treatment. However, the concept of using plasma proteins for individualized health assessment across many health conditions simultaneously has not been tested. Here, we show that plasma protein expression patterns strongly encode for multiple different health states, future disease risks and lifestyle behaviors. We developed and validated protein-phenotype models for 11 different health indicators: liver fat, kidney filtration, percentage body fat, visceral fat mass, lean body mass, cardiopulmonary fitness, physical activity, alcohol consumption, cigarette smoking, diabetes risk and primary cardiovascular event risk. The analyses were prospectively planned, documented and executed at scale on archived samples and clinical data, with a total of ~85 million protein measurements in 16,894 participants. Our proof-of-concept study demonstrates that protein expression patterns reliably encode for many different health issues, and that large-scale protein scanning coupled with machine learning is viable for the development and future simultaneous delivery of multiple measures of health. We anticipate that, with further validation and the addition of more protein-phenotype models, this approach could enable a single-source, individualized so-called liquid health check.
Genome-wide association studies (GWAS) with intermediate phenotypes, like changes in metabolite and protein levels, provide functional evidence to map disease associations and translate them into clinical applications. However, although hundreds of genetic variants have been associated with complex disorders, the underlying molecular pathways often remain elusive. Associations with intermediate traits are key in establishing functional links between GWAS-identified risk-variants and disease end points. Here we describe a GWAS using a highly multiplexed aptamer-based affinity proteomics platform. We quantify 539 associations between protein levels and gene variants (pQTLs) in a German cohort and replicate over half of them in an Arab and Asian cohort. Fifty-five of the replicated pQTLs are located in trans. Our associations overlap with 57 genetic risk loci for 42 unique disease end points. We integrate this information into a genome-proteome network and provide an interactive web-tool for interrogations. Our results provide a basis for novel approaches to pharmaceutical and diagnostic applications.
Significance
Duchenne muscular dystrophy (DMD) is a rare and devastating muscle disease caused by mutations in the X-linked DMD gene (which encodes the dystrophin protein). Serum biomarkers hold significant potential as objective phenotypic measures of DMD disease state, as well as potential measures of pharmacological effects of and response to therapeutic interventions. Here we describe a proteomics approach to determine serum levels of 1,125 proteins in 93 DMD patients and 45 controls. The study identified 44 biomarkers that differed significantly between patients and controls. These data are being made available to DMD researchers and clinicians to accelerate the search for new diagnostic, prognostic, and therapeutic approaches.
Serum biomarkers in Duchenne muscular dystrophy (DMD) may provide deeper insights into disease pathogenesis, suggest new therapeutic approaches, serve as acute read-outs of drug effects, and be useful as surrogate outcome measures to predict later clinical benefit. In this study, large-scale biomarker discovery was performed on serum samples from patients with DMD and age-matched healthy volunteers using a modified aptamer-based proteomics technology. Levels of 1,125 proteins were quantified in serum samples from two independent DMD cohorts: cohort 1 (The Parent Project Muscular Dystrophy–Cincinnati Children’s Hospital Medical Center), 42 patients with DMD and 28 age-matched normal volunteers; and cohort 2 (The Cooperative International Neuromuscular Research Group, Duchenne Natural History Study), 51 patients with DMD and 17 age-matched normal volunteers. Forty-four proteins showed significant differences that were consistent in both cohorts when comparing DMD patients and healthy volunteers at a 1% false-discovery rate, a large number of significant protein changes for such a small study. These biomarkers can be classified by known cellular processes and by age-dependent changes in protein concentration. Our findings both demonstrate the utility of this unbiased biomarker discovery approach and suggest potential new diagnostic and therapeutic avenues for ameliorating the burden of DMD and, we hope, other rare and devastating diseases.
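To make the "1% false-discovery rate" threshold concrete, the following is a minimal sketch of one widely used FDR-controlling procedure, the Benjamini-Hochberg step-up rule. The abstract does not state which procedure the study used, so the choice of Benjamini-Hochberg, the function name `benjamini_hochberg`, and the p-values below are all illustrative assumptions, not the study's method or data.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return a list of booleans marking which hypotheses are rejected.

    Assumption: Benjamini-Hochberg is used here only as an illustration;
    the source does not name its FDR procedure.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_k = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Made-up p-values for six hypothetical proteins.
p_values = [0.00001, 0.0004, 0.003, 0.02, 0.2, 0.5]
significant = benjamini_hochberg(p_values, fdr=0.01)
print(significant)
```

With these made-up inputs, the three smallest p-values pass the threshold. Because the rule is step-up, a protein can be retained even if its own p-value misses its rank-specific cutoff, as long as a higher-ranked p-value meets its cutoff.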
Nature Communications 8: Article number: 14357 (2017); Published 27 February 2017; Updated 11 April 2017. The original version of the Supplementary Information attached to this Article did not include Supplementary Note 1. The HTML has now been updated to include a corrected version of the Supplementary Information.
Cytochrome P450 enzymes are the predominant mediators of phase I metabolism of exogenous small molecules. As a result of their extensive role in metabolism of xenobiotics, drug compounds, and endogenous compounds, as well as their wide tissue distribution, significant drug discovery resources are spent to avoid interacting with this class of enzymes. Here we review historical and recent in silico modeling of 7 cytochrome P450 enzymes of particular interest, specifically CYP1A2, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, and CYP3A4. For each we provide a brief biological background including known inhibitors, substrates, and inducers, as well as details of computational modeling efforts and advances in structural biology. We also provide similar details for 3 nuclear receptors known to regulate gene expression of these enzyme families.
Self-organizing maps (SOMs) are a type of artificial neural network that through training can produce simplified representations of large, high dimensional data sets. These representations are typically used for visualization, classification, and clustering and have been successfully applied to a variety of problems in the pharmaceutical and bioinformatics domains. SOMs in these domains have generally been restricted to static sets of nodes connected in either a grid or hexagonal connectivity and planar or toroidal topologies. We investigate the impact of connectivity and topology on SOM performance, and experiments were performed on fixed and growing SOMs. Three synthetic and two relevant data sets from the chemistry domain were used for evaluation, and performance was assessed on the basis of topological and quantization errors after equivalent training periods. Although we found that all SOMs were roughly comparable at quantizing a data space, there was wide variation in the ability to capture its underlying structure, and growing SOMs consistently outperformed their static counterparts with regard to topological errors. Additionally, one growing SOM, the Neural Gas, was found to be far more capable of capturing details of a target data space, finding lower dimensional relationships hidden within higher dimensional representations.
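The two evaluation measures named above can be sketched as follows. This is a minimal illustration under common textbook definitions, which the abstract does not spell out: quantization error as the mean distance from each sample to its best-matching unit (BMU), and topological error as the fraction of samples whose BMU and second-best unit are not neighbors on the map. The function `som_errors`, the rectangular four-neighbor grid, and the toy weights are all assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def som_errors(samples, weights, positions):
    """Return (quantization_error, topological_error) for a trained map.

    weights[i] is the codebook vector of node i and positions[i] is its
    (row, col) on a rectangular grid (an assumed connectivity).
    """
    total_dist = 0.0
    topo_misses = 0
    for s in samples:
        # Rank all nodes by distance to the sample.
        ranked = sorted(range(len(weights)),
                        key=lambda i: euclidean(s, weights[i]))
        bmu, second = ranked[0], ranked[1]
        total_dist += euclidean(s, weights[bmu])
        # On a four-neighbor grid, adjacent nodes have Manhattan distance 1.
        (r1, c1), (r2, c2) = positions[bmu], positions[second]
        if abs(r1 - r2) + abs(c1 - c2) != 1:
            topo_misses += 1
    n = len(samples)
    return total_dist / n, topo_misses / n


positions = [(0, 0), (0, 1), (1, 0), (1, 1)]
samples = [[0.1, 0.0], [1.0, 0.9]]

# A map whose grid layout matches the data layout.
ordered = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Same codebook vectors, but assigned to grid nodes in a twisted order.
twisted = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]

qe_ordered, te_ordered = som_errors(samples, ordered, positions)
qe_twisted, te_twisted = som_errors(samples, twisted, positions)
print(qe_ordered, te_ordered)
print(qe_twisted, te_twisted)
```

Both toy maps use the same set of codebook vectors, so they quantize the samples equally well, yet the twisted map scores worse on topological error. That is the same qualitative distinction the abstract draws between quantizing a data space and capturing its structure.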
Abstract
Background
The emergence and spread of Plasmodium falciparum parasites that lack HRP2/3 proteins and the resulting decreased utility of HRP2-based malaria rapid diagnostic tests (RDTs) prompted the World Health Organization and other global health stakeholders to prioritize the discovery of novel diagnostic biomarkers for malaria.
Methods
To address this pressing need, we adopted a dual, systematic approach by conducting a systematic review of the literature for publications on diagnostic biomarkers for uncomplicated malaria and a systematic in silico analysis of P. falciparum proteomics data for Plasmodium proteins with favorable diagnostic features.
Results
Our complementary analyses led us to 2 novel malaria diagnostic biomarkers compatible for use in an RDT format: glyceraldehyde 3-phosphate dehydrogenase and dihydrofolate reductase-thymidylate synthase.
Conclusions
Overall, our results pave the way for the development of next-generation malaria RDTs based on new antigens by identifying 2 lead candidates with favorable diagnostic features and partially de-risked product development prospects.
The World Health Organization called for the identification of novel biomarkers that can fill the diagnostic gap in settings where hrp2/3 deletions are common. By adopting a dual approach, we identified 2 candidates, glyceraldehyde 3-phosphate dehydrogenase and dihydrofolate reductase-thymidylate synthase, with favorable diagnostic features.
Decision trees have been used extensively in cheminformatics for modeling various biochemical endpoints including receptor−ligand binding, ADME properties, environmental impact, and toxicity. The traditional approach to inducing decision trees based upon a given training set of data involves recursive partitioning which selects partitioning variables and their values in a greedy manner to optimize a given measure of purity. This methodology has numerous benefits including classifier interpretability and the capability of modeling nonlinear relationships. The greedy nature of induction, however, may fail to elucidate underlying relationships between the data and endpoints. Using evolutionary programming, decision trees are induced which are significantly more accurate than trees induced by recursive partitioning. Furthermore, when assessed on previously unseen data in a 10-fold cross-validated manner, evolutionary programming induced trees exhibit a significantly higher accuracy. This methodology is compared to single-tree and multiple-tree recursive partitioning in two domains (aerobic biodegradability and hepatotoxicity) and shown to produce less complex classifiers with average increases in predictive accuracy of 5−10% over the traditional method.
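The greedy step of recursive partitioning described above can be sketched as a single split search. This is an illustrative sketch only: the abstract does not say which purity measure was optimized, so Gini impurity is an assumption, as are the function names `gini` and `best_split` and the toy data.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Greedy search for the single split that most reduces impurity.

    Returns (feature_index, threshold, weighted_impurity). If no split
    improves on the parent node, returns (None, None, parent_impurity).
    """
    n = len(rows)
    best = (None, None, gini(labels))
    for j in range(len(rows[0])):
        # Candidate thresholds are midpoints between sorted unique values.
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best


# A toy problem that one split on feature 0 separates perfectly.
easy = best_split([[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 2.0]],
                  [0, 0, 1, 1])

# An XOR-like problem: no single split reduces impurity, so the greedy
# search finds nothing, although a two-level tree would classify it exactly.
xor = best_split([[0, 0], [0, 1], [1, 0], [1, 1]], [0, 1, 1, 0])
print(easy, xor)
```

The XOR case is a small instance of the limitation the abstract attributes to greedy induction: a search that scores one split at a time can miss relationships that only appear when several splits are considered together.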
Measuring responses in the proteome to various perturbations improves our understanding of biological systems. The value of information gained from such studies is directly proportional to the number of proteins measured. To overcome technical challenges associated with highly multiplexed measurements, we developed an affinity reagent-based method that uses aptamers with protein-like side chains along with an assay that takes advantage of their unique properties. As hybrid affinity reagents, modified aptamers are fully comparable to antibodies in terms of binding characteristics toward proteins, including epitope size, shape complementarity, affinity and specificity. Our assay combines these intrinsic binding properties with serial kinetic proofreading steps to allow highly effective partitioning of stable specific complexes from unstable nonspecific complexes. The use of these orthogonal methods to enhance specificity effectively overcomes the severe limitation to multiplexing inherent to the use of sandwich-based methods. Our assay currently measures half of the unique proteins encoded in the human genome with femtomolar sensitivity, broad dynamic range and exceptionally high reproducibility. Using machine learning to identify patterns of change, we have developed tests based on measurement of multiple proteins predictive of current health states and future disease risk to guide a holistic approach to precision medicine.
Gemfibrozil-1-O-β-glucuronide (GEM-1-O-gluc), a major metabolite of the antihyperlipidemic drug gemfibrozil, is a mechanism-based inhibitor of P450 2C8 in vitro, and this irreversible inactivation may lead to clinical drug−drug interactions between gemfibrozil and other P450 2C8 substrates. In light of this in vitro finding and the observation that the glucuronide conjugate does not contain any obvious structural alerts, the current study was conducted to determine the potential site of GEM-1-O-gluc bioactivation and the subsequent mechanism of P450 2C8 inhibition (i.e., modification of apoprotein or heme). LC/MS analysis of a reaction mixture containing recombinant P450 2C8 and GEM-1-O-gluc revealed that the substrate was covalently linked to the prosthetic heme group during catalysis. A combination of mass spectrometry and deuterium isotope effects revealed that a benzylic carbon on the 2′,5′-dimethylphenoxy group of GEM-1-O-gluc was covalently bound to the heme of P450 2C8. The regiospecificity of substrate addition to the heme group was not confirmed experimentally, but computational modeling experiments indicated that the γ-meso position was the most likely site of modification. The metabolite profile, which consisted of two benzyl alcohol metabolites and a 4′-hydroxy-GEM-1-O-gluc metabolite, indicated that oxidation of GEM-1-O-gluc was limited to the 2′,5′-dimethylphenoxy group. These results are consistent with an inactivation mechanism wherein GEM-1-O-gluc is oxidized to a benzyl radical intermediate, which evades oxygen rebound, and adds to the γ-meso position of heme. Mechanism-based inhibition of P450 2C8 can be rationalized by the formation of the GEM-1-O-gluc-heme adduct and the consequential restriction of additional substrate access to the catalytic iron center.