Target selection is the first and pivotal step in drug discovery. An incorrect choice may not manifest itself for many years after hundreds of millions of research dollars have been spent. We collected a set of 332 targets that succeeded or failed in phase III clinical trials, and explored whether Omic features describing the target genes could predict clinical success. We obtained features from the recently published comprehensive resource: Harmonizome. Nineteen features appeared to be significantly correlated with phase III clinical trial outcomes, but only 4 passed validation schemes that used bootstrapping or modified permutation tests to assess feature robustness and generalizability while accounting for target class selection bias. We also used classifiers to perform multivariate feature selection and found that classifiers with a single feature performed as well in cross-validation as classifiers with more features (AUROC = 0.57 and AUPR = 0.81). The two predominantly selected features were mean mRNA expression across tissues and standard deviation of expression across tissues, where successful targets tended to have lower mean expression and higher expression variance than failed targets. This finding supports the conventional wisdom that it is favorable for a target to be present in the tissue(s) affected by a disease and absent from other tissues. Overall, our results suggest that it is feasible to construct a model integrating interpretable target features to inform target selection. We anticipate deeper insights and better models in the future, as researchers can reuse the data we have provided to improve methods for handling sample biases and learn more informative features. Code, documentation, and data for this study have been deposited on GitHub at https://github.com/arouillard/omic-features-successful-targets.
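The two expression features named above can be illustrated with a minimal sketch. The tissue names and expression values below are invented for illustration, the helper name `expression_summary` is hypothetical, and the study's own preprocessing of the Harmonizome data is not reproduced here.

```python
from statistics import mean, pstdev


def expression_summary(expression_by_tissue):
    """Return (mean, population standard deviation) of expression across tissues."""
    values = list(expression_by_tissue.values())
    return mean(values), pstdev(values)


# Hypothetical targets: one expressed evenly everywhere, one restricted to a single tissue.
broad = {"liver": 8.0, "brain": 8.0, "heart": 8.0, "lung": 8.0}
specific = {"liver": 12.0, "brain": 0.0, "heart": 0.0, "lung": 0.0}

broad_mean, broad_sd = expression_summary(broad)
specific_mean, specific_sd = expression_summary(specific)

print(f"broad:    mean={broad_mean:.2f}, sd={broad_sd:.2f}")
print(f"specific: mean={specific_mean:.2f}, sd={specific_sd:.2f}")
```

In this toy case the tissue-restricted target has the lower mean and the higher standard deviation, which is the pattern the abstract reports for successful targets.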
The human proteome is a major source of therapeutic targets. Recent genetic association analyses of the plasma proteome enable systematic evaluation of the causal consequences of variation in plasma protein levels. Here we estimated the effects of 1,002 proteins on 225 phenotypes using two-sample Mendelian randomization (MR) and colocalization. Of 413 associations supported by evidence from MR, 130 (31.5%) were not supported by results of colocalization analyses, suggesting that genetic confounding due to linkage disequilibrium is widespread in naïve phenome-wide association studies of proteins. Combining MR and colocalization evidence in cis-only analyses, we identified 111 putatively causal effects between 65 proteins and 52 disease-related phenotypes (https://www.epigraphdb.org/pqtl/). Evaluation of data from historic drug development programs showed that target-indication pairs with MR and colocalization support were more likely to be approved, evidencing the value of this approach in identifying and prioritizing potential therapeutic targets.
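The abstract does not say which MR estimator was used. As a sketch under that caveat, the Wald ratio is a common choice when a single cis variant instruments a protein: the variant's effect on the outcome is divided by its effect on the protein level. The function name, the first-order standard error (which ignores uncertainty in the exposure effect), and the input values are illustrative assumptions, and colocalization is not covered.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument MR estimate with a first-order standard error.

    beta_exposure: variant effect on the protein level (the exposure)
    beta_outcome:  variant effect on the phenotype (the outcome)
    se_outcome:    standard error of beta_outcome
    """
    if beta_exposure == 0:
        raise ValueError("the instrument must have a non-zero effect on the exposure")
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    z = estimate / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return estimate, se, p_value


# Hypothetical summary statistics for one pQTL.
estimate, se, p_value = wald_ratio(beta_exposure=0.5, beta_outcome=0.1, se_outcome=0.02)
print(f"estimate={estimate:.3f}, se={se:.3f}, p={p_value:.2e}")
```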
Determining which target to pursue is a challenging and error-prone first step in developing a therapeutic treatment for a disease, where missteps are potentially very costly given the long time frames and high expenses of drug development. With current informatics technology and machine learning algorithms, it is now possible to computationally discover therapeutic hypotheses by predicting clinically promising drug targets based on the evidence associating drug targets with disease indications. We have collected this evidence from Open Targets and additional databases that cover 17 sources of evidence for target-indication association and represented the data as a tensor of 21,437 × 2211 × 17.
As a proof-of-concept, we identified examples of successes and failures of target-indication pairs in clinical trials across 875 targets and 574 disease indications to build a gold-standard data set of 6140 known clinical outcomes. We designed and executed three benchmarking strategies to examine the performance of multiple machine learning models: Logistic Regression, LASSO, Random Forest, Tensor Factorization and Gradient Boosting Machine. With 10-fold cross-validation, tensor factorization achieved AUROC = 0.82 ± 0.02 and AUPRC = 0.71 ± 0.03. Across multiple validation schemes, this was comparable to or better than other methods.
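For readers unfamiliar with the AUROC figures quoted above, the metric can be sketched as the probability that a randomly chosen success is scored above a randomly chosen failure, with ties counted as one half. The labels and scores below are invented, and this pairwise implementation is only an illustration of the metric, not the benchmarking code used in the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = success, 0 = failure) and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
example_auroc = auroc(labels, scores)
print(f"AUROC = {example_auroc:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of successes from failures.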
In this work, we benchmarked a machine learning technique called tensor factorization for the problem of predicting clinical outcomes of therapeutic hypotheses. Results have shown that this method can achieve equal or better prediction performance compared with a variety of baseline models. We demonstrate one application of the method to predict outcomes of trials on novel indications of approved drug targets. This work can be expanded to targets and indications that have never been clinically tested and used to propose novel target-indication hypotheses. Our proposed biologically-motivated cross-validation schemes provide insight into the robustness of the prediction performance. This has significant implications for all future methods that try to address this seminal problem in drug discovery.
•Pathway analysis of GWAS loci identifies novel drug targets and repurposing opportunities.
•Global systematic analysis of 1,589 GWAS across 1,456 protein interaction pathways.
•New drug discovery and repositioning opportunities for 182 diseases.
•30% of diseases have significantly more targets in the pathway space.
•Framework for translating GWAS results into actionable targets.
Genome-wide association studies (GWAS) have made considerable progress and there is emerging evidence that genetics-based targets can lead to 28% more launched drugs. We analyzed 1589 GWAS across 1456 pathways to translate these often imprecise genetic loci into therapeutic hypotheses for 182 diseases. These pathway-based genetic targets were validated by testing whether current drug targets were enriched in the pathway space for the same indication. Remarkably, 30% of diseases had significantly more targets in these pathways than expected by chance; the comparable number for GWAS alone (without pathway analysis) was zero. This study shows that a systematic global pathway analysis can translate genetic findings into therapeutic hypotheses for both new drug discovery and repositioning opportunities for current drugs.
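The abstract does not name the statistical test behind "more targets than expected by chance". One common way to frame such an enrichment question is a one-sided hypergeometric test, sketched here with invented counts and a hypothetical function name; the study's actual procedure may differ.

```python
from math import comb


def hypergeom_enrichment_p(universe, pathway_genes, known_targets, overlap):
    """P(X >= overlap) when drawing pathway_genes genes from a universe
    that contains known_targets drug targets for the indication."""
    total = comb(universe, pathway_genes)
    upper = min(pathway_genes, known_targets)
    tail = sum(
        comb(known_targets, k) * comb(universe - known_targets, pathway_genes - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes in the universe, 5 in the GWAS-implicated
# pathway space, 4 known drug targets for the disease, 3 of them in the pathways.
p_value = hypergeom_enrichment_p(universe=20, pathway_genes=5, known_targets=4, overlap=3)
print(f"enrichment p-value = {p_value:.4f}")
```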
Genetic evidence of disease association has often been used as a basis for selecting drug targets for complex common diseases. Likewise, the propagation of genetic evidence through gene or protein interaction networks has been shown to accurately infer novel disease associations at genes for which no direct genetic evidence can be observed. However, an empirical test of the utility of combining these approaches for drug discovery has been lacking. In this study, we examine genetic associations arising from an analysis of 648 UK Biobank GWAS and evaluate whether targets identified as proxies of direct genetic hits are enriched for successful drug targets, as measured by historical clinical trial data. We find that protein networks formed from specific functional linkages such as protein complexes and ligand-receptor pairs are suitable for even naïve guilt-by-association network propagation approaches. In addition, more sophisticated approaches applied to global protein-protein interaction networks and pathway databases also successfully retrieve targets enriched for clinically successful drug targets. We conclude that network propagation of genetic evidence can be used for drug target identification.
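A naïve guilt-by-association step of the kind mentioned above can be sketched as follows: every direct network neighbor of a genetic hit is treated as a proxy target. The gene names and the toy network are placeholders, the function name is hypothetical, and the more sophisticated propagation methods referred to in the abstract are not shown.

```python
def guilt_by_association(network, genetic_hits):
    """Map each first-degree neighbor of a genetic hit to the hits that support it.

    network:      dict mapping a gene to the set of genes it is linked to
    genetic_hits: set of genes with direct genetic evidence
    """
    proxies = {}
    for hit in genetic_hits:
        for neighbor in network.get(hit, ()):
            if neighbor not in genetic_hits:  # keep only indirect candidates
                proxies.setdefault(neighbor, set()).add(hit)
    return proxies


# Toy network, for example ligand-receptor pairs or members of one complex.
network = {
    "GENE_A": {"GENE_B", "GENE_C"},
    "GENE_B": {"GENE_A"},
    "GENE_C": {"GENE_A", "GENE_D"},
    "GENE_D": {"GENE_C"},
}
hits = {"GENE_A"}
proxies = guilt_by_association(network, hits)
print(sorted(proxies))
```

GENE_D is two steps away from the hit and is therefore not returned. Restricting propagation to one step is what makes this variant naïve.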
In response to DNA damage and replication blocks, cells activate pathways that arrest the cell cycle and induce the transcription of genes that facilitate repair. In mammals, ATM (ataxia telangiectasia mutated) kinase, together with other checkpoint kinases, is an important component in this response. We have cloned the rat and human homologs of Saccharomyces cerevisiae Rad53 and Schizosaccharomyces pombe Cds1, called checkpoint kinase 2 (chk2). Complementation studies suggest that Chk2 can partially replace the function of the defective checkpoint kinase in the Cds1-deficient yeast strain. Chk2 was phosphorylated and activated in response to DNA damage in an ATM-dependent manner. Its activation in response to replication blocks by hydroxyurea (HU) treatment, however, was independent of ATM. Using mass spectrometry, we found that, similar to Chk1, Chk2 can phosphorylate serine 216 in Cdc25C, a site known to be involved in negative regulation of Cdc25C. These results suggest that Chk2 is a downstream effector of the ATM-dependent DNA damage checkpoint pathway. Activation of Chk2 might not only delay mitotic entry, but also increase the capacity of cultured cells to survive after treatment with gamma-radiation or with the topoisomerase-I inhibitor topotecan.
It is commonly assumed that drug targets are expressed in tissues relevant to their indicated diseases, even under normal conditions. While multiple anecdotal cases support this hypothesis, a comprehensive study has not been performed to verify it. We conducted a systematic analysis to assess gene and protein expression for all targets of marketed and phase III drugs across a diverse collection of normal human tissues. For 87% of gene-disease pairs, the target is expressed in a disease-affected tissue under healthy conditions. This result validates the importance of confirming expression of a novel drug target in an appropriate tissue for each disease indication and strengthens previous findings showing that targets of efficacious drugs should be expressed in relevant tissues under normal conditions. Further characterization of the remaining 13% of gene-disease pairs revealed that most genes are expressed in a different tissue linked to another disease. Our analysis demonstrates the value of extensive tissue-specific expression resources, both in terms of tissue and cell diversity as well as techniques used to measure gene expression.
To evaluate whether a p38α/β mitogen-activated protein kinase inhibitor, SB-681323, would limit the elevation of an inflammatory marker, high-sensitivity C-reactive protein (hsCRP), after a percutaneous coronary intervention (PCI).
Coronary artery stents provide benefit by maintaining lumen patency but may incur vascular trauma and inflammation, leading to myocardial damage. A key mediator for such stress signaling is p38 mitogen-activated protein kinase. Patients with angiographically documented coronary artery disease receiving stable statin therapy and about to undergo PCI were randomly selected to receive SB-681323, 7.5 mg (n=46), or placebo (n=46) daily for 28 days, starting 3 days before PCI. On day 3, before PCI, hsCRP was decreased in the SB-681323 group relative to the placebo group (29% lower; P=0.02). After PCI, there was a statistically significant attenuation in the increase in hsCRP in the SB-681323 group relative to the placebo group (37% lower on day 5, P=0.04; and 40% lower on day 28, P=0.003). There were no adverse safety signals after 28 days of treatment with SB-681323.
In the setting of statin therapy, SB-681323 significantly attenuated the post-PCI inflammatory response, as measured by hsCRP. This inflammatory dampening implicates p38 mitogen-activated protein kinase in the poststent response, potentially defining an avenue to limit poststent restenosis.
Light-chain (L-chain) amyloidosis is characterized by deposition of fibrillar aggregates composed of the N-terminal L-chain variable region (VL) domain of an immunoglobulin, generally in individuals overproducing a monoclonal L chain. In addition to proteolytic fragmentation and high protein concentration, particular amino acid substitutions may also contribute to the tendency of an L chain to aggregate in L-chain amyloidosis, although evidence in support of this has been limited and difficult to interpret. In this paper we identify particular amino acid replacements at specific positions in the VL domain that are occupied at frequencies significantly higher in those L chains associated with amyloidosis. Analysis of the structural model for the VL domain of the Bence-Jones protein REI suggests that these positions play important roles in maintaining domain structure and stability. Using an Escherichia coli expression system, we prepared single-point mutants of REI VL incorporating amyloid-associated amino acid replacements that are both rare and located at structurally important positions. These mutants support ordered aggregate formation in an in vitro L-chain fibril formation model in which wild-type REI VL remains soluble. Moreover, the ability of these sequences to aggregate in vitro correlates well with the extent to which domain stability is decreased in denaturant-induced unfolding. The results are consistent with a mechanism for the disease process in which the VL domain, either before or after proteolytic cleavage from the L-chain constant region domain, unfolds by virtue of one or more destabilizing amino acid replacements to generate an aggregation-prone nonnative state.