Drug development has a high attrition rate, with poor pharmacokinetic and safety properties a significant hurdle. Computational approaches may help minimize these risks. We have developed a novel approach (pkCSM) which uses graph-based signatures to develop predictive models of central ADMET (absorption, distribution, metabolism, excretion and toxicity) properties for drug development. pkCSM performs as well as or better than current methods. A freely accessible web server (http://structure.bioc.cam.ac.uk/pkcsm), which retains no information submitted to it, provides an integrated platform to rapidly evaluate pharmacokinetic and toxicity properties.
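As a rough illustration of what a graph-based signature of a small molecule can look like, the sketch below counts pairs of atom classes separated by at most d bonds, cumulatively for increasing d. It is not pkCSM's implementation: the atom classes (`"hyd"`, `"acc"`), the maximum of 3 bonds and the function names are hypothetical choices made for this example.

```python
from collections import deque
from itertools import combinations


def bond_distances(adjacency, source):
    """Breadth-first search: number of bonds from `source` to every reachable atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def topological_signature(labels, adjacency, max_bonds=3):
    """Cumulative counts of atom-class pairs separated by at most d bonds, d = 1..max_bonds.

    labels:    {atom_index: class_label}  (hypothetical pharmacophore-style classes)
    adjacency: {atom_index: [bonded atom indices]}
    Returns {((class_a, class_b), d): count} with each class pair in sorted order.
    """
    dist = {atom: bond_distances(adjacency, atom) for atom in labels}
    pair_types = sorted({tuple(sorted((labels[a], labels[b]))) for a, b in combinations(labels, 2)})
    signature = {(pair, d): 0 for d in range(1, max_bonds + 1) for pair in pair_types}
    for a, b in combinations(labels, 2):
        pair = tuple(sorted((labels[a], labels[b])))
        separation = dist[a].get(b)  # None if the atoms are in disconnected fragments
        if separation is None:
            continue
        # Cumulative: a pair `separation` bonds apart counts for every d >= separation.
        for d in range(separation, max_bonds + 1):
            signature[(pair, d)] += 1
    return signature


# Hypothetical example: heavy atoms of ethanol, C0-C1-O2.
ethanol_labels = {0: "hyd", 1: "hyd", 2: "acc"}
ethanol_bonds = {0: [1], 1: [0, 2], 2: [1]}
sig = topological_signature(ethanol_labels, ethanol_bonds)
```

The cumulative form means every entry for distance d includes all pairs at shorter separations, so the vector summarizes how atom classes are distributed across the molecular graph rather than just which bonds exist.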
Here, we report a web server for the improved SDM, used for predicting the effects of mutations on protein stability. As a pioneering knowledge-based approach, SDM has been highlighted as the most appropriate method to use in combination with many other approaches. We have updated the environment-specific amino-acid substitution tables based on the current expanded PDB (a 5-fold increase in information), and introduced new residue-conformation and interaction parameters, including packing density and residue depth. The updated server has been extensively tested using a benchmark containing 2690 point mutations from 132 different protein structures. The revised method also performs well on the hypothetical reverse mutations, better than comparable methods built using machine-learning approaches, highlighting the strength of our knowledge-based approach for identifying stabilising mutations. Given a PDB file (the Protein Data Bank format containing the 3D coordinates of the protein atoms) and a point mutation, the server calculates the stability difference score between the wild-type and mutant protein. The server is available at http://structure.bioc.cam.ac.uk/sdm2.
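The test on hypothetical reverse mutations rests on a simple consistency requirement: if a mutation changes stability by ΔΔG, the reverse mutation should change it by −ΔΔG. A minimal sketch of how such a check can be scored, assuming the forward and reverse predictions are already available as paired lists; the function names and the bias definition are illustrative choices, not taken from the SDM paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def reverse_mutation_check(forward, reverse):
    """Compare predictions for wild type -> mutant with those for mutant -> wild type.

    A perfectly antisymmetric predictor gives reverse == -forward for every
    mutation, i.e. a correlation of -1 and a bias of 0.
    """
    r = pearson(forward, reverse)
    bias = sum(f + b for f, b in zip(forward, reverse)) / (2 * len(forward))
    return r, bias


# Hypothetical predicted scores for four mutations and their reverses.
r, bias = reverse_mutation_check([1.2, -0.4, 2.1, -1.5], [-0.9, 0.6, -1.8, 1.1])
```

A correlation close to −1 together with a bias close to 0 indicates that a predictor treats stabilising and destabilising directions consistently.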
Mutations play fundamental roles in evolution by introducing diversity into genomes. Missense mutations in structural genes may become either selectively advantageous or disadvantageous to the organism by affecting protein stability and/or interfering with interactions between partners. Thus, the ability to predict the impact of mutations on protein stability and interactions is of significant value, particularly in understanding the effects of Mendelian and somatic mutations on the progression of disease. Here, we propose a novel approach to the study of missense mutations, called mCSM, which relies on graph-based signatures. These encode distance patterns between atoms and are used to represent the protein residue environment and to train predictive models. To understand the roles of mutations in disease, we have evaluated their impacts not only on protein stability but also on protein-protein and protein-nucleic acid interactions.
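To make "distance patterns between atoms" concrete, here is a simplified sketch of a cumulative distance signature for a residue environment: for each distance cutoff, it counts the atom pairs of each class combination that lie within that cutoff. It assumes each atom has already been assigned a class label; the labels `"hyd"`, `"pos"` and `"neg"`, the 2 to 6 Å range and the 1 Å step are hypothetical choices, not the published mCSM settings.

```python
import math
from itertools import combinations


def distance_signature(atoms, start=2.0, stop=6.0, step=1.0):
    """Cumulative counts of atom-class pairs within each distance cutoff.

    atoms: list of (class_label, (x, y, z)) with coordinates in angstroms.
    Returns (cutoffs, pair_types, vector); the vector is ordered cutoff-major,
    so vector[i * len(pair_types) + j] counts pairs of type j within cutoffs[i].
    """
    n_steps = int(round((stop - start) / step))
    cutoffs = [start + i * step for i in range(n_steps + 1)]
    classes = sorted({label for label, _ in atoms})
    pair_types = [(a, b) for i, a in enumerate(classes) for b in classes[i:]]
    # Pairwise distances, each tagged with its (sorted) class pair.
    pairs = [
        (tuple(sorted((label_a, label_b))), math.dist(xyz_a, xyz_b))
        for (label_a, xyz_a), (label_b, xyz_b) in combinations(atoms, 2)
    ]
    vector = [
        sum(1 for pair, d in pairs if pair == pair_type and d <= cutoff)
        for cutoff in cutoffs
        for pair_type in pair_types
    ]
    return cutoffs, pair_types, vector


# Hypothetical residue environment with three atom classes.
environment = [
    ("hyd", (0.0, 0.0, 0.0)),
    ("hyd", (1.5, 0.0, 0.0)),
    ("pos", (0.0, 3.0, 0.0)),
    ("neg", (0.0, 0.0, 4.5)),
]
cutoffs, pair_types, vector = distance_signature(environment)
```

A fixed-length vector like this can be computed for any residue environment and fed directly to a standard regression or classification model.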
We show that mCSM performs as well as or better than other widely used methods. The mCSM signatures were successfully used in different tasks, demonstrating that the impact of a mutation can be correlated with the atomic-distance patterns surrounding an amino acid residue. We also show that mCSM can predict stability changes for a wide range of mutations occurring in the tumour suppressor protein p53, demonstrating the applicability of the proposed method in a challenging disease scenario.
A web server is available at http://structure.bioc.cam.ac.uk/mcsm.
DNA-dependent protein kinase catalytic subunit (DNA-PKcs) is a central component of nonhomologous end joining (NHEJ), repairing DNA double-strand breaks that would otherwise lead to apoptosis or cancer. We have solved its structure in complex with the C-terminal peptide of Ku80 at 4.3 Å resolution using X-ray crystallography. We show that the 4128-amino acid structure comprises three large structural units: the N-terminal unit, the Circular Cradle, and the Head. Conformational differences between the two molecules in the asymmetric unit are correlated with changes in accessibility of the kinase active site, which are consistent with an allosteric mechanism to bring about kinase activation. The location of the Ku80 C-terminal peptide (KU80ct194) in the vicinity of the breast cancer 1 (BRCA1) binding site suggests competition with BRCA1, leading to pathway selection between NHEJ and homologous recombination.
Cancer genome and other sequencing initiatives are generating extensive data on non-synonymous single nucleotide polymorphisms (nsSNPs) in human and other genomes. In order to understand the impacts of nsSNPs on the structure and function of the proteome, as well as to guide protein engineering, accurate in silico methodologies are required to study and predict their effects on protein stability. Despite the diversity of available computational methods in the literature, none has proven accurate and dependable on its own under all scenarios where mutation analysis is required. Here we present DUET, a web server for an integrated computational approach to study missense mutations in proteins. DUET consolidates two complementary approaches (mCSM and SDM) in a consensus prediction, obtained by combining the results of the separate methods in an optimized predictor using Support Vector Machines (SVM). We demonstrate that the proposed method improves the overall accuracy of the predictions in comparison with either method individually and performs as well as or better than similar methods. The DUET web server is freely and openly available at http://structure.bioc.cam.ac.uk/duet.
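The idea of a consensus predictor is that the scores of the individual methods become input features for a second model trained on measured stability changes. DUET uses an SVM for this step; to keep the sketch below dependency-free, it swaps in ordinary least squares with an intercept (in practice one might use a support vector regressor such as scikit-learn's `SVR`). All function names and values are hypothetical, and this is not DUET's model or training data.

```python
def solve_linear(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_consensus(mcsm_scores, sdm_scores, measured):
    """Least-squares weights w for measured ~ w[0] + w[1] * mcsm + w[2] * sdm."""
    rows = [(1.0, m, s) for m, s in zip(mcsm_scores, sdm_scores)]
    # Normal equations (X^T X) w = X^T y.
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(rows, measured)) for i in range(3)]
    return solve_linear(xtx, xty)


def consensus(weights, mcsm_score, sdm_score):
    """Combined prediction for one mutation."""
    return weights[0] + weights[1] * mcsm_score + weights[2] * sdm_score


# Hypothetical training data: per-mutation scores from the two methods and measured values.
weights = fit_consensus(
    [1.0, -0.5, 2.0, 0.3, -1.2],
    [0.4, -1.0, 1.5, -0.2, 0.8],
    [0.9, -0.4, 1.8, 0.2, -0.6],
)
```

A linear combiner can only reweight the two methods; a kernel SVM, as used by DUET, can also capture non-linear interactions between their outputs.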
The sheer volume of non-synonymous single nucleotide polymorphisms generated in recent years by projects such as the Human Genome Project, the HapMap Project and Genome-Wide Association Studies means that it is not possible to characterize experimentally the effects of all these mutations on the structure and function of the gene products. Automatic methods that predict these effects, however, allow the set of mutations to be studied experimentally to be reduced. Site Directed Mutator (SDM) is a statistical potential energy function that uses environment-specific amino-acid substitution frequencies within homologous protein families to calculate a stability score, which is analogous to the free energy difference between the wild-type and mutant protein. Here, we present a web server for SDM (http://www-cryst.bioc.cam.ac.uk/~sdm/sdm.php), which has received more than 10 000 submissions since going online in April 2008. To run SDM, users must upload a wild-type structure and specify the position and amino acid type of the mutation. The results returned include information about the local structural environment of the wild-type and mutant residues, a stability score prediction and a prediction of disease association. Additionally, the wild-type and mutant structures are displayed in a Jmol applet with the relevant residues highlighted.
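To show how substitution frequencies can be turned into an energy-like score, the sketch below computes a simplified log-odds term, −RT ln[P(wt→mut | environment) / P(wt→mut)]. This compares how often a substitution is observed in one structural environment with how often it is observed overall. It is a swapped-in illustration of the general statistical-potential idea, not SDM's published equation. The counts, the environment name, the pseudocount of 1 and the sign convention are all hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)
ALPHABET_SIZE = 20  # standard amino acids


def substitution_probability(counts, wt, mut, pseudocount=1.0):
    """P(wt -> mut) from observed substitution counts {(wt, x): n}, with add-one style smoothing."""
    total = sum(n for (source, _), n in counts.items() if source == wt)
    return (counts.get((wt, mut), 0) + pseudocount) / (total + pseudocount * ALPHABET_SIZE)


def log_odds_score(env_counts, all_counts, wt, mut, temperature=298.0):
    """-RT ln[P(wt -> mut | environment) / P(wt -> mut)].

    In this sketch's convention, a positive value means the substitution is observed
    less often in this structural environment than across all environments.
    """
    p_env = substitution_probability(env_counts, wt, mut)
    p_all = substitution_probability(all_counts, wt, mut)
    return -R_KCAL * temperature * math.log(p_env / p_all)


# Hypothetical counts of substitutions of leucine observed between aligned homologues.
buried_helix = {("L", "L"): 900, ("L", "I"): 60, ("L", "D"): 2}
all_environments = {("L", "L"): 5000, ("L", "I"): 400, ("L", "D"): 150}
leu_to_asp = log_odds_score(buried_helix, all_environments, "L", "D")
leu_to_ile = log_odds_score(buried_helix, all_environments, "L", "I")
```

With these made-up counts, a buried leucine-to-aspartate change scores as much less tolerated than a leucine-to-isoleucine change. This matches the intuition that placing a charged residue in a buried environment is unfavourable.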
DNA-dependent protein kinase (DNA-PK), a multicomponent complex including the DNA-PK catalytic subunit and the Ku70/80 heterodimer together with DNA, is central to human DNA damage response and repair. Using a DNA-PK-selective inhibitor (M3814), we determined, from a single dataset, two cryo-EM structures of the human DNA-PK complex in different states: the intermediate state and the active state. Here we show that activation of the kinase is regulated through conformational changes caused by the binding ligand and the string region (residues 802-846) of the DNA-PK catalytic subunit, particularly the helix-hairpin-helix motif (residues 816-836) that interacts with DNA. These observations demonstrate the regulatory role of the ligand and explain why DNA-PK is DNA dependent. Cooperation and coordination among binding partners, disordered flexible regions and mechanically flexible HEAT repeats modulate the activation of the kinase. Together with previous findings, these results provide a better molecular understanding of DNA-PK catalysis.
The development of structure-guided drug discovery is a story of knowledge exchange in which new ideas originate from all parts of the research ecosystem. Dorothy Crowfoot Hodgkin obtained insulin from Boots Pure Drug Company in the 1930s, and insulin crystallization was optimized at the company Novo in the 1950s, allowing the structure to be determined at Oxford University. The structure of renin was developed in academia, on this occasion in London, in response to a need in pharma to develop antihypertensives. The idea of a dimeric aspartic protease came from an international academic team and was discovered in HIV; it eventually led to new HIV antivirals being developed in industry. Structure-guided fragment-based discovery was developed in large pharma and biotechs, but has been exploited in academia to develop new inhibitors targeting protein-protein interactions, as well as antimicrobials to combat mycobacterial infections such as tuberculosis. These observations provide a strong argument against the so-called 'linear model', in which ideas flow only in one direction, from academic institutions to industry. Structure-guided drug discovery is a story of applications of protein crystallography and of knowledge exchange between academia and industry that has led to new drug approvals by the US Food and Drug Administration for cancer and other common medical conditions, as well as hope for the treatment of rare genetic diseases and of infectious diseases that are a particular challenge in the developing world.
Predicting the binding affinities of large sets of diverse molecules against a range of macromolecular targets is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for exploiting and analyzing the outputs of docking, which is in turn an important tool in problems such as structure-based drug design. Classical scoring functions assume a predetermined, theory-inspired functional form for the relationship between the variables that describe an experimentally determined or modeled structure of a protein–ligand complex and its binding affinity. The inherent problem with this approach is the difficulty of explicitly modeling the various contributions of intermolecular interactions to binding affinity. New scoring functions based on machine-learning regression models, which can effectively exploit much larger amounts of experimental data and circumvent the need for a predetermined functional form, have already been shown to outperform a broad range of state-of-the-art scoring functions in a widely used benchmark. Here, we investigate the impact of the chemical description of the complex on the predictive power of the resulting scoring function using a systematic battery of numerical experiments. These experiments resulted in the most accurate scoring function to date on the benchmark. Strikingly, we also found that a more precise chemical description of the protein–ligand complex does not generally lead to a more accurate prediction of binding affinity. We discuss four factors that may contribute to this result: modeling assumptions, codependence of representation and regression, data restricted to the bound state, and conformational heterogeneity in data.
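To make "chemical description of the complex" concrete, the sketch below builds descriptors of the kind used by machine-learning scoring functions: counts of protein-ligand element pairs within a distance cutoff. It is illustrative only. The element lists and the 12 Å cutoff follow the commonly described RF-Score setup and are an assumption here. The optional distance shells are one hypothetical way to make the description more precise. Either vector would then be passed to a regression model trained on measured affinities.

```python
import math
from collections import Counter

# Assumed element types, following the commonly described RF-Score setup.
PROTEIN_ELEMENTS = ("C", "N", "O", "S")
LIGAND_ELEMENTS = ("C", "N", "O", "F", "P", "S", "Cl", "Br", "I")


def pair_count_features(protein_atoms, ligand_atoms, cutoff=12.0, shell_edges=()):
    """Counts of protein-ligand element pairs within `cutoff` angstroms.

    atoms: lists of (element, (x, y, z)). With no shell_edges each element pair
    gets a single count; extra edges split every count into distance shells,
    giving a finer-grained description of the same complex.
    Elements outside the two lists are ignored.
    """
    edges = sorted(e for e in shell_edges if e < cutoff) + [cutoff]
    counts = Counter()
    for prot_el, prot_xyz in protein_atoms:
        for lig_el, lig_xyz in ligand_atoms:
            d = math.dist(prot_xyz, lig_xyz)
            if d <= cutoff:
                shell = next(i for i, edge in enumerate(edges) if d <= edge)
                counts[(prot_el, lig_el, shell)] += 1
    return [
        counts[(prot_el, lig_el, s)]
        for prot_el in PROTEIN_ELEMENTS
        for lig_el in LIGAND_ELEMENTS
        for s in range(len(edges))
    ]


# Hypothetical toy complex with two protein and two ligand atoms.
protein = [("C", (0.0, 0.0, 0.0)), ("O", (5.0, 0.0, 0.0))]
ligand = [("N", (3.0, 0.0, 0.0)), ("C", (20.0, 0.0, 0.0))]
coarse = pair_count_features(protein, ligand)                         # 4 x 9 = 36 features
fine = pair_count_features(protein, ligand, shell_edges=(4.0, 8.0))  # 36 x 3 = 108 features
```

Both vectors describe the same contacts, but the finer one spreads them over more features. Whether that extra resolution actually improves affinity prediction is exactly the empirical question the abstract addresses.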