Machine learning in virtual screening Melville, James L; Burke, Edmund K; Hirst, Jonathan D
Combinatorial chemistry & high throughput screening
Volume: 12, Issue: 4
Journal Article
Peer reviewed
In this review, we highlight recent applications of machine learning to virtual screening, focusing on the use of supervised techniques to train statistical learning algorithms to prioritize ...databases of molecules as active against a particular protein target. Both ligand-based similarity searching and structure-based docking have benefited from machine learning algorithms, including naïve Bayesian classifiers, support vector machines, neural networks, and decision trees, as well as more traditional regression techniques. Effective application of these methodologies requires an appreciation of data preparation, validation, optimization, and search methodologies, and we also survey developments in these areas.
Random forest, an ensemble based supervised machine learning algorithm, is used to predict the SCOP structural classification for a target structure, based on the similarity of its structural ...descriptors to those of a template structure with an equal number of secondary structure elements (SSEs). An initial assessment of random forest is carried out for domains consisting of three SSEs. The usability of random forest in classifying larger domains is demonstrated by applying it to domains consisting of four, five and six SSEs.
Random forest, trained on SCOP version 1.69, achieves a predictive accuracy of up to 94% on an independent and non-overlapping test set derived from SCOP version 1.73. For classification to the SCOP Class, Fold, Super-family or Family levels, the predictive quality of the model in terms of the Matthews correlation coefficient (MCC) ranged from 0.61 to 0.83. As the number of constituent SSEs increases, the MCC for classification to different structural levels decreases.
The utility of random forest in classifying domains from the place-holder classes of SCOP to the true Class, Fold, Super-family or Family levels is demonstrated. Issues such as introduction of a new structural level in SCOP and the merger of singleton levels can also be addressed using random forest. A real-world scenario is mimicked by predicting the classification for those protein structures from the PDB, which are yet to be assigned to the SCOP classification hierarchy.
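The Matthews correlation coefficient quoted in the abstract above can be illustrated with a minimal sketch. This is the standard binary form computed from confusion-matrix counts; the abstract does not say how the authors applied it across the SCOP levels, so the function name and the zero-denominator convention here are assumptions made for illustration only.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts.

    tp, tn, fp, fn are the numbers of true positives, true negatives,
    false positives and false negatives. Returns 0.0 when the coefficient
    is undefined (a zero denominator), which is a common convention and an
    assumption of this sketch.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10))
```

The value ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which is why it is often preferred to raw accuracy when classes are unbalanced.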
Circular and linear dichroism of proteins Bulheller, Benjamin M; Rodger, Alison; Hirst, Jonathan D
Physical chemistry chemical physics : PCCP,
01/2007, Volume: 9, Issue: 17
Journal Article
Peer reviewed
Circular dichroism (CD) is an important technique in the structural characterisation of proteins, and especially for secondary structure determination. The CD of proteins can be calculated from first ...principles using the so-called matrix method, with an accuracy which is almost quantitative for helical proteins. Thus, for proteins of unknown structure, CD calculations and experimental data can be used in conjunction to aid structure analysis. Linear dichroism (LD) can be calculated using analogous methodology and has been used to establish the relative orientations of subunits in proteins and protein orientation in an environment such as a membrane. However, simple analysis of LD data is not possible because of overlapping transitions, so coupling calculations with experiment is an important strategy. In this paper, we discuss the use of LD for determining protein orientation and how these data can be interpreted with the aid of calculations. We review methods for the calculation of CD spectra, focusing on semiempirical and ab initio parameter sets used in the matrix method. Lastly, a new web interface for online CD and LD calculation is presented.
A fully quantitative theory of the relationship between protein conformation and optical spectroscopy would facilitate deeper insights into biophysical and simulation studies of protein dynamics and ...folding. In contrast to intense bands in the far-ultraviolet, near-UV bands are much weaker and have been challenging to compute theoretically. We report some advances in the accuracy of calculations in the near-UV, which were realised through the consideration of the vibrational structure of the electronic transitions of aromatic side chains.
Modeling the amide I bands of small peptides la Cour Jansen, Thomas; Dijkstra, Arend G; Watson, Tim M ...
The Journal of chemical physics,
07/2006, Volume: 125, Issue: 4
Journal Article
Peer reviewed
Open access
In this paper different floating oscillator models for describing the amide I band of peptides and proteins are compared with density functional theory (DFT) calculations. Models for the variation of ...the frequency shifts of the oscillators and the nearest-neighbor coupling between them with respect to conformation are constructed from DFT normal mode calculations on N-acetyl-glycine-N′-methylamide. The calculated frequencies are compared with those obtained from existing electrostatic models. Furthermore, a new transition charge coupling model is presented. We suggest a model which combines the nearest-neighbor maps with long-range interactions accounted for using the new transition charge model and an existing electrostatic map for long-range interaction frequency shifts. This model and others, which account for the frequency shifts by electrostatic maps exclusively, are tested by comparing the predicted IR spectra with those from DFT calculations on the pentapeptide Leu-enkephalin. The new model described above gives the best agreement and, after a systematic blueshift is accounted for, reproduces the DFT frequencies to within 3.5 cm⁻¹. The correlation of the intensities for this model with intensities from DFT calculations is 0.94.
Alchemical free energy perturbation (FEP) theory is widely used nowadays to calculate protein-ligand binding energies, often in support of drug discovery endeavours. We assess the accuracy and ...sensitivity of absolute FEP binding energies with respect to the CHARMM/CGenFF and the AMBER/GAFF force field parameterisations for a set of tetrahydroquinoline inhibitors of the first bromodomain of BRD4, a target of keen interest for the development of anti-cancer drugs. We find that AMBER/GAFF is better able than CHARMM/CGenFF to cover the range of and to distinguish between the relative binding energies of the 16 ligands.
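As a reminder of what a free energy perturbation estimate involves, the sketch below implements the textbook exponential-averaging (Zwanzig) identity, ΔG = -kT ln⟨exp(-ΔU/kT)⟩, where the average is taken over configurations sampled in the reference state. The abstract does not state which estimator or software the authors used, so this is an illustrative assumption and not their protocol; the function name, the default temperature and the sample energies are all hypothetical.

```python
import math

# Boltzmann constant in kcal mol^-1 K^-1.
K_B = 0.0019872041


def zwanzig_free_energy(delta_u, temperature=298.15):
    """Exponential-averaging (Zwanzig) free energy estimate.

    delta_u holds energy differences U_target - U_reference (kcal/mol),
    each evaluated on a configuration sampled from the reference state.
    Returns the estimated free energy difference in kcal/mol.
    """
    if not delta_u:
        raise ValueError("delta_u must contain at least one sample")
    beta = 1.0 / (K_B * temperature)
    exponents = [-beta * du for du in delta_u]
    # Shift by the largest exponent (log-sum-exp) for numerical stability.
    shift = max(exponents)
    total = sum(math.exp(x - shift) for x in exponents)
    log_mean = shift + math.log(total / len(exponents))
    return -log_mean / beta


if __name__ == "__main__":
    # Hypothetical energy differences, not data from the study.
    samples = [1.2, 0.8, 1.5, 0.9, 1.1]
    print(zwanzig_free_energy(samples))
```

In practice a transformation is split into many intermediate windows, and estimators such as BAR or MBAR are commonly preferred to plain exponential averaging because the latter converges poorly when the two states overlap weakly.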
Dewar Benzenoids Discovered In Carbon Nanobelts Hanson-Heine, Magnus W. D; Rogers, David M; Woodward, Simon ...
The journal of physical chemistry letters,
05/2020, Volume: 11, Issue: 10
Journal Article
Peer reviewed
Open access
The synthesis of cyclacene nanobelts remains an elusive goal dating back over 60 years. These molecules represent the last unsynthesized building block of carbon nanotubes and may be useful both as ...seed molecules for the preparation of structurally well-defined carbon nanotubes and for understanding the behavior and formation of zigzag nanotubes more broadly. Here we report the discovery that isomers containing two Dewar benzenoid rings are the preferred form for several sizes of cyclacene. The predicted lower polyradical character and higher singlet–triplet stability that these isomers possess compared with their pure benzenoid counterparts suggest that they may be more stable synthetic targets than the structures that have previously been identified. Our findings should facilitate the exploration of new routes to cyclacene synthesis through Dewar benzene chemistry.
Nonlinear two-dimensional infrared spectroscopy (2DIR) is most commonly simulated within the framework of the exciton method. The key parameters for these calculations include the frequency of the ...oscillators within their molecular environments and coupling constants that describe the strength of coupling between the oscillators. It is shown that these quantities can be obtained directly from harmonic frequency calculations by exploiting a procedure that localizes the normal modes. This approach is demonstrated using the amide I modes of polypeptides. For linear and cyclic diamides and hexapeptide Z-Aib-l-Leu-(Aib)2-Gly-Aib-OtBu, the computed parameters are compared with those from existing schemes, and the resulting 2DIR spectra are consistent with experimental observations. The incorporation of conformational averaging of structures from molecular dynamics simulations is discussed, and a hybrid scheme wherein the Hamiltonian matrix from the quantum chemical local-mode approach is combined with fluctuations from empirical schemes is shown to be consistent with experiment. The work demonstrates that localized vibrational modes can provide a foundation for the calculation of 2DIR spectra that does not rely on extensive parametrization and can be applied to a wide range of systems. For systems that are too large for quantum chemical harmonic frequency calculations, the local-mode approach provides a convenient platform for the development of site frequency and coupling maps.
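The role of the site frequencies and coupling constants mentioned in the abstract above can be seen in the smallest possible case: two coupled oscillators. The sketch below diagonalises the 2x2 one-exciton Hamiltonian analytically. It illustrates the general exciton picture only and is not the localisation procedure described by the authors; the function name and the numerical values are hypothetical.

```python
import math


def two_site_exciton(w1, w2, coupling):
    """Eigenfrequencies of the 2x2 one-exciton Hamiltonian

        H = [[w1,       coupling],
             [coupling, w2      ]]

    where w1 and w2 are local (site) frequencies and coupling is the
    coupling constant, all in the same units (for example cm^-1).
    Returns the pair (lower, upper).
    """
    mean = 0.5 * (w1 + w2)
    half_gap = 0.5 * (w1 - w2)
    # The splitting follows from the closed-form eigenvalues of a
    # symmetric 2x2 matrix.
    split = math.hypot(half_gap, coupling)
    return mean - split, mean + split


if __name__ == "__main__":
    # Hypothetical amide I site frequencies and coupling in cm^-1.
    print(two_site_exciton(1650.0, 1650.0, 5.0))
```

For degenerate sites the two exciton states are split by twice the coupling, while for well-separated site frequencies the coupling only perturbs the local modes slightly. Larger peptides extend the same matrix to one row per oscillator.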
Enzyme-based iron–sulfur clusters, exemplified in families such as hydrogenases, nitrogenases, and radical S-adenosylmethionine enzymes, feature in many essential biological processes. The ...functionality of biological iron–sulfur clusters extends beyond simple electron transfer, relying primarily on the redox activity of the clusters, with a remarkable diversity for different enzymes. The active-site structure and the electrostatic environment in which the cluster resides direct this redox reactivity. Oriented electric fields in enzymatic active sites can be significantly strong, and understanding the extent of their effect on iron–sulfur cluster reactivity can inform first steps toward rationally engineering their reactivity. An extensive systematic density functional theory-based screening approach using OPBE/TZP has afforded a simple electric field-effect representation. The results demonstrate that the orientation of an external electric field of strength 28.8 MV cm⁻¹ at the center of the cluster can have a significant effect on its relative stability, on the order of 35 kJ mol⁻¹. This shows clear implications for the reactivity of iron–sulfur clusters in enzymes. The results also demonstrate that the orientation of the electric field can alter the most stable broken-symmetry state, which further has implications for the directionality of initiated electron-transfer reactions. These insights open the path for manipulating the enzymatic redox reactivity of iron–sulfur cluster-containing enzymes by rationally engineering oriented electric fields within the enzymes.
Reversible addition–fragmentation chain transfer (RAFT) dispersion polymerisation of methyl methacrylate (MMA) is performed in supercritical carbon dioxide (scCO2) with ...2-(dodecylthiocarbonothioylthio)-2-methylpropionic acid (DDMAT) present as chain transfer agent (CTA) and surprisingly shows good control over PMMA molecular weight. Kinetic studies of the polymerisation in scCO2 also confirm these data. By contrast, only poor control of MMA polymerisation is obtained in toluene solution, as would be expected for this CTA which is better suited for acrylates. In this regard, we select a range of CTAs and use them to determine the parameters that must be considered for good control in dispersion polymerisation in scCO2. A thorough investigation of the nucleation stage during the dispersion polymerisation reveals an unexpected “in situ two-stage” mechanism that strongly determines how the CTA works. Finally, using a novel computational solvation model, we identify a correlation between polymerisation control and degree of solubility of the CTAs. All of this ultimately gives rise to a simple, elegant and counterintuitive guideline to select the best CTA for RAFT dispersion polymerisation in scCO2.