Quantum computers can in principle solve certain problems exponentially faster than their classical counterparts. Useful quantum computation has not yet arrived, but when it does, it will affect nearly all scientific disciplines. In this review, we examine how current quantum algorithms could revolutionize computational biology and bioinformatics. Potential benefits span the entire field: processing vast amounts of information and running machine learning algorithms far more efficiently, quantum simulation algorithms that are poised to improve calculations in drug discovery, and quantum optimization algorithms that may advance areas from protein structure prediction to network analysis. However, these exciting prospects are susceptible to “hype,” so it is also important to recognize the caveats and challenges of this new technology. Our aim is to introduce the promise and limitations of emerging quantum computing technologies in computational molecular biology and bioinformatics.
This article is categorized under:
Structure and Mechanism > Computational Biochemistry and Biophysics
Data Science > Computer Algorithms and Programming
Electronic Structure Theory > Ab Initio Electronic Structure Methods
Quantum computers promise faster algorithms that can affect molecular biology and bioinformatics – for example, in data analysis, electronic structure simulations and protein modeling.
Molecular docking. Morris, Garrett M; Lim-Wilby, Marguerita. Methods in molecular biology (Clifton, N.J.), 2008, Volume 443. Journal Article.
Molecular docking is a key tool in structural molecular biology and computer-assisted drug design. The goal of ligand-protein docking is to predict the predominant binding mode(s) of a ligand with a protein of known three-dimensional structure. Successful docking methods search high-dimensional spaces effectively and use a scoring function that correctly ranks candidate dockings. Docking can be used to perform virtual screening on large libraries of compounds, rank the results, and propose structural hypotheses of how the ligands inhibit the target, which is invaluable in lead optimization. Setting up the input structures is just as important as the docking itself, and the results of stochastic search methods can sometimes be difficult to interpret. This chapter discusses the background and theory of molecular docking software and covers the usage of some of the most-cited docking programs.
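To make the "search plus scoring function" idea concrete, here is a minimal toy sketch in Python, not the algorithm of any particular docking program. A rigid two-atom "ligand" is translated by Metropolis Monte Carlo around three fixed "receptor" atoms, and poses are ranked by a Lennard-Jones-style score. The coordinates, the pair potential, and the parameters (`r0`, `eps`, `kT`, step size) are all hypothetical; real docking codes also sample rotations and torsions and use much richer scoring functions.

```python
import math
import random

# Hypothetical toy system: coordinates in Å, chosen only for illustration.
RECEPTOR = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (1.9, 3.3, 0.0)]
LIGAND_OFFSETS = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]  # rigid two-atom "ligand"


def pair_term(r, r0=3.5, eps=0.2):
    """12-6 Lennard-Jones-style term with its minimum (-eps) at r = r0."""
    r = max(r, 1e-3)  # avoid division by zero on an exact overlap
    x = (r0 / r) ** 6
    return eps * (x * x - 2.0 * x)


def score(center):
    """Toy scoring function: sum of pair terms over all ligand-receptor atom pairs."""
    total = 0.0
    for dx, dy, dz in LIGAND_OFFSETS:
        atom = (center[0] + dx, center[1] + dy, center[2] + dz)
        total += sum(pair_term(math.dist(atom, rec)) for rec in RECEPTOR)
    return total


def monte_carlo_dock(start, steps=5000, step=0.5, kT=0.6, seed=0):
    """Metropolis Monte Carlo over ligand translations; returns the best pose and score."""
    rng = random.Random(seed)
    current = list(start)
    current_score = score(current)
    best, best_score = list(current), current_score
    for _ in range(steps):
        trial = [c + rng.uniform(-step, step) for c in current]
        trial_score = score(trial)
        delta = trial_score - current_score
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score


start = (1.9, 1.1, 6.0)
pose, best = monte_carlo_dock(start)
print(f"start score {score(start):.3f}, best score {best:.3f} at {pose}")
```

Because the search is stochastic, different seeds can return different best poses, which is one reason the results of such runs need careful analysis.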
The last few years have seen the development of numerous deep learning-based protein-ligand docking methods. They offer huge promise in terms of speed and accuracy. However, despite claims of state-of-the-art performance in terms of crystallographic root-mean-square deviation (RMSD), upon closer inspection, it has become apparent that they often produce physically implausible molecular structures. It is therefore not sufficient to evaluate these methods solely by RMSD to a native binding mode. It is vital, particularly for deep learning-based methods, that they are also evaluated on steric and energetic criteria. We present PoseBusters, a Python package that performs a series of standard quality checks using the well-established cheminformatics toolkit RDKit. The PoseBusters test suite validates the chemical and geometric consistency of a ligand, including its stereochemistry, and the physical plausibility of intra- and intermolecular measurements such as the planarity of aromatic rings, standard bond lengths, and protein-ligand clashes. Only methods that both pass these checks and predict native-like binding modes should be classed as having "state-of-the-art" performance. We use PoseBusters to compare five deep learning-based docking methods (DeepDock, DiffDock, EquiBind, TankBind, and Uni-Mol) and two well-established standard docking methods (AutoDock Vina and CCDC Gold) with and without an additional post-prediction energy minimisation step using a molecular mechanics force field. We show that both in terms of physical plausibility and the ability to generalise to examples that are distinct from the training data, no deep learning-based method yet outperforms classical docking tools. In addition, we find that molecular mechanics force fields contain docking-relevant physics missing from deep-learning methods.
PoseBusters allows practitioners to assess docking and molecular generation methods and may inspire new inductive biases still required to improve deep learning-based methods, which will help drive the development of more accurate and more realistic predictions.
PoseBusters assesses molecular poses using steric and energetic criteria. We find that classical protein-ligand docking tools currently still outperform deep learning-based methods.
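The central point above, that a pose can look acceptable by RMSD yet be physically implausible, can be illustrated with a small Python sketch. This is not the PoseBusters API: `pose_rmsd` and `bonds_plausible` are hypothetical helpers, the ±25% bond-length tolerance is an assumed value, and the ≤ 2 Å cut-off is the RMSD threshold commonly used in docking benchmarks.

```python
import math


def pose_rmsd(pred, ref):
    """RMSD between matched atoms, computed in the fixed receptor frame.

    No superposition is performed, and atom i in `pred` is assumed to
    correspond to atom i in `ref` (no symmetry correction).
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must have the same non-zero number of atoms")
    return math.sqrt(sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def bonds_plausible(coords, bonds, tol=0.25):
    """Hypothetical check: every bond length lies within +/- tol of its ideal value.

    `bonds` is a list of (i, j, ideal_length_in_angstrom) tuples.
    """
    for i, j, ideal in bonds:
        d = math.dist(coords[i], coords[j])
        if not (1 - tol) * ideal <= d <= (1 + tol) * ideal:
            return False
    return True


ref = [(0.0, 0.0, 0.0), (1.54, 0.0, 0.0)]       # "crystal" pose of a C-C fragment
stretched = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]  # predicted pose with a broken bond
bonds = [(0, 1, 1.54)]

rmsd = pose_rmsd(stretched, ref)
print(f"RMSD = {rmsd:.2f} Å, under 2 Å: {rmsd <= 2.0}")
print("bond lengths plausible:", bonds_plausible(stretched, bonds))
```

Here the stretched pose sits about 1.03 Å from the reference, so it would pass an RMSD-only criterion while failing the bond-length check.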
Water plays a critical role in ligand-protein interactions. However, it is still challenging to accurately predict not only where water molecules prefer to bind, but also which of those water molecules might be displaceable. The latter is often seen as a route to optimizing the affinity of potential drug candidates. Using a protocol we call WaterDock, we show that the freely available AutoDock Vina tool can be used to accurately predict the binding sites of water molecules. WaterDock was validated using data from X-ray crystallography, neutron diffraction and molecular dynamics simulations and correctly predicted 97% of the water molecules in the test set. In addition, we combined data-mining, heuristic and machine learning techniques to develop probabilistic water molecule classifiers. When applied to WaterDock predictions in the Astex Diverse Set of protein-ligand complexes, we could identify whether a water molecule was conserved or displaced to an accuracy of 75%. A second model predicted whether water molecules were displaced by polar groups or by non-polar groups to an accuracy of 80%. These results should prove useful for anyone wishing to undertake rational design of new compounds where the displacement of water molecules is being considered as a route to improved affinity.
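The abstract does not spell out how a water molecule counts as "correctly predicted". One common style of evaluation, sketched here under assumptions, counts an experimentally observed water as recovered if some predicted site lies within a distance cut-off. The 1.4 Å cut-off and the function name `fraction_recovered` are hypothetical and are not WaterDock's published criterion.

```python
import math


def fraction_recovered(predicted, observed, cutoff=1.4):
    """Fraction of observed water oxygens with a predicted site within `cutoff` Å."""
    if not observed:
        return 0.0
    hits = sum(
        1 for o in observed if any(math.dist(o, p) <= cutoff for p in predicted)
    )
    return hits / len(observed)


observed = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (5.0, 1.0, 0.0)]
print(f"{fraction_recovered(predicted, observed):.0%} of observed waters recovered")
```

A recall-style measure like this ignores spurious predicted sites, so in practice it would usually be reported alongside some measure of false positives.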
The calculation of the entropy of flexible molecules can be challenging, since the number of possible conformers can grow exponentially with molecule size and many low-energy conformers may be thermally accessible. Different methods have been proposed to approximate the contribution of conformational entropy to the molecular standard entropy, including performing thermochemistry calculations with all possible stable conformations and developing empirical corrections from experimental data. We have performed conformer sampling on over 120,000 small molecules, generating some 12 million conformers, to develop models to predict conformational entropy across a wide range of molecules. Using insight into the nature of conformational disorder, our cross-validated, physically motivated statistical model gives a mean absolute error of ∼4.8 J/mol·K, or under 0.4 kcal/mol at 300 K. Beyond predicting molecular entropies and free energies, the model implies a high degree of correlation between torsions in most molecules, which are often assumed to be independent. While individual dihedral rotations may have low energetic barriers, the shape and chemical functionality of most molecules necessarily correlate their torsional degrees of freedom and hence greatly restrict the number of low-energy conformations. Our simple models capture these correlations and advance our understanding of small molecule conformational entropy.
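The quantities in this abstract can be made concrete with a short Python sketch. It assumes the standard Gibbs form of conformational entropy, S = -R Σ pᵢ ln pᵢ over Boltzmann-weighted conformers; the function names are hypothetical and this is not the authors' statistical model. The independent-torsion bound shows what the "torsions are independent" assumption implies (n three-state torsions give S = nR ln 3), and the last helper checks the quoted error: 4.8 J/mol·K × 300 K = 1440 J/mol ≈ 0.34 kcal/mol, consistent with "under 0.4 kcal/mol".

```python
import math

R = 8.314462618  # gas constant, J/(mol·K)


def conformational_entropy(energies_kj_mol, T=300.0):
    """Gibbs entropy S = -R * sum(p_i ln p_i) over Boltzmann-weighted conformers."""
    e_min = min(energies_kj_mol)
    # Shift by the minimum for numerical stability; populations are unchanged.
    weights = [math.exp(-(e - e_min) * 1000.0 / (R * T)) for e in energies_kj_mol]
    z = sum(weights)
    probs = [w / z for w in weights]
    return -R * sum(p * math.log(p) for p in probs if p > 0)


def independent_torsion_entropy(n_torsions, states_per_torsion=3):
    """Upper bound if every torsion had equally populated, independent states."""
    return n_torsions * R * math.log(states_per_torsion)


def entropy_to_kcal(s_j_mol_k, T=300.0):
    """Convert an entropy into the free-energy term T*S in kcal/mol."""
    return s_j_mol_k * T / 4184.0


print(f"{entropy_to_kcal(4.8):.3f} kcal/mol")
```

Correlated torsions remove some of the 3ⁿ combinations or make their populations unequal, so the Gibbs entropy of a real molecule falls below the independent-torsion bound.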
Abstract
The SARS-CoV-2 coronavirus is the causal agent of the current global pandemic. SARS-CoV-2 belongs to an order, Nidovirales, with very large RNA genomes. It is proposed that the fidelity of coronavirus (CoV) genome replication is aided by an RNA nuclease complex, comprising the non-structural proteins 14 and 10 (nsp14–nsp10), an attractive target for antiviral inhibition. Our results validate reports that the SARS-CoV-2 nsp14–nsp10 complex has RNase activity. Detailed functional characterization reveals that nsp14–nsp10 is a versatile nuclease capable of digesting a wide variety of RNA structures, including those with a blocked 3′-terminus. Consistent with a role in maintaining viral genome integrity during replication, we find that nsp14–nsp10 activity is enhanced by the viral RNA-dependent RNA polymerase complex (RdRp) consisting of nsp12–nsp7–nsp8 (nsp12–7–8) and demonstrate that this stimulation is mediated by nsp8. We propose that the role of nsp14–nsp10 in maintaining replication fidelity goes beyond classical proofreading by purging the nascent replicating RNA strand of a range of potentially replication-terminating aberrations. Using the assays we developed, we identify drug and drug-like molecules that inhibit nsp14–nsp10, including the known SARS-CoV-2 major protease (Mpro) inhibitor ebselen and the HIV integrase inhibitor raltegravir, revealing the potential for multifunctional inhibitors in COVID-19 treatment.
The geometry of a molecule plays a significant role in determining its physical and chemical properties. Despite its importance, there are relatively few studies on ring puckering and conformations, and these often focus on small cycloalkanes, 5- and 6-membered carbohydrate rings, and specific macrocycle families. We lack a general understanding of the puckering preferences of medium-sized rings and macrocycles. To address this, we provide an extensive conformational analysis of a diverse set of rings. We used Cremer–Pople puckering coordinates to study the trends of the ring conformation across a set of 140 000 diverse small molecules, including small rings, macrocycles, and cyclic peptides. By standardizing using key atoms, we show that the ring conformations can be classified into relatively few conformational clusters, based on their canonical forms. The number of such canonical clusters increases slowly with ring size. Ring puckering motions, especially pseudo-rotations, are generally restricted and differ between clusters. More importantly, we propose models to map puckering preferences to torsion space, which allows us to understand the inter-related changes in torsion angles during pseudo-rotation and other puckering motions. Beyond ring puckers, our models also explain the change in substituent orientation upon puckering. We also present a novel knowledge-based sampling method using the puckering preferences and coupled substituent motion to generate ring conformations efficiently. In summary, this work provides an improved understanding of general ring puckering preferences, which will in turn accelerate the identification of low-energy ring conformations for applications from polymeric materials to drug binding.
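The Cremer–Pople coordinates mentioned above can be computed from ring-atom coordinates with the standard construction: fit a mean plane, take each atom's displacement zⱼ from it, and Fourier-decompose the displacements into amplitude/phase pairs (qₘ, φₘ) plus, for even ring sizes, a single q_{N/2}. The pure-Python sketch below follows that textbook definition; it is not the authors' code, sign and phase conventions can differ between implementations, and the chair geometry in the usage example (radius 1.45 Å, offset `h`) is hypothetical.

```python
import math


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def cremer_pople(ring):
    """Cremer-Pople puckering parameters for ring atoms listed in bonding order.

    Returns (Q, pairs, q_half): Q is the total puckering amplitude, `pairs` maps
    m -> (q_m, phi_m in degrees) for 2 <= m <= (N - 1) // 2, and q_half is
    q_{N/2} for even N (None for odd N).
    """
    n = len(ring)
    if n < 4:
        raise ValueError("need at least 4 ring atoms")
    centre = [sum(atom[k] for atom in ring) / n for k in range(3)]
    r = [[atom[k] - centre[k] for k in range(3)] for atom in ring]
    # Mean-plane normal from the two weighted vector sums R' and R''.
    r1 = [sum(r[j][k] * math.sin(2 * math.pi * j / n) for j in range(n)) for k in range(3)]
    r2 = [sum(r[j][k] * math.cos(2 * math.pi * j / n) for j in range(n)) for k in range(3)]
    normal = _cross(r1, r2)
    length = math.sqrt(_dot(normal, normal))
    normal = [c / length for c in normal]
    z = [_dot(rj, normal) for rj in r]  # out-of-plane displacements

    total_q = math.sqrt(sum(zj * zj for zj in z))
    pairs = {}
    for m in range(2, (n - 1) // 2 + 1):
        qc = math.sqrt(2 / n) * sum(z[j] * math.cos(2 * math.pi * m * j / n) for j in range(n))
        qs = -math.sqrt(2 / n) * sum(z[j] * math.sin(2 * math.pi * m * j / n) for j in range(n))
        pairs[m] = (math.hypot(qc, qs), math.degrees(math.atan2(qs, qc)) % 360.0)
    q_half = None
    if n % 2 == 0:
        q_half = math.sqrt(1 / n) * sum(z[j] * (-1) ** j for j in range(n))
    return total_q, pairs, q_half


h = 0.25  # hypothetical out-of-plane offset for an idealized chair, Å
chair = [(1.45 * math.cos(math.pi * j / 3), 1.45 * math.sin(math.pi * j / 3), h * (-1) ** j)
         for j in range(6)]
Q, pairs, q3 = cremer_pople(chair)
print(f"Q = {Q:.3f}, q2 = {pairs[2][0]:.3f}, |q3| = {abs(q3):.3f}")
```

For the idealized chair, all puckering sits in q₃ and q₂ vanishes; a boat or twist-boat would instead load q₂, with φ₂ distinguishing the forms along the pseudo-rotation pathway.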
Machine learning scoring functions for protein–ligand binding affinity have been found to consistently outperform classical scoring functions when trained and tested on crystal structures of bound protein–ligand complexes. However, it is less clear how these methods perform when applied to docked poses of complexes. We explore how the use of docked rather than crystallographic poses for both training and testing affects the performance of machine learning scoring functions. Using the PDBbind Core Sets as benchmarks, we show that the performance of a structure-based machine learning scoring function trained and tested on docked poses is lower than that of the same scoring function trained and tested on crystallographic poses. We construct a hybrid scoring function by combining both structure-based and ligand-based features, and show that its ability to predict binding affinity using docked poses is comparable to that of purely structure-based scoring functions trained and tested on crystal poses. We also present a new, freely available validation set, the Updated DUD-E Diverse Subset, for binding affinity prediction using data from DUD-E and ChEMBL. Despite strong performance on docked poses of the PDBbind Core Sets, we find that our hybrid scoring function sometimes generalizes poorly to a protein target not represented in the training set, demonstrating the need for improved scoring functions and additional validation benchmarks.
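The abstract does not name its evaluation metrics; Pearson's correlation coefficient and RMSE between predicted and experimental affinities are commonly reported on PDBbind benchmarks, so the sketch below assumes them. `hybrid_features` is a hypothetical illustration of combining structure-based and ligand-based descriptors by simple concatenation, not the authors' feature set, and the affinity values are made up.

```python
import math


def hybrid_features(structure_features, ligand_features):
    """Hypothetical hybrid descriptor: concatenate structure- and ligand-based features."""
    return list(structure_features) + list(ligand_features)


def pearson_r(pred, true):
    """Pearson correlation between predicted and experimental affinities."""
    n = len(pred)
    if n != len(true) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mp, mt = sum(pred) / n, sum(true) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_t = sum((t - mt) ** 2 for t in true)
    return cov / math.sqrt(var_p * var_t)


def rmse(pred, true):
    """Root-mean-square error, in the units of the affinity (e.g. pK)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))


experimental = [4.2, 6.1, 7.5, 9.0]  # made-up pK values for illustration
predicted = [5.0, 5.8, 7.1, 8.2]
print(f"r = {pearson_r(predicted, experimental):.2f}, "
      f"RMSE = {rmse(predicted, experimental):.2f}")
```

Correlation rewards correct ranking even when predictions are systematically shifted, whereas RMSE penalizes the shift, which is why the two are usually reported together.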