Deep learning has transformed protein structure modeling. Here we relate AlphaFold and RoseTTAFold to classical physically based approaches to protein structure prediction, and discuss the many areas of structural biology that are likely to be affected by further advances in deep learning.
We develop a deep learning framework (DeepAccNet) that estimates per-residue accuracy and residue-residue distance signed error in protein models and uses these predictions to guide Rosetta protein structure refinement. The network uses 3D convolutions to evaluate local atomic environments, followed by 2D convolutions to provide their global contexts, and outperforms other methods that similarly predict the accuracy of protein structure models. Overall accuracy predictions for X-ray and cryoEM structures in the PDB correlate with their resolution, and the network should be broadly useful for assessing the accuracy of both predicted structure models and experimentally determined structures and for identifying specific regions likely to be in error. Incorporation of the accuracy predictions at multiple stages in the Rosetta refinement protocol considerably increased the accuracy of the resulting protein structure models, illustrating how deep learning can improve the search for global energy minima of biomolecules.
Homo-oligomerization of proteins is abundant in nature, and is often intimately related to the physiological functions of proteins, such as in metabolism, signal transduction or immunity. Information on the homo-oligomer structure is therefore important to obtain a molecular-level understanding of protein functions and their regulation. Currently available web servers predict protein homo-oligomer structures either by template-based modeling using homo-oligomer templates selected from the protein structure database or by ab initio docking of monomer structures resolved by experiment or predicted by computation. The GalaxyHomomer server, freely accessible at http://galaxy.seoklab.org/homomer, carries out template-based modeling, ab initio docking or both depending on the availability of proper oligomer templates. It also incorporates recently developed model refinement methods that can consistently improve model quality. Moreover, the server provides additional options that can be chosen by the user depending on the availability of information on the monomer structure, oligomeric state and locations of unreliable/flexible loops or termini. The performance of the server was better than or comparable to that of other available methods when tested on benchmark sets and in a recent CASP performed in a blind fashion.
Scoring model structures is an essential component of protein structure prediction that can affect the prediction accuracy tremendously. Users of protein structure prediction results also need to score models to select the best models for their application studies. In Critical Assessment of techniques for protein Structure Prediction (CASP), model accuracy estimation methods have been tested in a blind fashion by providing models submitted by the tertiary structure prediction servers for scoring. In CASP13, model accuracy estimation results were evaluated in terms of both global and local structure accuracy. Global structure accuracy estimation was evaluated by the quality of the models selected by the global structure scores and by the absolute estimates of the global scores. Residue‐wise, local structure accuracy estimations were evaluated by three different measures. A new measure introduced in CASP13 evaluates the ability to predict inaccurately modeled regions that may be improved by refinement. An intensive comparative analysis on CASP13 and the previous CASPs revealed that the tertiary structure models generated by the CASP13 servers show very distinct features. Higher consensus toward models of higher global accuracy appeared even for free modeling targets, and many models of high global accuracy were not well optimized at the atomic level. This is related to the new technology in CASP13, deep learning for tertiary contact prediction. The tertiary model structures generated by deep learning pose a new challenge for EMA (estimation of model accuracy) method developers. Model accuracy estimation itself is also an area where deep learning can potentially have an impact, although current EMA methods have not fully explored that direction.
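The two global evaluation criteria described above (the quality of the model selected by the predicted scores, and the absolute accuracy of the score estimates) can be illustrated with a minimal sketch. The function names and the example scores are hypothetical, and the formulas are common simplified forms of such measures rather than the exact definitions used by the CASP13 assessors.

```python
from statistics import mean


def top1_selection_loss(predicted, actual):
    """Accuracy lost by picking the model with the highest predicted score.

    predicted: estimated global scores, one per model (hypothetical values)
    actual:    true global scores of the same models, in the same order
    """
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    # Index of the model the accuracy estimator would select.
    picked = max(range(len(predicted)), key=lambda i: predicted[i])
    # Difference between the best available model and the selected one.
    return max(actual) - actual[picked]


def mean_absolute_error(predicted, actual):
    """Average absolute deviation of estimated scores from true scores."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    return mean(abs(p - a) for p, a in zip(predicted, actual))


# Illustrative numbers only; they are not taken from any CASP data set.
predicted_scores = [0.62, 0.80, 0.71]
actual_scores = [0.70, 0.65, 0.75]

loss = top1_selection_loss(predicted_scores, actual_scores)
mae = mean_absolute_error(predicted_scores, actual_scores)
print(f"top-1 selection loss: {loss:.2f}, mean absolute error: {mae:.2f}")
```

A selection loss of zero means the estimator picked the best model, even if its absolute score estimates are off, which is why the two criteria are reported separately.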
Accurate and rapid calculation of protein-small molecule interaction free energies is critical for computational drug discovery. Because of the large chemical space spanned by drug-like molecules, classical force fields contain thousands of parameters describing atom-pair distance and torsional preferences; each parameter is typically optimized independently on simple representative molecules. Here, we describe a new approach in which small molecule force field parameters are jointly optimized guided by the rich source of information contained within thousands of available small molecule crystal structures. We optimize parameters by requiring that the experimentally determined molecular lattice arrangements have lower energy than all alternative lattice arrangements. Thousands of independent crystal lattice-prediction simulations were run on each of 1386 small molecule crystal structures, and energy function parameters of an implicit solvent energy model were optimized so that native crystal lattice arrangements had the lowest energy. The resulting energy model was implemented in Rosetta, together with a rapid genetic algorithm docking method employing grid-based scoring and receptor flexibility. The success rate of bound structure recapitulation in cross-docking on 1112 complexes was improved by more than 10% over previously published methods, with solutions within <1 Å in over half of the cases. Our results demonstrate that small molecule crystal structures are a rich source of information for guiding molecular force field development, and the improved Rosetta energy function should increase accuracy in a wide range of small molecule structure prediction and design studies.
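The requirement that native lattice arrangements have lower energy than all alternatives can be written as a ranking objective. The sketch below is an illustration only: it assumes an energy that is a weighted sum of precomputed terms and uses a hinge-style penalty, neither of which is stated in the abstract. The function names, weights and term values are all hypothetical.

```python
def energy(weights, terms):
    """Energy as a weighted sum of per-term values (an assumed linear form)."""
    return sum(w * t for w, t in zip(weights, terms))


def lattice_ranking_loss(weights, native_terms, decoy_terms_list, margin=0.0):
    """Penalty that is zero only when the native lattice beats every decoy.

    Each decoy contributes max(0, margin + E_native - E_decoy), so a decoy
    whose energy is lower than the native one adds a positive penalty.
    """
    e_native = energy(weights, native_terms)
    return sum(
        max(0.0, margin + e_native - energy(weights, decoy))
        for decoy in decoy_terms_list
    )


def native_is_lowest(weights, native_terms, decoy_terms_list):
    """True if the native lattice has strictly the lowest energy."""
    e_native = energy(weights, native_terms)
    return all(e_native < energy(weights, d) for d in decoy_terms_list)


# Hypothetical two-term energies for one crystal: native plus two decoys.
native_terms = [-5.0, 2.0]
decoy_terms_list = [[-6.0, 2.5], [-2.0, 1.5]]

initial_weights = [1.0, 1.0]   # the first decoy scores below the native lattice
adjusted_weights = [1.0, 3.0]  # reweighting restores the native as the minimum

initial_loss = lattice_ranking_loss(initial_weights, native_terms, decoy_terms_list)
adjusted_loss = lattice_ranking_loss(adjusted_weights, native_terms, decoy_terms_list)
print(initial_loss, adjusted_loss)
```

In a joint optimization of this kind, the penalty would be summed over all crystals and minimized with respect to the shared weights, so that every parameter is constrained by many structures at once.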
Protein–protein interactions play crucial roles in diverse biological processes, including various disease progressions. Atomistic structural details of protein–protein interactions may provide important information that can facilitate the design of therapeutic agents. GalaxyHeteromer is a freely available automatic web server (http://galaxy.seoklab.org/heteromer) that predicts protein heterodimer complex structures from two subunit protein sequences or structures. When subunit structures are unavailable, they are predicted by template- or distance-prediction-based modelling methods. Heterodimer complex structures can be predicted by both template-based and ab initio docking, depending on the template's availability. Structural templates are detected from the protein structure database based on both the sequence and structure similarities. The templates for heterodimers may be selected from monomer and homo-oligomer structures, as well as from hetero-oligomers, owing to the evolutionary relationships of heterodimers with domains of monomers or subunits of homo-oligomers. In addition, the server employs one of the best ab initio docking methods when heterodimer templates are unavailable. The multiple heterodimer structure models and the associated scores, which are provided by the web server, may be further examined by the user to test or develop functional hypotheses or to design new functional molecules.
We present the quality assessment of 5613 models submitted by predictor groups from both CAPRI and CASP for a total of 15 most tractable targets from the second joint CASP‐CAPRI protein assembly prediction experiment. These targets comprised 12 homo‐oligomers and 3 hetero‐complexes. The bulk of the analysis focuses on 10 targets (of CAPRI Round 37), which included all 3 hetero‐complexes, and whose protein chains or the full assembly could be readily modeled from structural templates in the PDB. On average, 28 CAPRI groups and 10 CASP groups (including automatic servers) submitted models for each of these 10 targets. Additionally, about 16 groups participated in the CAPRI scoring experiments. A range of acceptable to high quality models were obtained for 6 of the 10 Round 37 targets, for which templates were available for the full assembly. Poorer results were achieved for the remaining targets due to the lower quality of the templates available for the full complex or the individual protein chains, highlighting the unmet challenge of modeling the structural adjustments of the protein components that occur upon binding or which must be accounted for in template‐based modeling. On the other hand, our analysis indicated that residues in binding interfaces were correctly predicted in a sizable fraction of otherwise poorly modeled assemblies, and with higher accuracy than published methods that do not use information on the binding partner. Lastly, the strengths and weaknesses of the assessment methods are evaluated and improvements are suggested.
Recently it has become possible to de novo design high-affinity protein-binding proteins from target structural information alone. There is, however, considerable room for improvement as the overall design success rate is low. Here, we explore the augmentation of energy-based protein binder design using deep learning. We find that using AlphaFold2 or RoseTTAFold to assess the probability that a designed sequence adopts the designed monomer structure, and the probability that this structure binds the target as designed, increases design success rates nearly 10-fold. We find further that sequence design using ProteinMPNN rather than Rosetta considerably increases computational efficiency.
For CASP14, we developed deep learning‐based methods for predicting homo‐oligomeric and hetero‐oligomeric contacts and used them for oligomer modeling. To build structure models, we developed an oligomer structure generation method that utilizes predicted interchain contacts to guide iterative restrained minimization from random backbone structures. We supplemented this gradient‐based fold‐and‐dock method with template‐based and ab initio docking approaches using deep learning‐based subunit predictions on 29 assembly targets. These methods produced oligomer models with summed Z‐scores 5.5 units higher than the next best group, with the fold‐and‐dock method having the best relative performance. Over the eight targets for which this method was used, the best of the five submitted models had average oligomer TM‐score of 0.71 (average oligomer TM‐score of the next best group: 0.64), and explicit modeling of inter‐subunit interactions improved modeling of six out of 40 individual domains (ΔGDT‐TS > 2.0).
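The ΔGDT‐TS > 2.0 criterion above compares two models of the same domain. As a rough illustration, GDT‐TS is the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose Cα atoms lie within the cutoff of the reference structure. The sketch below is simplified: it assumes the per-residue distances come from one fixed superposition, whereas the standard calculation searches for the superposition that maximizes each percentage. The function name and distance values are hypothetical.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS on a 0-100 scale from per-residue distances in Å.

    distances: model-to-reference C-alpha distances after one superposition
               (an assumption; the full method optimizes the superposition)
    """
    if not distances:
        raise ValueError("at least one residue distance is required")
    n = len(distances)
    # Fraction of residues within each cutoff.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical distances for one domain modeled in two different ways.
monomer_only = [0.5, 1.5, 3.0, 6.0, 9.0, 0.8, 2.5, 5.0]
with_partner = [0.5, 0.9, 3.0, 6.0, 9.0, 0.8, 1.8, 3.5]

score_monomer = gdt_ts(monomer_only)
score_oligomer = gdt_ts(with_partner)
delta = score_oligomer - score_monomer

# Threshold of 2.0 GDT-TS units, as quoted in the abstract.
improved = delta > 2.0
print(score_monomer, score_oligomer, delta, improved)
```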
To protect themselves from host attack, numerous jumbo bacteriophages establish a phage nucleus, a micron-scale, proteinaceous structure encompassing the replicating phage DNA. Bacteriophage and host proteins associated with replication and transcription are concentrated inside the phage nucleus while other phage and host proteins are excluded, including CRISPR-Cas and restriction endonuclease host defense systems. Here, we show that nucleus fragments isolated from ϕPA3-infected Pseudomonas aeruginosa form a 2-dimensional lattice, having p2 or p4 symmetry. We further demonstrate that recombinantly purified primary Phage Nuclear Enclosure (PhuN) protein spontaneously assembles into similar 2D sheets with p2 and p4 symmetry. We resolve the dominant p2 symmetric state to 3.9 Å by cryo-EM. Our structure reveals a two-domain core, organized into quasi-symmetric tetramers. Flexible loops and termini mediate adaptable inter-tetramer contacts that drive subunit assembly into a lattice and enable the adoption of different symmetric states. While the interfaces between subunits are mostly well packed, two are open, forming channels that likely have functional implications for the transport of proteins, mRNA, and small molecules.