There has been considerable recent progress in protein structure prediction using deep neural networks to predict inter-residue distances from amino acid sequences. Here we investigate whether the information captured by such networks is sufficiently rich to generate new folded proteins with sequences unrelated to those of the naturally occurring proteins used in training the models. We generate random amino acid sequences, and input them into the trRosetta structure prediction network to predict starting residue-residue distance maps, which, as expected, are quite featureless. We then carry out Monte Carlo sampling in amino acid sequence space, optimizing the contrast (Kullback-Leibler divergence) between the inter-residue distance distributions predicted by the network and background distributions averaged over all proteins. Optimization from different random starting points resulted in novel proteins spanning a wide range of sequences and predicted structures. We obtained synthetic genes encoding 129 of the network-'hallucinated' sequences, and expressed and purified the proteins in Escherichia coli; 27 of the proteins yielded monodisperse species with circular dichroism spectra consistent with the hallucinated structures. We determined the three-dimensional structures of three of the hallucinated proteins, two by X-ray crystallography and one by NMR, and these closely matched the hallucinated models. Thus, deep networks trained to predict native protein structures from their sequences can be inverted to design new proteins, and such networks and methods should contribute alongside traditional physics-based models to the de novo design of proteins with new functions.
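The sampling loop described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the predictor here is a hypothetical toy function standing in for trRosetta, the background is assumed to be uniform over distance bins rather than averaged over all proteins, and the names `make_toy_predictor`, `mean_kl`, and `hallucinate`, along with the bin count, step count, and temperature, are illustrative choices.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_BINS = 8  # assumed number of distance bins (illustrative only)


def make_toy_predictor(rng, n_bins=N_BINS):
    """Build a hypothetical stand-in for a distance-prediction network.

    This is NOT trRosetta: it maps each residue pair to a fixed random
    distribution over distance bins, just so the loop below can run.
    """
    weights = rng.normal(size=(20, 20, n_bins))

    def predict(seq_idx):
        # Look up logits for every residue pair: shape (L, L, n_bins).
        logits = weights[seq_idx[:, None], seq_idx[None, :]]
        logits = logits - logits.max(axis=-1, keepdims=True)
        probs = np.exp(logits)
        return probs / probs.sum(axis=-1, keepdims=True)

    return predict


def mean_kl(pred, background, eps=1e-9):
    """Mean Kullback-Leibler divergence KL(pred || background) over pairs."""
    kl = np.sum(pred * (np.log(pred + eps) - np.log(background + eps)), axis=-1)
    return float(kl.mean())


def hallucinate(predict, background, length, n_steps, temperature, rng):
    """Metropolis sampling in sequence space, maximizing the mean KL."""
    seq = rng.integers(0, 20, size=length)  # random starting sequence
    score = mean_kl(predict(seq), background)
    best_seq, best_score = seq.copy(), score
    for _ in range(n_steps):
        trial = seq.copy()
        trial[rng.integers(length)] = rng.integers(20)  # single-site mutation
        trial_score = mean_kl(predict(trial), background)
        # Accept improvements always; accept worse moves with Boltzmann probability.
        accept = trial_score >= score or rng.random() < np.exp(
            (trial_score - score) / temperature
        )
        if accept:
            seq, score = trial, trial_score
            if score > best_score:
                best_seq, best_score = seq.copy(), score
    return "".join(AMINO_ACIDS[i] for i in best_seq), best_score


if __name__ == "__main__":
    predict = make_toy_predictor(np.random.default_rng(1))
    background = np.full(N_BINS, 1.0 / N_BINS)  # assumed uniform background
    sequence, kl = hallucinate(predict, background, 30, 500, 0.01, np.random.default_rng(2))
    print(sequence, round(kl, 3))
```

In this sketch the best-scoring sequence seen during sampling is returned, so different random seeds give different sequences, loosely mirroring the abstract's optimization from different random starting points.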
KRAS switch loop movements play a crucial role in regulating RAS signaling, and alteration of these sensitive dynamics is a principal mechanism through which disease-associated RAS mutations lead to aberrant RAS activation. Prior studies suggest that despite a high degree of sequence similarity, the switches in KRAS are more dynamic than those in HRAS. We determined X-ray crystal structures of the rare tumorigenic KRAS mutants KRASD33E, in switch 1 (SW1), and KRASA59G, in switch 2 (SW2), bound to GDP and found these adopt nearly identical, open SW1 conformations as well as altered SW2 conformations. KRASA59G bound to a GTP analogue crystallizes in the same conformation. This open conformation is consistent with the inactive “state 1” previously observed for HRAS bound to GTP. For KRASA59G, switch rearrangements may be regulated by increased flexibility in the 57DXXGQ61 motif at codon 59. However, loss of interactions between side chains at codons 33 and 35 in the SW1 33DPT35 motif drives changes for KRASD33E. The 33DPT35 motif is conserved for multiple members of the RAS subfamily but is not found in RAB, RHO, ARF, or Gα families, suggesting that dynamics mediated by this motif may be important for determining the selectivity of RAS–effector interactions. Biochemically, the consequence of altered switch dynamics is the same, showing impaired interaction with the guanine exchange factor SOS and loss of GAP-dependent GTPase activity. However, interactions with the RBD of RAF are preserved. Overall, these observations add to a body of evidence suggesting that HRAS and KRAS show meaningful differences in functionality stemming from differential protein dynamics independent of the hypervariable region.
Deep-learning methods have revolutionized protein structure prediction and design but are presently limited to protein-only systems. We describe RoseTTAFold All-Atom (RFAA), which combines a residue-based representation of amino acids and DNA bases with an atomic representation of all other groups to model assemblies that contain proteins, nucleic acids, small molecules, metals, and covalent modifications, given their sequences and chemical structures. By fine-tuning on denoising tasks, we developed RFdiffusion All-Atom (RFdiffusionAA), which builds protein structures around small molecules. Starting from random distributions of amino acid residues surrounding target small molecules, we designed and experimentally validated, through crystallography and binding measurements, proteins that bind the cardiac disease therapeutic digoxigenin, the enzymatic cofactor heme, and the light-harvesting molecule bilin.
To create new enzymes and biosensors from scratch, precise control over the structure of small-molecule binding sites is of paramount importance, but systematically designing arbitrary protein pocket shapes and sizes remains an outstanding challenge. Using the NTF2-like structural superfamily as a model system, we developed an enumerative algorithm for creating a virtually unlimited number of de novo proteins supporting diverse pocket structures. The enumerative algorithm was tested and refined through feedback from two rounds of large-scale experimental testing, involving in total the assembly of synthetic genes encoding 7,896 designs and assessment of their stability on yeast cell surface, detailed biophysical characterization of 64 designs, and crystal structures of 5 designs. The refined algorithm generates proteins that remain folded at high temperatures and exhibit more pocket diversity than naturally occurring NTF2-like proteins. We expect this approach to transform the design of small-molecule sensors and enzymes by enabling the creation of binding and active site geometries much more optimal for specific design challenges than is accessible by repurposing the limited number of naturally occurring NTF2-like proteins.
Asymmetric multiprotein complexes that undergo subunit exchange play central roles in biology but present a challenge for design because the components must not only contain interfaces that enable reversible association but also be stable and well behaved in isolation. We use implicit negative design to generate β sheet-mediated heterodimers that can be assembled into a wide variety of complexes. The designs are stable, folded, and soluble in isolation and rapidly assemble upon mixing, and crystal structures are close to the computational models. We construct linearly arranged hetero-oligomers with up to six different components, branched hetero-oligomers, closed C4-symmetric two-component rings, and hetero-oligomers assembled on a cyclic homo-oligomeric central hub and demonstrate that such complexes can readily reconfigure through subunit exchange. Our approach provides a general route to designing asymmetric reconfigurable protein systems.
Engineered proteins generally must possess a stable structure in order to achieve their designed function. Stable designs, however, are astronomically rare within the space of all possible amino acid sequences. As a consequence, many designs must be tested computationally and experimentally in order to find stable ones, which is expensive in terms of time and resources. Here we use a high-throughput, low-fidelity assay to experimentally evaluate the stability of approximately 200,000 novel proteins. These include a wide range of sequence perturbations, providing a baseline for future work in the field. We build a neural network model that predicts protein stability given only sequences of amino acids, and compare its performance to the assayed values. We also report another network model that is able to generate the amino acid sequences of novel stable proteins given requested secondary sequences. Finally, we show that the predictive model, despite weaknesses including a noisy data set, can be used to substantially increase the stability of both expert-designed and model-generated proteins.
Despite recent success in computational design of structured cyclic peptides, de novo design of cyclic peptides that bind to any protein functional site remains difficult. To address this challenge, we develop a computational "anchor extension" methodology for targeting protein interfaces by extending a peptide chain around a non-canonical amino acid residue anchor. To test our approach using a well characterized model system, we design cyclic peptides that inhibit histone deacetylases 2 and 6 (HDAC2 and HDAC6) with enhanced potency compared to the original anchor (IC50 values of 9.1 and 4.4 nM for the best binders compared to 5.4 and 0.6 µM for the anchor, respectively). The HDAC6 inhibitor is among the most potent reported so far. These results highlight the potential for de novo design of high-affinity protein-peptide interfaces, as well as the challenges that remain.
Biomolecules modulate inorganic crystallization to generate hierarchically structured biominerals, but the atomic structure of the organic-inorganic interfaces that regulate mineralization remains largely unknown. We hypothesized that heterogeneous nucleation of calcium carbonate could be achieved by a structured flat molecular template that pre-organizes calcium ions on its surface. To test this hypothesis, we design helical repeat proteins (DHRs) displaying regularly spaced carboxylate arrays on their surfaces and find that both protein monomers and protein-Ca2+ supramolecular assemblies directly nucleate nano-calcite with non-natural {110} or {202} faces while vaterite, which forms first in the absence of the proteins, is bypassed. These protein-stabilized nanocrystals then assemble by oriented attachment into calcite mesocrystals. We find further that nanocrystal size and polymorph can be tuned by varying the length and surface chemistry of the designed protein templates. Thus, bio-mineralization can be programmed using de novo protein design, providing a route to next-generation hybrid materials.
A systematic and robust approach to generating complex protein nanomaterials would have broad utility. We develop a hierarchical approach to designing multi-component protein assemblies from two classes of modular building blocks: designed helical repeat proteins (DHRs) and helical bundle oligomers (HBs). We first rigidly fuse DHRs to HBs to generate a large library of oligomeric building blocks. We then generate assemblies with cyclic, dihedral, and point group symmetries from these building blocks using architecture guided rigid helical fusion with new software named WORMS. X-ray crystallography and cryo-electron microscopy characterization show that the hierarchical design approach can accurately generate a wide range of assemblies, including a 43 nm diameter icosahedral nanocage. The computational methods and building block sets described here provide a very general route to de novo designed protein nanomaterials.
Transmembrane β-barrel proteins (TMBs) are of great interest for single-molecule analytical technologies because they can spontaneously fold and insert into membranes and form stable pores, but the range of pore properties that can be achieved by repurposing natural TMBs is limited. We leverage the power of de novo computational design coupled with a "hypothesis, design, and test" approach to determine TMB design principles, notably, the importance of negative design to slow β-sheet assembly. We design new eight-stranded TMBs, with no homology to known TMBs, that insert and fold reversibly into synthetic lipid membranes and have nuclear magnetic resonance and x-ray crystal structures very similar to the computational models. These advances should enable the custom design of pores for a wide range of applications.