Connecting protein sequence to function is becoming increasingly relevant as high-throughput sequencing studies accumulate large amounts of genomic data. To go beyond existing database annotations, it is fundamental to understand the mechanisms underlying functional inheritance and divergence. If the homology relationship between proteins is known, can we determine whether their function has diverged? In this work, we analyze different possibilities of protein sequence evolution after gene duplication and identify "inter-paralog inversions", i.e., sites where the relationship between ancestry and functional signal is decoupled. The amino acids at these sites are masked from recognition by other prediction tools. Still, they play a role in functional divergence and could indicate a shift in protein function. We develop a method to specifically recognize inter-paralog amino acid inversions in a phylogeny and test it on real and simulated datasets. In a dataset built from Epidermal Growth Factor Receptor (EGFR) sequences found in 88 fish species, we identify 19 amino acid sites that underwent inversion after gene duplication, mostly located in the ligand-binding extracellular domain. Our work uncovers an outcome of protein duplication with direct implications for protein functional annotation and sequence evolution. The method is optimized for large protein datasets and can be readily included in a targeted protein analysis pipeline.
Dan Salah Tawfik (1955–2021)
Laurino, Paola; Tokuriki, Nobuhiko
Nature Chemical Biology, 09/2021, Volume 17, Issue 9
Journal Article
Peer reviewed
Open access
Dan Tawfik suddenly left us on 4 May 2021. His scientific intuition led him to articulate, and solve, many key questions related to protein chemistry and molecular evolution. Although science, particularly for his students, postdoctoral fellows and colleagues, is dimmer after his loss, his legacy will persist.
Loops are small secondary structural elements that play a crucial role in the emergence of new enzyme functions. However, the evolutionary molecular mechanisms by which proteins acquire these loop elements and obtain new functions are poorly understood. To address this question, we study glycoside hydrolase family 19 (GH19) chitinase, an essential enzyme family for pathogen degradation in plants. By revealing the evolutionary history of GH19 chitinase and the appearance of its loops, we discover that one loop, remote from the catalytic site, is necessary to acquire the new antifungal activity. We demonstrate that this remote loop directly accesses the fungal cell wall and, surprisingly, needs to adopt a defined structure, supported by long-range intramolecular interactions, to perform its function. Our findings show that nature applies this strategy at the molecular level to achieve a complex biological function while maintaining the original activity in the catalytic pocket, suggesting an alternative way to design new enzyme functions.
The rapid growth of sequence databases over the past two decades means that protein engineers faced with optimizing a protein for any given task will often have immediate access to a vast number of related protein sequences. These sequences encode information about the evolutionary history of the protein and the underlying sequence requirements to produce folded, stable, and functional protein variants. Methods that can take advantage of this information are an increasingly important part of the protein engineering tool kit. In this Perspective, we discuss the utility of sequence data in protein engineering and design, focusing on recent advances in three main areas: the use of ancestral sequence reconstruction as an engineering tool to generate thermostable and multifunctional proteins, the use of sequence data to guide engineering of multipoint mutants by structure-based computational protein design, and the use of unlabeled sequence data for unsupervised and semisupervised machine learning, allowing the generation of diverse and functional protein sequences in unexplored regions of sequence space. Altogether, these methods enable the rapid exploration of sequence space within regions enriched with functional proteins and therefore have great potential for accelerating the engineering of stable, functional, and diverse proteins for industrial and biomedical applications.
Methyltransferases (MTases) are superfamilies of enzymes that catalyze the transfer of a methyl group from S-adenosylmethionine (SAM), a nucleoside-based cofactor, to a wide variety of substrates such as DNA, RNA, proteins, small molecules, and lipids. Depending on their structural features, MTases can be further classified into different classes; we consider exclusively the largest class, the Rossmann-fold MTases. It has been shown that nucleoside cofactor-binding Rossmann enzymes, particularly nicotinamide adenine dinucleotide (NAD)-binding enzymes, flavin adenine dinucleotide (FAD)-binding enzymes, and SAM-binding MTases, share common binding motifs. These include a Gly-rich loop region that interacts with the cofactor and a highly conserved acidic residue (Asp/Glu) that interacts with the ribose moiety of the cofactor. Here, we observe that the Gly-rich loop region of Rossmann MTases adopts a specific type II′ β-turn in close proximity to the cofactor (<4 Å), and this appears to be a key feature of these superfamilies. Additionally, we demonstrate that the conservation of this β-turn could play a critical role in the enzyme–cofactor interaction, shedding new light on the structural conformation of the Gly-rich loop region in Rossmann MTases.
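The "<4 Å" proximity criterion above can be made concrete by flagging residues according to their minimum interatomic distance to the cofactor. This is a minimal sketch with several assumptions. It works on plain coordinate lists rather than using a structure-parsing library. It takes the minimum atom-atom distance as the proximity measure, which the abstract does not specify. The function name, residue IDs, and coordinates are all hypothetical.

```python
import math


def residues_near_cofactor(residue_atoms, cofactor_atoms, cutoff=4.0):
    """Return IDs of residues with at least one atom closer than `cutoff` angstroms to the cofactor.

    residue_atoms: dict mapping a (hypothetical) residue ID -> list of (x, y, z) atom coordinates
    cofactor_atoms: list of (x, y, z) coordinates for the cofactor (e.g., SAM)
    """
    near = []
    for res_id, atoms in residue_atoms.items():
        # Minimum interatomic distance between this residue and any cofactor atom.
        min_dist = min(math.dist(a, c) for a in atoms for c in cofactor_atoms)
        if min_dist < cutoff:
            near.append(res_id)
    return near


# Hypothetical toy coordinates in angstroms, not taken from a real structure.
residues = {
    "loop_res_1": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "distal_res": [(10.0, 0.0, 0.0)],
}
cofactor = [(3.0, 0.0, 0.0), (4.0, 1.0, 0.0)]
print(residues_near_cofactor(residues, cofactor))
# ['loop_res_1']
```

With real data, the coordinates would come from a parsed structure file. A spatial index is usually preferable to this all-pairs scan once structures get large.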
Upon heterologous overexpression, many proteins misfold or aggregate, thus resulting in low functional yields. Human acetylcholinesterase (hAChE), an enzyme mediating synaptic transmission, is a typical case of a human protein that necessitates mammalian systems to obtain functional expression. We developed a computational strategy and designed an AChE variant bearing 51 mutations that improved core packing, surface polarity, and backbone rigidity. This variant expressed at ∼2,000-fold higher levels in E. coli compared to wild-type hAChE and exhibited 20°C higher thermostability with no change in enzymatic properties or in the active-site configuration as determined by crystallography. To demonstrate broad utility, we similarly designed four other human and bacterial proteins. Testing at most three designs per protein, we obtained enhanced stability and/or higher yields of soluble and active protein in E. coli. Our algorithm requires only a 3D structure and several dozen sequences of naturally occurring homologs, and is available at http://pross.weizmann.ac.il.
• A new computational method is used to stabilize five recalcitrant proteins
• Designed variants show higher expression and stability with unmodified function
• A designed human acetylcholinesterase variant expresses solubly in bacteria
• The method is fully automated and implemented on a webserver
Heterologous expression of proteins and their mutants often results in misfolding and aggregation. Goldenzweig et al. (2016) developed an automated algorithm for protein stabilization requiring minimal experimental testing; for instance, the five tested variants of human acetylcholinesterase showed ≥100-fold higher soluble bacterial expression and higher melting temperatures than wild-type.
Nucleobase-containing coenzymes are hypothesized to be relics of an early RNA-based world that preceded the emergence of proteins. Despite the importance of coenzyme-protein synergisms, their emergence and evolution remain understudied. An excellent target to address this issue is the Rossmann fold, the most catalytically diverse and abundant protein architecture in nature. We investigated two main Rossmann lineages: the nicotinamide adenine dinucleotide (phosphate) (NAD(P))-binding and the S-adenosyl methionine (SAM)-binding superfamilies. To identify the evolutionary changes that led to a coenzyme specificity switch in these superfamilies, we performed structural and sequence-based hidden Markov model analyses to systematically search for key motifs in their coenzyme-binding pockets. Our analyses revealed that, through insertions and deletions (InDels) and a residue substitution, the ancient β1-loop-α1 coenzyme-binding structure of NAD(P) could be reshaped into the SAM-binding β1-loop-α1 structure. To test this observation experimentally, we removed three amino acids from the NAD(P)-binding pocket and solved the structure of the resulting mutant, revealing the characteristic loop features of the SAM-binding pocket. To confirm SAM binding, we performed isothermal titration calorimetry measurements. Molecular dynamics simulations also corroborated the role of InDels in abolishing NAD binding and acquiring SAM binding. Our results uncovered how nature may have used insertions and deletions to optimize the different coenzyme-binding pockets and the distinct functionalities observed for Rossmann superfamilies. This work also proposes a general mechanism by which protein templates could have been recycled through the course of evolution to adopt different coenzymes and confer distinct chemistries.
Mechanisms of protein evolution
Jayaraman, Vijay; Toledo‐Patiño, Saacnicteh; Noda‐García, Lianet ...
Protein Science, July 2022, Volume 31, Issue 7
Journal Article
Peer reviewed
Open access
How do proteins evolve? How do changes in sequence mediate changes in protein structure, and in turn in function? This question has multiple angles, ranging from biochemistry and biophysics to evolutionary biology. This review provides a brief integrated view of some key mechanistic aspects of protein evolution. First, we explain how protein evolution is primarily driven by randomly acquired genetic mutations and selection for function, and how these mutations can even give rise to completely new folds. We then comment on how phenotypic protein variability, including promiscuity and transcriptional and translational errors, may also accelerate this process, possibly via “plasticity‐first” mechanisms. Finally, we highlight open questions in the field of protein evolution regarding the emergence of more sophisticated protein systems, such as protein complexes and pathways, and the emergence of pre‐LUCA enzymes.
The epidermal growth factor receptor (EGFR) is a membrane-anchored tyrosine kinase that is able to selectively respond to multiple extracellular stimuli. Previous studies have indicated that the modularity of this system may be caused by ligand-induced differences in the stability of the receptor dimer. However, this hypothesis has not yet been explored using single-mutant ligands. Herein, we developed a new approach to identify residues responsible for functional divergence by selecting residues in the epidermal growth factor (EGF) ligand that are conserved among orthologs yet divergent between paralogs. We then mutated these residues and assessed the mutants' effects on the receptor using a combination of molecular dynamics (MD) and biochemical techniques. Although the EGF mutants had binding affinities for the EGFR comparable to that of the WT ligand, they showed differential patterns of receptor phosphorylation and cell growth in multiple cell lines. The MD simulations of the EGF mutants indicated that the mutations had long-range effects on the receptor dimer interface. This study shows for the first time that a single mutation in EGF is sufficient to alter the activation of the EGFR signaling pathway at the cellular level. These results also support the idea that biased ligand–receptor signaling in the tyrosine kinase receptor system can lead to differential downstream outcomes, and they demonstrate a promising new method to study ligand–receptor interactions.
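The residue-selection criterion above, "conserved among orthologs yet divergent between paralogs", can be sketched as a column scan over a multiple sequence alignment. This is a minimal illustration of that criterion, not the authors' pipeline. The function name, the `min_conservation` threshold, the `(species, paralog)` input layout, and the toy sequences are hypothetical. The scan also assumes the sequences are already aligned.

```python
from collections import Counter, defaultdict


def ortholog_conserved_paralog_divergent(seqs, min_conservation=1.0):
    """Find alignment columns conserved among orthologs but divergent between paralogs.

    seqs maps (species, paralog) -> aligned sequence; all sequences must share one
    length, with '-' marking gaps. A column qualifies when, within every paralog
    group, the most common residue is not a gap and occurs in at least
    `min_conservation` of that group's sequences, and the groups' most common
    residues are not all identical.
    """
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to a single common length")
    (length,) = lengths

    # Group the aligned sequences by paralog (e.g., the two copies after duplication).
    groups = defaultdict(list)
    for (_species, paralog), seq in seqs.items():
        groups[paralog].append(seq)

    hits = []
    for col in range(length):
        consensus = {}
        for paralog, members in groups.items():
            residue, count = Counter(s[col] for s in members).most_common(1)[0]
            if residue == "-" or count / len(members) < min_conservation:
                break  # not conserved among the orthologs of this paralog group
            consensus[paralog] = residue
        else:
            # Conserved within every group; keep the column only if the groups disagree.
            if len(set(consensus.values())) > 1:
                hits.append((col, consensus))
    return hits


# Hypothetical toy alignment: two species, each carrying paralogs "A" and "B".
alignment = {
    ("species1", "A"): "MKTLV",
    ("species2", "A"): "MKTLI",
    ("species1", "B"): "MRTLV",
    ("species2", "B"): "MRTLI",
}
print(ortholog_conserved_paralog_divergent(alignment))
# [(1, {'A': 'K', 'B': 'R'})]
```

In the toy alignment, column 1 is reported because K is fixed in every "A" ortholog and R in every "B" ortholog. Column 4 varies among orthologs and is therefore excluded.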
The tRNA modification m1G37, introduced by the tRNA methyltransferase TrmD, is thought to be essential for growth in bacteria because it suppresses translational frameshift errors at proline codons. However, because bacteria can tolerate high levels of mistranslation, it is unclear why loss of m1G37 is not tolerated. Here, we addressed this question through experimental evolution of trmD mutant strains of Escherichia coli. Surprisingly, trmD mutant strains were viable even if the m1G37 modification was completely abolished, and showed rapid recovery of growth rate, mainly via duplication or mutation of the proline-tRNA ligase gene proS. Growth assays and in vitro aminoacylation assays showed that G37-unmodified tRNAPro is aminoacylated less efficiently than m1G37-modified tRNAPro, and that growth of trmD mutant strains can be largely restored by single mutations in proS that restore aminoacylation of G37-unmodified tRNAPro. These results show that inefficient aminoacylation of tRNAPro is the main reason for growth defects observed in trmD mutant strains and that proS may act as a gatekeeper of translational accuracy, preventing the use of error-prone unmodified tRNAPro in translation. Our work shows the utility of experimental evolution for uncovering the hidden functions of essential genes and has implications for the development of antibiotics targeting TrmD.