Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or "fold"). This distinction highlights cases of homology between domains of differing topology to aid understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating how ECOD can be used to further biological and evolutionary studies.
RIG-I and MDA5 detect viral RNA in the cytoplasm and activate signaling cascades leading to the production of type-I interferons. RIG-I is activated through sequential binding of viral RNA and unanchored lysine-63 (K63) polyubiquitin chains, but how polyubiquitin activates RIG-I and whether MDA5 is activated through a similar mechanism remain unresolved. Here, we showed that the CARD domains of MDA5 bound to K63 polyubiquitin and that this binding was essential for MDA5 to activate the transcription factor IRF3. Mutations of conserved residues in MDA5 and RIG-I that disrupt their ubiquitin binding also abrogated their ability to activate IRF3. Polyubiquitin binding induced the formation of a large complex consisting of four RIG-I and four ubiquitin chains. This hetero-tetrameric complex was highly potent in activating the antiviral signaling cascades. These results suggest a unified mechanism of RIG-I and MDA5 activation and reveal a unique mechanism by which ubiquitin regulates cell signaling and immune response.
► Like RIG-I, MDA5 activates IRF3 in a cell-free system
► Both RIG-I and MDA5 CARD domains bind K63 polyubiquitin chains and activate IRF3
► Polyubiquitin binding is required for the activation of RIG-I and MDA5
► Polyubiquitin binding induces the formation of a highly active RIG-I tetramer
Rossmann folds are ancient, frequently diverged domains found in many biological reaction pathways where they have adapted for different functions. Consequently, discernment and classification of their homologous relations and function can be complicated. We define a minimal Rossmann-like structure motif (RLM) that corresponds to the common core of known Rossmann domains and use this motif to identify all RLM domains in the Protein Data Bank (PDB), thus finding they constitute about 20% of all known 3D structures. The Evolutionary Classification of protein structure Domains (ECOD) classifies RLM domains into a number of groups that lack evidence for homology (X-groups), which suggests that they could have evolved independently multiple times. Closely related, homologous RLM enzyme families can diverge to bind different ligands using similar binding sites and to catalyze different reactions. Conversely, non-homologous RLM domains can converge to catalyze the same reactions or to bind the same ligand with alternate binding modes. We discuss a special case of such convergent evolution that is relevant to the polypharmacology paradigm, wherein the same drug (methotrexate) binds to multiple non-homologous RLM drug targets with different topologies. Finally, assigning proteins with RLM domains to the Enzyme Commission classification suggests that RLM enzymes function mainly in metabolism (and comprise 38% of reference metabolic pathways) and are overrepresented in extant pathways that represent ancient biosynthetic routes such as nucleotide metabolism, energy metabolism, and metabolism of amino acids. In fact, RLM enzymes take part in five out of eight enzymatic reactions of the Wood-Ljungdahl metabolic pathway thought to be used by the last universal common ancestor (LUCA). The prevalence of RLM domains in this ancient metabolism might explain their wide distribution among enzymes.
DeepMind presented notably accurate predictions at the recent 14th Critical Assessment of Structure Prediction (CASP14) conference. We explored network architectures that incorporate related ideas and obtained the best performance with a three-track network in which information at the one-dimensional (1D) sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The three-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging x-ray crystallography and cryo-electron microscopy structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short-circuiting traditional approaches that require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.
De novo formation of the double-membrane compartment autophagosome is seeded by small vesicles carrying membrane protein autophagy-related 9 (ATG9), the function of which remains unknown. Here we find that ATG9A scrambles phospholipids of membranes in vitro. Cryo-EM structures of human ATG9A reveal a trimer with a solvated central pore, which is connected laterally to the cytosol through the cavity within each protomer. Similarities to ABC exporters suggest that ATG9A could be a transporter that uses the central pore to function. Moreover, molecular dynamics simulation suggests that the central pore opens laterally to accommodate lipid headgroups, thereby enabling lipids to flip. Mutations in the pore reduce scrambling activity and yield markedly smaller autophagosomes, indicating that lipid scrambling by ATG9A is essential for membrane expansion. We propose ATG9A acts as a membrane-embedded funnel to facilitate lipid flipping and to redistribute lipids added to the outer leaflet of ATG9 vesicles, thereby enabling growth into autophagosomes.
The existence of extracellular phosphoproteins has been acknowledged for over a century. However, research in this area has been undeveloped largely because the kinases that phosphorylate secreted proteins have escaped identification. Fam20C is a kinase that phosphorylates S-x-E/pS motifs on proteins in milk and in the extracellular matrix of bones and teeth. Here, we show that Fam20C generates the majority of the extracellular phosphoproteome. Using CRISPR/Cas9 genome editing, mass spectrometry, and biochemistry, we identify more than 100 secreted phosphoproteins as genuine Fam20C substrates. Further, we show that Fam20C exhibits broader substrate specificity than previously appreciated. Functional annotations of Fam20C substrates suggest roles for the kinase beyond biomineralization, including lipid homeostasis, wound healing, and cell migration and adhesion. Our results establish Fam20C as the major secretory pathway protein kinase and serve as a foundation for new areas of investigation into the role of secreted protein phosphorylation in human biology and disease.
• Fam20C is unique among the known secretory pathway kinases
• Fam20C generates the majority of the secreted phosphoproteome
• Fam20C substrates are implicated in a broad spectrum of biological processes
• Fam20C is crucial for proper adhesion, migration, and invasion of breast cancer cells
The kinases that catalyze the phosphorylation of secreted proteins have only recently been identified, with Fam20C being identified as the kinase responsible for generating the vast majority of the secreted phosphoproteome, including substrates thought to drive tumor cell migration.
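As a rough illustration of the S-x-E/pS motif mentioned in the abstract above, the following sketch scans a plain amino-acid sequence for candidate sites. It is not the authors' method: the function name `find_sxe_sites` and the `include_sxs` flag are hypothetical, and because a plain sequence cannot encode phosphoserine, the S-x-pS form can only be approximated by treating S-x-S as a candidate.

```python
import re


def find_sxe_sites(sequence, include_sxs=False):
    """Return 1-based positions of the first serine in each S-x-E match.

    Assumption: `sequence` is a one-letter amino-acid string. With
    include_sxs=True, S-x-S is also reported as a stand-in for S-x-pS,
    since phosphorylation state is not visible in the sequence itself.
    """
    third = "[ES]" if include_sxs else "E"
    # A lookahead keeps overlapping candidate sites (e.g. "SSEE") visible.
    pattern = re.compile(rf"(?=S[A-Z]{third})")
    return [m.start() + 1 for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_sxe_sites("SASAE"))                    # strict S-x-E only
    print(find_sxe_sites("SASAE", include_sxs=True))  # also S-x-S candidates
```

A sequence match of this kind only nominates candidate sites; the abstract identifies genuine substrates experimentally, by genome editing, mass spectrometry, and biochemistry.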
A propeptide is removed from a precursor protein to generate its active or mature form. Propeptides play essential roles in protein folding, transportation, and activation and are present in about 2.3% of reviewed proteins in the UniProt database. They are often found in secreted or membrane-bound proteins including proteolytic enzymes, hormones, and toxins. We identified a variety of globular and nonglobular Pfam domains in protein sequences designated as propeptides, some of which form intramolecular interactions with other domains in the mature proteins. Propeptide-containing enzymes mostly function as proteases, as they are depleted in other enzyme classes such as hydrolases acting on DNA and RNA, isomerases, and lyases. We applied AlphaFold to generate structural models for over 7000 proteins with propeptides of at least 20 residues. Analysis of residue contacts in these models revealed conformational changes for over 300 proteins before and after the cleavage of the propeptide. Examples of conformational change occur in several classes of proteolytic enzymes in the families of subtilisins, trypsins, aspartyl proteases, and thermolysin-like metalloproteases. In most of the observed cases, cleavage of the propeptide releases the constraints imposed by the covalent bond between the propeptide and the mature protein, and cleavage enables stronger interactions between the propeptide and the mature protein. These findings suggest that post-cleavage propeptides could play critical roles in regulating the activity of mature proteins.
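The residue-contact comparison described in the abstract above can be pictured with a minimal sketch. Everything here is an assumption for illustration: the function name `count_contacts`, the one-point-per-residue representation, the N-terminal propeptide, and the 8 Å cutoff are not taken from the paper.

```python
import math


def count_contacts(coords, propeptide_end, cutoff=8.0):
    """Count propeptide-to-mature residue pairs within `cutoff` angstroms.

    Assumptions (illustrative only): `coords` holds one (x, y, z) point per
    residue, e.g. the C-alpha atom; the propeptide is N-terminal and spans
    residues [0, propeptide_end); the cutoff value is arbitrary.
    """
    propeptide = coords[:propeptide_end]
    mature = coords[propeptide_end:]
    return sum(
        1
        for p in propeptide
        for q in mature
        if math.dist(p, q) <= cutoff
    )


if __name__ == "__main__":
    # Toy coordinates: two propeptide residues followed by two mature residues.
    toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
    print(count_contacts(toy, propeptide_end=2))
```

Running the same count on a pre-cleavage model and a post-cleavage model of one protein, then comparing the two numbers, is one simple way to flag a change in propeptide-to-mature-protein interactions.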
The molecular mechanism of autophagy and its relationship to other lysosomal degradation pathways remain incompletely understood. Here, we identified a previously uncharacterized mammalian-specific protein, Beclin 2, which, like Beclin 1, functions in autophagy and interacts with class III PI3K complex components and Bcl-2. However, Beclin 2, but not Beclin 1, functions in an additional lysosomal degradation pathway. Beclin 2 is required for ligand-induced endolysosomal degradation of several G protein-coupled receptors (GPCRs) through its interaction with GASP1. Beclin 2 homozygous knockout mice have decreased embryonic viability, and heterozygous knockout mice have defective autophagy, increased levels of brain cannabinoid 1 receptor, elevated food intake, and obesity and insulin resistance. Our findings identify Beclin 2 as a converging regulator of autophagy and GPCR turnover and highlight the functional and mechanistic diversity of Beclin family members in autophagy, endolysosomal trafficking, and metabolism.
• beclin 2 is a newly described autophagy gene
• Beclin 2 functions in endolysosomal degradation of GPCRs by binding to GASP1
• The functions of Beclin 2 in autophagy and GPCR degradation are genetically distinct
• Monoallelic loss of beclin 2 in mice results in metabolic dysregulation
A mammalian-specific autophagy protein, Beclin 2, has been characterized. Beclin 2 plays roles in autophagy and also has an autophagy-independent function in the lysosomal degradation of GPCRs, serving as a converging regulator of autophagy and GPCR turnover.
Approximately 10% of human protein kinases are believed to be inactive and named pseudokinases because they lack residues required for catalysis. Here, we show that the highly conserved pseudokinase selenoprotein-O (SelO) transfers AMP from ATP to Ser, Thr, and Tyr residues on protein substrates (AMPylation), uncovering a previously unrecognized activity for a member of the protein kinase superfamily. The crystal structure of a SelO homolog reveals a protein kinase-like fold with ATP flipped in the active site, thus providing a structural basis for catalysis. SelO pseudokinases localize to the mitochondria and AMPylate proteins involved in redox homeostasis. Consequently, SelO activity is necessary for the proper cellular response to oxidative stress. Our results suggest that AMPylation may be a more widespread post-translational modification than previously appreciated and that pseudokinases should be analyzed for alternative transferase activities.
• SelO adopts a protein kinase fold with ATP flipped in the active site
• SelO transfers AMP to Ser, Thr, and Tyr residues on protein substrates (AMPylation)
• SelO AMPylates proteins involved in redox homeostasis
• SelO protects cells from oxidative stress and regulates protein glutathionylation
The structure of SelO, a conserved pseudokinase, reveals ATP flipped in the substrate binding pocket, leading to the discovery that SelO is actually an AMPylating enzyme.
The human genome harbors a variety of genetic variations. Single-nucleotide changes that alter amino acids in protein-coding regions are one of the major causes of human phenotypic variation and diseases. These single-amino acid variations (SAVs) are routinely found in whole genome and exome sequencing. Evaluating the functional impact of such genomic alterations is crucial for diagnosis of genetic disorders. We developed DeepSAV, a deep-learning convolutional neural network to differentiate disease-causing and benign SAVs based on a variety of protein sequence, structural and functional properties. Our method outperforms most stand-alone programs, and the version incorporating population and gene-level information (DeepSAV+PG) has predictive power similar to that of some of the best available methods. We transformed DeepSAV scores of rare SAVs in the human population into a quantity termed "mutation severity measure" for each human protein-coding gene. It reflects a gene's tolerance to deleterious missense mutations and serves as a useful tool to study gene-disease associations. Genes implicated in cancer, autism, and viral interaction are identified by this measure as intolerant to mutations, while genes associated with a number of other diseases are scored as tolerant. Among known disease-associated genes, those that are mutation-intolerant are likely to function in development and signal transduction pathways, while those that are mutation-tolerant tend to encode metabolic and mitochondrial proteins.