Understanding the evolution of a protein, including both close and distant relationships, often provides insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures and presented it as an interactive, updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology) rather than by topology (or "fold"). This distinction highlights cases of homology between domains of differing topology, aiding the understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as to conserved functional sites. ECOD also recognizes closer, sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with the PDB uniquely distinguishes ECOD among protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating how ECOD can be used to further biological and evolutionary studies.
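The key design choice above is that domains are grouped by homology first, so a single evolutionary group may contain members of different topologies. A minimal sketch of that idea follows, assuming hierarchy labels named after ECOD's commonly described levels (X for possible homology, H for homology, T for topology, F for family); the `Domain` class, the helper functions and all identifiers are hypothetical illustrations, not ECOD's data model or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """A structural domain with hypothetical ECOD-style hierarchy labels."""
    domain_id: str
    x_group: str  # possible homology
    h_group: str  # homology (evolutionary group)
    t_group: str  # topology
    f_group: str  # sequence family


def group_by_homology(domains):
    """Group domains by their (X, H) homology labels, ignoring topology."""
    groups = defaultdict(list)
    for d in domains:
        groups[(d.x_group, d.h_group)].append(d)
    return dict(groups)


def mixed_topology_groups(domains):
    """Return homology groups whose members span more than one topology."""
    result = {}
    for key, members in group_by_homology(domains).items():
        topologies = sorted({d.t_group for d in members})
        if len(topologies) > 1:
            result[key] = topologies
    return result


# Hypothetical domains: d1 and d2 are homologous but differ in topology.
domains = [
    Domain("d1", "X1", "H1", "T1", "F1"),
    Domain("d2", "X1", "H1", "T2", "F2"),
    Domain("d3", "X2", "H1", "T3", "F3"),
]
print(mixed_topology_groups(domains))  # {('X1', 'H1'): ['T1', 'T2']}
```

A topology-first classification would split d1 and d2 at the top level; keying on homology keeps them together and makes their differing folds an explicit, queryable property.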
The WAVE regulatory complex (WRC) controls actin cytoskeletal dynamics throughout the cell by stimulating the actin-nucleating activity of the Arp2/3 complex at distinct membrane sites. However, the factors that recruit the WRC to specific locations remain poorly understood. Here, we have identified a large family of potential WRC ligands, consisting of ∼120 diverse membrane proteins, including protocadherins, ROBOs, netrin receptors, neuroligins, GPCRs, and channels. Structural, biochemical, and cellular studies reveal that a sequence motif that defines these ligands binds to a highly conserved interaction surface of the WRC formed by the Sra and Abi subunits. Mutating this binding surface in flies resulted in defects in actin cytoskeletal organization and egg morphology during oogenesis, leading to female sterility. Our findings directly link diverse membrane proteins to the WRC and actin cytoskeleton and have broad physiological and pathological ramifications in metazoans.
•Many potential WRC ligands defined by a peptide motif (WIRS) were identified
•Motif binds to a conserved WRC surface formed by Sra and Abi subunits
•WIRS/WRC interaction regulates oogenesis in flies
A short peptide motif that binds to a conserved surface of the WAVE regulatory complex (WRC) has been identified in a large family of diverse membrane proteins. This interaction recruits the WRC to membranes, regulates the actin cytoskeleton, and is important during Drosophila development.
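Because the ligands are defined by a short sequence motif, candidate WIRS sites can be located with a simple pattern scan. The sketch below assumes the consensus commonly reported for WIRS, Φ-x-T/S-F-x-x (Φ a bulky hydrophobic residue, x any residue); that consensus and the residue set chosen for Φ are assumptions not stated in the text above, and the input sequence is hypothetical.

```python
import re

# Assumed WIRS consensus: Phi-x-[T/S]-F-x-x. The residues counted as
# "bulky hydrophobic" (Phi) are an illustrative assumption.
BULKY_HYDROPHOBIC = "FWYLIMV"
# Lookahead so that overlapping candidate sites are all reported.
WIRS_PATTERN = re.compile(rf"(?=([{BULKY_HYDROPHOBIC}].[TS]F..))")


def find_wirs_candidates(sequence):
    """Return (1-based start, hexapeptide) for each candidate WIRS site."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in WIRS_PATTERN.finditer(seq)]


# Hypothetical cytoplasmic tail fragment, not a real protein sequence.
tail = "GSKRLSTFDAEEQPW"
print(find_wirs_candidates(tail))  # [(5, 'LSTFDA')]
```

A pattern match only flags candidates; as the abstract notes, binding to the Sra/Abi surface was established by structural, biochemical and cellular experiments.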
The prediction of the structures of proteins without detectable sequence similarity to any protein of known structure remains an outstanding scientific challenge. Here we report significant progress in this area. We first describe de novo blind structure predictions of unprecedented accuracy that we made for two proteins in large families in the recent CASP11 blind test of protein structure prediction methods, achieved by incorporating residue-residue co-evolution information into the Rosetta structure prediction program. We then describe the use of this method to generate structure models for 58 of the 121 large protein families in prokaryotes for which three-dimensional structures are not available. These models, which are posted online for public access, provide structural information for over 400,000 proteins belonging to the 58 families, and suggest hypotheses about mechanism for the subset whose function is known and hypotheses about function for the remainder.
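Co-evolution information rests on the idea that residues in contact tend to mutate together, which shows up as correlated columns in a multiple sequence alignment. The sketch below illustrates that idea with plain mutual information (MI) between columns; this is a deliberately simple, swapped-in statistic, not the model used in the work described above, and the toy alignment is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def column(msa, i):
    """Residues at alignment position i across all sequences."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    ci, cj = column(msa, i), column(msa, j)
    pi, pj = Counter(ci), Counter(cj)
    pij = Counter(zip(ci, cj))
    mi = 0.0
    for (a, b), count in pij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi


def top_coupled_pairs(msa, k=3, min_separation=3):
    """Rank column pairs by MI, skipping pairs that are close in sequence."""
    length = len(msa[0])
    scored = [
        (mutual_information(msa, i, j), i, j)
        for i, j in combinations(range(length), 2)
        if j - i >= min_separation
    ]
    scored.sort(reverse=True)
    return [(i, j, round(score, 3)) for score, i, j in scored[:k]]


# Toy alignment: columns 0 and 4 co-vary (A<->K, D<->E); columns 1 and 3
# are conserved and column 2 varies independently.
msa = ["AGLVK", "AGIVK", "DGLVE", "DGIVE"]
print(top_coupled_pairs(msa, k=1))  # [(0, 4, 0.693)]
```

Raw MI is confounded by phylogeny and indirect (transitive) couplings, which is why structure prediction pipelines favor global statistical models; the predicted top pairs would then serve as distance restraints for a folding program.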
Evolutionary Classification Of protein Domains (ECOD) (http://prodata.swmed.edu/ecod) comprehensively classifies proteins with known spatial structures maintained by the Protein Data Bank (PDB) into evolutionary groups of protein domains. ECOD relies on a combination of automatic and manual weekly updates to achieve high accuracy and coverage with a short update cycle. ECOD classifies approximately 120 000 PDB depositions into more than 500 000 domains in ∼3400 homologous groups. We show the performance of the weekly update pipeline since the release of ECOD, describe improvements to the ECOD website and available search options, and discuss novel structures and homologous groups that have been classified in recent updates. Finally, we discuss future directions for ECOD and further improvements planned for the hierarchy and update process.
Comprehensive characterization of tumor antigens is essential for the design of cancer immunotherapies, and mass spectrometry (MS)-based immunopeptidomics enables high-throughput identification of major histocompatibility complex (MHC)-bound peptide antigens in vivo. Here we construct an immunopeptidome atlas of human cancer through an extensive collection of 43 published immunopeptidomic datasets and standardized analysis of 81.6 million MS/MS spectra using an open search engine. Our analysis greatly expands current knowledge of MHC-bound antigens, including an unprecedented characterization of post-translationally modified antigens and their association with cancer. We also perform systematic analyses of cancer-testis antigens, cancer-associated antigens, and neoantigens. We make all these data, together with annotated MS/MS spectra supporting the identification of each antigen, available in an easily browsable web portal named cancer antigen atlas (caAtlas). caAtlas provides a central resource for the selection and prioritization of MHC-bound peptides for in vitro HLA binding assays and immunogenicity testing, which will pave the way toward the eventual development of cancer immunotherapies.
•Extensive collection of 43 immunopeptidomic datasets with 1018 samples
•Standardized and rigorous identification of HLA-bound peptides, including PTM peptides
•Comprehensive annotation of CT antigens and cancer-associated antigens
•User-friendly data dissemination through the caAtlas web portal
Immunology; Proteomics; Cancer
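Standardized identification of MS/MS-derived peptides generally means controlling the false discovery rate (FDR). The sketch below shows the generic target-decoy approach with q-values; it is a swapped-in illustration of a widely used technique, not the caAtlas pipeline, and the scores and peptide labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """A peptide-spectrum match; a higher score means a better match."""
    peptide: str
    score: float
    is_decoy: bool


def filter_at_fdr(psms, max_fdr=0.01):
    """Keep target PSMs whose target-decoy q-value is <= max_fdr."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR at this score cutoff: decoys / targets seen so far.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the lowest FDR at this cutoff or any lower-score cutoff.
    q_values = fdrs[:]
    for i in range(len(q_values) - 2, -1, -1):
        q_values[i] = min(q_values[i], q_values[i + 1])
    return [p for p, q in zip(ranked, q_values) if q <= max_fdr and not p.is_decoy]


# Hypothetical scores; labels are placeholders, not real identifications.
psms = [
    PSM("pep1", 10.0, False),
    PSM("pep2", 9.0, False),
    PSM("pep3", 8.0, False),
    PSM("pep4", 7.0, False),
    PSM("decoy1", 6.5, True),
    PSM("pep5", 6.0, False),
    PSM("decoy2", 5.0, True),
    PSM("pep6", 4.0, False),
]
print([p.peptide for p in filter_at_fdr(psms)])  # ['pep1', 'pep2', 'pep3', 'pep4']
```

Relaxing the threshold to `max_fdr=0.2` also admits `pep5`, because its q-value is exactly 1 decoy / 5 targets.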
A large family of G protein-coupled receptors (GPCRs) involved in cell adhesion has a characteristic autoproteolysis motif of HLT/S known as the GPCR proteolysis site (GPS). The GPS is also shared by polycystic kidney disease proteins, and it precedes the first transmembrane segment in both families. Recent structural studies have shown the GPS to be part of a larger domain named the GPCR autoproteolysis inducing (GAIN) domain. Here we demonstrate, by structural and sequence analysis, the remote homology of the GAIN domain to the ZU5 domain and the Nucleoporin98 (Nup98) C-terminal domain. Sequence homology searches were performed to extend ZU5-like domains to bacteria and archaea, as well as to new eukaryotic families. We found that the consecutive ZU5-UPA-death domain arrangement is commonly used in human cytoplasmic proteins with ZU5 domains, including CARD8 (caspase recruitment domain-containing protein 8) and NLRP1 (NACHT, LRR and PYD domain-containing protein 1) from the FIIND (Function to Find) family. Another divergent family of extracellular ZU5-like domains was identified in cartilage intermediate layer proteins and FAM171 proteins. The current diverse families of GAIN domain subdomain B, ZU5 and Nup98 C-terminal domain likely evolved from an ancient autoproteolytic domain with an HFS motif. The autoproteolytic site was kept intact in Nup98, p53-induced protein with a death domain and UNC5C-like, deteriorated in many ZU5 domains, and changed in GAIN and FIIND. Deletion of the strand after the cleavage site was observed in zonula occludens-1 and some Nup98 homologs. These findings link several autoproteolytic domains, extend our understanding of the origin of the GAIN domain in adhesion GPCRs and provide insights into the evolution of an ancient autoproteolytic domain.
•Adhesion GPCRs have a characteristic GAIN domain with an autoproteolytic motif.
•GAIN domain subdomain B, ZU5 and Nup98 C-terminal domain are homologous.
•Bacterial and archaeal homologs, as well as new eukaryotic families, are identified.
•Changes of localization, autoproteolytic motif and fold happened in evolution.
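The two autoproteolytic motifs named above, the GPS motif HLT/S and the ancestral HFS motif, can be flagged in a sequence with a short pattern scan. In the sketch below, the placement of cleavage between the second and third motif residues (before the T/S) is an assumption for illustration, and the test sequence is hypothetical; a motif match alone does not establish that a domain is autoproteolytic, since the site has deteriorated in many ZU5 domains.

```python
import re

# Motifs from the text: HL-T/S (GPS) and HFS (ancestral). Cleavage is
# assumed to fall between the L/F and the following T/S residue.
AUTOPROTEOLYSIS_PATTERN = re.compile(r"H[LF][TS]")


def predict_cleavage_sites(sequence):
    """Return (motif, 1-based residue after which cleavage is assumed)."""
    sites = []
    for m in AUTOPROTEOLYSIS_PATTERN.finditer(sequence.upper()):
        # m.start() is the 0-based index of H, so the L/F residue is at
        # 1-based position m.start() + 2; cleavage is assumed right after it.
        sites.append((m.group(0), m.start() + 2))
    return sites


# Hypothetical sequence fragment containing one GPS-like and one HFS motif.
print(predict_cleavage_sites("GWHLTAGFHFSR"))  # [('HLT', 4), ('HFS', 10)]
```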
WebGestalt is a popular tool for the interpretation of gene lists derived from large-scale -omics studies. In the 2019 update, WebGestalt supports 12 organisms, 342 gene identifiers and 155 175 functional categories, as well as user-uploaded functional databases. To address the growing and unique need for phosphoproteomics data interpretation, we have implemented phosphosite set analysis to identify important kinases from phosphoproteomics data. We have completely redesigned result visualizations and user interfaces to improve user-friendliness and to provide multiple types of interactive and publication-ready figures. To facilitate comprehension of the enrichment results, we have implemented two methods to reduce redundancy between enriched gene sets. We introduced a web API that lets other applications retrieve data programmatically from the WebGestalt server or pass data to WebGestalt for analysis. We also wrapped the core computation into an R package called WebGestaltR for users to perform analysis locally or in third-party workflows. WebGestalt can be freely accessed at http://www.webgestalt.org.
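Interpreting a gene list typically starts with asking whether a functional category is over-represented in it relative to a reference set. The sketch below implements the standard one-sided hypergeometric over-representation test; it illustrates the statistic only, is not WebGestalt's or WebGestaltR's code, and the gene and category names are hypothetical.

```python
from math import comb


def ora_p_value(overlap, list_size, category_size, reference_size):
    """One-sided hypergeometric P(X >= overlap) for over-representation."""
    upper = min(list_size, category_size)
    total = comb(reference_size, list_size)
    tail = sum(
        comb(category_size, x) * comb(reference_size - category_size, list_size - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total


def enrichment(genes, categories, reference):
    """Map each category name to its ORA p-value for the given gene list."""
    genes = set(genes) & reference
    results = {}
    for name, members in categories.items():
        members = set(members) & reference
        k = len(genes & members)
        results[name] = ora_p_value(k, len(genes), len(members), len(reference))
    return results


# Hypothetical 20-gene reference and one 5-gene category.
reference = {f"g{i}" for i in range(20)}
categories = {"setA": [f"g{i}" for i in range(5)]}
print(enrichment(["g0", "g1", "g2", "g10"], categories, reference))
```

Here 3 of the 4 listed genes fall in `setA`, giving p = 155/4845 ≈ 0.032. In practice the p-values across many categories would also be corrected for multiple testing.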
Co-clinical trials are the concurrent or sequential evaluation of therapeutics both clinically in patients and pre-clinically in patient-derived xenografts (PDX), in a manner designed to match the pharmacokinetics and pharmacodynamics of the agent(s) used. The primary goal is to determine the degree to which PDX cohort responses recapitulate patient cohort responses at the phenotypic and molecular levels, such that pre-clinical and clinical trials can inform one another. A major issue is how to manage, integrate, and analyze the abundance of data generated across spatial and temporal scales, as well as across species. To address this issue, we are developing MIRACCL (molecular and imaging response analysis of co-clinical trials), a web-based analytical tool. For prototyping, we simulated data for a co-clinical trial in "triple-negative" breast cancer (TNBC) by pairing pre-treatment (T0) and on-treatment (T1) magnetic resonance imaging (MRI) from the I-SPY2 trial, as well as PDX-based T0 and T1 MRI. Baseline (T0) and on-treatment (T1) RNA expression data were also simulated for TNBC and PDX. Image features derived from both datasets were cross-referenced to omic data to evaluate MIRACCL functionality for correlating and displaying MRI-based changes in tumor size, vascularity, and cellularity with changes in mRNA expression as a function of treatment.
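The core analysis described above pairs a change in an imaging feature with a change in gene expression between T0 and T1. A minimal sketch of that correlation step follows, assuming per-subject T1 minus T0 differences and a Pearson correlation; the choice of statistic, the function names and all values (a tumor-size feature and one gene's expression) are hypothetical and not taken from MIRACCL.

```python
import math


def deltas(t0, t1):
    """Per-subject change from baseline (T0) to on-treatment (T1)."""
    return {sid: t1[sid] - t0[sid] for sid in t0 if sid in t1}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlate_changes(image_t0, image_t1, expr_t0, expr_t1):
    """Correlate imaging-feature changes with expression changes per subject."""
    d_img = deltas(image_t0, image_t1)
    d_expr = deltas(expr_t0, expr_t1)
    shared = sorted(d_img.keys() & d_expr.keys())  # subjects with both data types
    return pearson([d_img[s] for s in shared], [d_expr[s] for s in shared])


# Hypothetical tumor-size (MRI) and single-gene expression values.
size_t0 = {"p1": 10.0, "p2": 12.0, "p3": 8.0, "p4": 15.0}
size_t1 = {"p1": 6.0, "p2": 11.0, "p3": 5.0, "p4": 15.0}
expr_t0 = {"p1": 5.0, "p2": 5.0, "p3": 5.0, "p4": 5.0}
expr_t1 = {"p1": 3.0, "p2": 4.5, "p3": 3.5, "p4": 5.0}
print(round(correlate_changes(size_t0, size_t1, expr_t0, expr_t1), 3))  # 1.0
```

Restricting to subjects present in both data types matters in co-clinical settings, where imaging and omic sampling rarely line up perfectly; the same function could be applied separately to the patient and PDX cohorts to compare their responses.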
Deep Learning in Proteomics. Wen, Bo; Zeng, Wen‐Feng; Liao, Yuxing ...
Proteomics (Weinheim), November 2020, Volume 20, Issue 21-22. Journal Article; peer reviewed; open access.
Proteomics, the study of all the proteins in biological systems, is becoming a data‐rich science. Protein sequences and structures are comprehensively catalogued in online databases. With recent advancements in tandem mass spectrometry (MS) technology, protein expression and post‐translational modifications (PTMs) can be studied in a variety of biological systems at the global scale. Sophisticated computational algorithms are needed to translate the vast amount of data into novel biological insights. Deep learning automatically extracts data representations at high levels of abstraction from data, and it thrives in data‐rich scientific research domains. Here, a comprehensive overview of deep learning applications in proteomics, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex‐peptide binding prediction, and protein structure prediction, is provided. Limitations and future directions of deep learning in proteomics are also discussed. This review provides readers with an overview of deep learning and how it can be used to analyze proteomics data.
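Most of the applications listed above (retention time, MS/MS spectrum and MHC-peptide binding prediction) start by turning a peptide sequence into a fixed-size numeric input. A minimal sketch of one common choice, zero-padded one-hot encoding over the 20 standard amino acids, follows; the alphabet order, the `max_length` default and the function name are illustrative assumptions rather than the encoding of any specific model covered in the review.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; order is arbitrary
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(peptide, max_length=30):
    """Encode a peptide as a max_length x 20 one-hot matrix (list of lists).

    Rows beyond the peptide length stay all-zero (padding).
    """
    if len(peptide) > max_length:
        raise ValueError(f"peptide longer than max_length={max_length}")
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_length)]
    for pos, aa in enumerate(peptide.upper()):
        if aa not in INDEX:
            raise ValueError(f"unsupported residue {aa!r} at position {pos}")
        matrix[pos][INDEX[aa]] = 1
    return matrix


encoded = one_hot_encode("PEPTIDE", max_length=10)
print(len(encoded), len(encoded[0]))  # 10 20
```

Fixed-length padding lets a batch of peptides form a single tensor for a convolutional or recurrent network. Modified residues, which matter for PTM prediction, would need extra channels or a larger alphabet, which this sketch does not handle.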