Calculating the number of confidently identified proteins and estimating false discovery rate (FDR) is a challenge when analyzing very large proteomic data sets such as entire human proteomes. Biological and technical heterogeneity in proteomic experiments further adds to the challenge, and there are strong differences in opinion regarding the conceptual validity of a protein FDR and no consensus regarding the methodology for protein FDR determination. There are also limitations inherent to the widely used classic target–decoy strategy that particularly show when analyzing very large data sets and that lead to a strong over-representation of decoy identifications. In this study, we investigated the merits of the classic, as well as a novel target–decoy-based protein FDR estimation approach, taking advantage of a heterogeneous data collection comprising ∼19,000 LC-MS/MS runs deposited in ProteomicsDB (https://www.proteomicsdb.org). The “picked” protein FDR approach treats target and decoy sequences of the same protein as a pair rather than as individual entities and chooses either the target or the decoy sequence depending on which receives the higher score. We investigated the performance of this approach in combination with q-value based peptide scoring to normalize sample-, instrument-, and search engine-specific differences. The “picked” target–decoy strategy performed best when protein scoring was based on the best peptide q-value for each protein, yielding a stable number of true-positive protein identifications over a wide range of q-value thresholds. We show that this simple and unbiased strategy eliminates a conceptual issue in the commonly used “classic” protein FDR approach that causes overprediction of false-positive protein identifications in large data sets. The approach scales from small to very large data sets without losing performance, consistently increases the number of true-positive protein identifications and is readily implemented in proteomics analysis software.
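The pairing step of the “picked” approach described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `picked_protein_fdr`, the convention that a higher score is better (for example, the negative log10 of the best peptide q-value), the tie-breaking in favor of the target, and the toy scores are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def picked_protein_fdr(
    target_scores: Dict[str, float],
    decoy_scores: Dict[str, float],
) -> List[Tuple[str, float, float]]:
    """Return (protein, score, estimated FDR) for picked targets, best first.

    Assumption: higher scores are better, and both dictionaries are keyed
    by the same protein identifier for the target and its decoy.
    """
    picked = []  # entries are (score, is_decoy, protein)
    for protein in set(target_scores) | set(decoy_scores):
        t = target_scores.get(protein)
        d = decoy_scores.get(protein)
        # Keep only the better-scoring member of each target/decoy pair.
        # Assumption: a tie within a pair is resolved in favor of the target.
        if d is None or (t is not None and t >= d):
            picked.append((t, False, protein))
        else:
            picked.append((d, True, protein))

    # Best score first; the remaining keys only make the order deterministic.
    picked.sort(key=lambda row: (-row[0], row[1], row[2]))

    results = []
    n_targets = 0
    n_decoys = 0
    for score, is_decoy, protein in picked:
        if is_decoy:
            n_decoys += 1
        else:
            n_targets += 1
            # FDR estimate at this score threshold: decoys / targets so far.
            results.append((protein, score, n_decoys / n_targets))
    return results


# Hypothetical toy scores, not taken from the study.
targets = {"P1": 5.0, "P2": 4.0, "P3": 1.0, "P4": 0.5}
decoys = {"P1": 1.0, "P3": 2.0, "P4": 0.2}
rows = picked_protein_fdr(targets, decoys)
```

In the classic strategy, every target and every decoy entry would enter the ranked list independently. Here the decoys of P1 and P4 are discarded because their targets score higher, so only one decoy remains in the list.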
Genome‐, transcriptome‐ and proteome‐wide measurements provide insights into how biological systems are regulated. However, fundamental aspects relating to which human proteins exist, where they are expressed and in which quantities are not fully understood. Therefore, we generated a quantitative proteome and transcriptome abundance atlas of 29 paired healthy human tissues from the Human Protein Atlas project representing human genes by 18,072 transcripts and 13,640 proteins including 37 without prior protein‐level evidence. The analysis revealed that hundreds of proteins, particularly in testis, could not be detected even for highly expressed mRNAs, that few proteins show tissue‐specific expression, that strong differences between mRNA and protein quantities within and across tissues exist and that protein expression is often more stable across tissues than that of transcripts. Only 238 of 9,848 amino acid variants found by exome sequencing could be confidently detected at the protein level showing that proteogenomics remains challenging, needs better computational methods and requires rigorous validation. Many uses of this resource can be envisaged including the study of gene/protein expression regulation and biomarker specificity evaluation.
Synopsis
Proteome and transcriptome quantification across tissues reveals which human genes exist as transcripts and proteins, where they are expressed and in which approximate quantities. Tissue‐specific protein expression is found to be a rare and quantitative rather than qualitative characteristic.
The study presents the most comprehensive atlas of protein expression to date, across 29 healthy human tissues.
Protein level evidence is provided for 13,640 genes and 15,257 isoforms, including 37 missing proteins.
Tissue‐specific protein expression is a rare and quantitative rather than qualitative characteristic.
Proteogenomics is still challenging and needs rigorous validation by synthetic peptides.
Proteomes are characterized by large protein-abundance differences, cell-type- and time-dependent expression patterns and post-translational modifications, all of which carry biological information that is not accessible by genomics or transcriptomics. Here we present a mass-spectrometry-based draft of the human proteome and a public, high-performance, in-memory database for real-time analysis of terabytes of big data, called ProteomicsDB. The information assembled from human tissues, cell lines and body fluids enabled estimation of the size of the protein-coding genome, and identified organ-specific proteins and a large number of translated lincRNAs (long intergenic non-coding RNAs). Analysis of messenger RNA and protein-expression profiles of human tissues revealed conserved control of protein abundance, and integration of drug-sensitivity data enabled the identification of proteins predicting resistance or sensitivity. The proteome profiles also hold considerable promise for analysing the composition and stoichiometry of protein complexes. ProteomicsDB thus enables navigation of proteomes, provides biological insight and fosters the development of proteomic technology.
We have used a mass spectrometry-based proteomic approach to compile an atlas of the thermal stability of 48,000 proteins across 13 species ranging from archaea to humans and covering melting temperatures of 30-90 °C. Protein sequence, composition and size affect thermal stability in prokaryotes, and eukaryotic proteins show a nonlinear relationship between the degree of disordered protein structure and thermal stability. The data indicate that evolutionary conservation of protein complexes is reflected by similar thermal stability of their proteins, and we show examples in which genomic alterations can affect thermal stability. Proteins of the respiratory chain were found to be very stable in many organisms, and human mitochondria showed close to normal respiration at 46 °C. We also noted cell-type-specific effects that can affect protein stability or the efficacy of drugs. This meltome atlas broadly defines the proteome amenable to thermal profiling in biology and drug discovery and can be explored online at http://meltomeatlas.proteomics.wzw.tum.de:5003/ and http://www.proteomicsdb.org.
The intestinal microbiota is known to regulate host energy homeostasis and can be influenced by high-calorie diets. However, changes affecting the ecosystem at the functional level are still not well characterized. We measured shifts in cecal bacterial communities in mice fed a carbohydrate or high-fat (HF) diet for 12 weeks at the following levels: (i) diversity and taxa distribution by high-throughput 16S ribosomal RNA gene sequencing; (ii) bulk and single-cell chemical composition by Fourier-transform infrared (FT-IR) and Raman micro-spectroscopy; and (iii) metaproteome and metabolome via high-resolution mass spectrometry. The HF diet caused shifts in the diversity of dominant gut bacteria and altered the proportion of Ruminococcaceae (decrease) and Rikenellaceae (increase). FT-IR spectroscopy revealed that the impact of the diet on cecal chemical fingerprints is greater than the impact of microbiota composition. Diet-driven changes in biochemical fingerprints of members of the Bacteroidales and Lachnospiraceae were also observed at the level of single cells, indicating that there were distinct differences in cellular composition of dominant phylotypes under different diets. Metaproteome and metabolome analyses based on the occurrence of 1760 bacterial proteins and 86 annotated metabolites revealed distinct HF diet-specific profiles. Alteration of hormonal and anti-microbial networks, bile acid and bilirubin metabolism and shifts towards amino acid and simple sugar metabolism were observed. We conclude that an HF diet markedly affects the gut bacterial ecosystem at the functional level.
The NCI-60 cell line collection is a very widely used panel for the study of cellular mechanisms of cancer in general and in vitro drug action in particular. It is a model system for the tissue types and genetic diversity of human cancers and has been extensively molecularly characterized. Here, we present a quantitative proteome and kinome profile of the NCI-60 panel covering, in total, 10,350 proteins (including 375 protein kinases) and including a core cancer proteome of 5,578 proteins that were consistently quantified across all tissue types. Bioinformatic analysis revealed strong cell line clusters according to tissue type and disclosed hundreds of differentially regulated proteins representing potential biomarkers for numerous tumor properties. Integration with public transcriptome data showed considerable similarity between mRNA and protein expression. Modeling of proteome and drug-response profiles for 108 FDA-approved drugs identified known and potential protein markers for drug sensitivity and resistance. To enable community access to this unique resource, we incorporated it into a public database for comparative and integrative analysis (http://wzw.tum.de/proteomics/nci60).
• Broad survey of protein and kinase expression in the NCI-60 cell line panel
• Proteomic analysis clusters cell lines according to tumor type
• The correlation between proteomic and transcriptomic data is examined
• Proteomics is particularly powerful for identifying drug-resistance mechanisms
Kuster, Gholami, and colleagues present a global survey of protein and kinase expression in the NCI-60 panel. Their bioinformatics analysis reveals strong cell line clustering according to tumor type. Integrative analysis discloses a high degree of correlation between the transcriptome and proteome and shows that each technique provides complementary information. In particular, proteomics appears to be powerful for identifying drug-resistance mechanisms. These data are available to the community for broad utilization in biological research.
Citrullination is a posttranslational modification of arginine catalyzed by five peptidylarginine deiminases (PADs) in humans. The loss of a positive charge may cause structural or functional alterations, and while the modification has been linked to several diseases, including rheumatoid arthritis (RA) and cancer, its physiological or pathophysiological roles remain largely unclear. In part, this is owing to limitations in available methodology to robustly enrich, detect, and localize the modification. As a result, only a few citrullination sites have been identified on human proteins with high confidence. In this study, we mined data from mass-spectrometry-based deep proteomic profiling of 30 human tissues to identify citrullination sites on endogenous proteins. Database searching of ∼70 million tandem mass spectra yielded ∼13,000 candidate spectra, which were further triaged by spectrum quality metrics and the detection of the specific neutral loss of isocyanic acid from citrullinated peptides to reduce false positives. Because citrullination is easily confused with deamidation, we synthesized ∼2,200 citrullinated and 1,300 deamidated peptides to build a library of reference spectra. This led to the validation of 375 citrullination sites on 209 human proteins. Further analysis showed that >80% of the identified modification sites were new, and for 56% of the proteins, citrullination was detected for the first time. Sequence motif analysis revealed a strong preference for Asp and Gly residues around the citrullination site. Interestingly, while the modification was detected in 26 human tissues with the highest levels found in the brain and lung, citrullination levels did not correlate well with protein expression of the PAD enzymes. Even though the current work represents the largest survey of protein citrullination to date, the modification was mostly detected on highly abundant proteins, arguing that the development of specific enrichment methods would be required in order to study the full extent of cellular protein citrullination.
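The neutral-loss triage mentioned above can be sketched as a simple peak lookup. Citrullination and deamidation produce the same nominal mass shift, so the loss of isocyanic acid (HNCO, monoisotopic mass 43.005814 Da) from a citrullinated ion serves as a distinguishing signal. The sketch below is illustrative only and is not the study's pipeline: the function name `has_isocyanic_acid_loss`, the 20 ppm tolerance, and the example peak list are assumptions.

```python
from typing import Iterable

# Monoisotopic mass of isocyanic acid (HNCO) in Da.
ISOCYANIC_ACID = 43.005814


def has_isocyanic_acid_loss(
    peaks_mz: Iterable[float],
    ion_mz: float,
    charge: int = 1,
    tol_ppm: float = 20.0,  # assumption: tolerance chosen for illustration
) -> bool:
    """Check whether a peak matches ion_mz minus the HNCO neutral loss."""
    # A neutral loss shifts m/z by (lost mass / charge).
    expected = ion_mz - ISOCYANIC_ACID / charge
    tolerance = expected * tol_ppm * 1e-6
    return any(abs(mz - expected) <= tolerance for mz in peaks_mz)


# Hypothetical peak list, not taken from the study.
peaks = [300.1000, 456.9942, 500.0000]
loss_seen_z1 = has_isocyanic_acid_loss(peaks, ion_mz=500.0, charge=1)
loss_seen_z2 = has_isocyanic_acid_loss(peaks, ion_mz=500.0, charge=2)
```

A real workflow would combine such a check with spectrum quality metrics and reference spectra, as the abstract describes.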
The attachment of N-acetylglucosamine to serine or threonine residues (O-GlcNAc) is a post-translational modification on nuclear and cytoplasmic proteins with emerging roles in numerous cellular processes, such as signal transduction, transcription, and translation. It is further presumed that O-GlcNAc can exhibit a site-specific, dynamic and possibly functional interplay with phosphorylation. O-GlcNAc proteins are commonly identified by tandem mass spectrometry following some form of biochemical enrichment. In the present study, we assessed whether, and to what extent, O-GlcNAc-modified proteins can be discovered from existing large-scale proteome data sets. To this end, we conceived a straightforward O-GlcNAc identification strategy based on our recently developed Oscore software that automatically analyzes tandem mass spectra for the presence and intensity of O-GlcNAc diagnostic fragment ions. Using the Oscore, we discovered hundreds of O-GlcNAc peptides not initially identified in these studies, most of which have not been described before. Merely re-searching these data extended the number of known O-GlcNAc proteins by almost 100, suggesting that this modification exists even more widely than previously anticipated and that it is often sufficiently abundant to be detected without enrichment. However, a comparison of O-GlcNAc and phospho-identifications from the very same data indicates that the O-GlcNAc modification is considerably less abundant than phosphorylation. The discovery of numerous doubly modified peptides (i.e. peptides with one or multiple O-GlcNAc or phosphate moieties) suggests that O-GlcNAc and phosphorylation are not necessarily mutually exclusive, but can occur simultaneously at adjacent sites.
Despite their importance in determining protein abundance, a comprehensive catalogue of sequence features controlling protein‐to‐mRNA (PTR) ratios and a quantification of their effects are still lacking. Here, we quantified PTR ratios for 11,575 proteins across 29 human tissues using matched transcriptomes and proteomes. We estimated by regression the contribution of known sequence determinants of protein synthesis and degradation in addition to 45 mRNA and 3 protein sequence motifs that we found by association testing. While PTR ratios span more than 2 orders of magnitude, our integrative model predicts PTR ratios at a median precision of 3.2‐fold. A reporter assay provided functional support for two novel UTR motifs, and an immobilized mRNA affinity competition‐binding assay identified motif‐specific bound proteins for one motif. Moreover, our integrative model led to a new metric of codon optimality that captures the effects of codon frequency on protein synthesis and degradation. Altogether, this study shows that a large fraction of PTR ratio variation in human tissues can be predicted from sequence, and it identifies many new candidate post‐transcriptional regulatory elements.
Synopsis
Protein‐to‐mRNA (PTR) ratios are quantified across 29 human tissues using matched transcriptomes and proteomes. Sequence‐based predictions of tissue‐specific PTR ratios reveal novel post‐transcriptional regulatory elements and yield a new metric of codon optimality.
A sequence‐based model predicts protein‐to‐mRNA ratios for 29 human tissues at a median precision across genes of 3.2‐fold.
Reporter assays provide functional support for two novel UTR motifs and a proteome‐wide competition‐binding assay identifies motif‐specific bound proteins for one motif.
Protein‐to‐mRNA adaptation index (PTR‐AI), a new metric of codon optimality, captures the effects of codon frequency on protein synthesis and degradation.
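To make the "median precision of 3.2‐fold" statement concrete, the following Python sketch computes PTR ratios and a median fold error between observed and predicted ratios. It assumes, without confirmation from the abstract, that the precision is the median absolute log10 difference converted back to a fold change. The function names and the toy values are hypothetical.

```python
import math
from statistics import median
from typing import Sequence


def ptr_ratio(protein_abundance: float, mrna_abundance: float) -> float:
    """Protein-to-mRNA ratio for one gene in one tissue."""
    return protein_abundance / mrna_abundance


def median_fold_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Median fold difference between observed and predicted ratios.

    Assumption: fold error is 10 ** |log10(predicted) - log10(observed)|,
    summarized by the median over all genes.
    """
    log_errors = [
        abs(math.log10(p) - math.log10(o)) for o, p in zip(observed, predicted)
    ]
    return 10 ** median(log_errors)


# Hypothetical toy values, not taken from the study.
observed_ptr = [ptr_ratio(1000.0, 100.0), ptr_ratio(500.0, 5.0), ptr_ratio(2000.0, 2.0)]
predicted_ptr = [20.0, 100.0, 250.0]
fold = median_fold_error(observed_ptr, predicted_ptr)
```

Under this definition, a value of 3.2 would mean that for half of the genes the predicted ratio lies within a factor of 3.2 of the measured one.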
Background & Aims: Matrix metalloproteases (MMPs) mediate pathogenesis of chronic intestinal inflammation. We characterized the role of the gelatinase (GelE), a metalloprotease from Enterococcus faecalis, in the development of colitis in mice. Methods: Germ-free, interleukin-10–deficient (IL-10−/−) mice were monoassociated with the colitogenic E faecalis strain OG1RF and isogenic, GelE-mutant strains. Barrier function was determined by measuring E-cadherin expression, transepithelial electrical resistance (TER), and translocation of permeability markers in colonic epithelial cells and colon segments from IL-10−/− and TNFΔARE/Wt mice. GelE specificity was shown with the MMP inhibitor marimastat. Results: Histologic analysis (score 0–4) of E faecalis monoassociated IL-10−/− mice revealed a significant reduction in colonic tissue inflammation in the absence of bacteria-derived GelE. We identified cleavage sites for GelE in the sequence of recombinant mouse E-cadherin, indicating that it might be degraded by GelE. Experiments with Ussing chambers and purified GelE revealed the loss of barrier function and extracellular E-cadherin in mice susceptible to intestinal inflammation (IL-10−/− and TNFΔARE/Wt mice) before inflammation developed. Colonic epithelial cells had reduced TER and increased translocation of permeability markers after stimulation with GelE from OG1RF or strains of E faecalis isolated from patients with Crohn's disease and ulcerative colitis. Conclusions: The metalloprotease GelE, produced by commensal strains of E faecalis, contributes to development of chronic intestinal inflammation in mice that are susceptible to intestinal inflammation (IL-10−/− and TNFΔARE/Wt mice) by impairing epithelial barrier integrity.