A main bottleneck in proteomics is the downstream biological analysis of highly multivariate quantitative protein abundance data generated using mass-spectrometry-based analysis. We developed the Perseus software platform (http://www.perseus-framework.org) to support biological and biomedical researchers in interpreting protein quantification, interaction and post-translational modification data. Perseus contains a comprehensive portfolio of statistical tools for high-dimensional omics data analysis covering normalization, pattern recognition, time-series analysis, cross-omics comparisons and multiple-hypothesis testing. A machine learning module supports the classification and validation of patient groups for diagnosis and prognosis, and it also detects predictive protein signatures. Central to Perseus is a user-friendly, interactive workflow environment that provides complete documentation of computational methods used in a publication. All activities in Perseus are realized as plugins, and users can extend the software by programming their own, which can be shared through a plugin store. We anticipate that Perseus's arsenal of algorithms and its intuitive usability will empower interdisciplinary analysis of complex large data sets.
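The abstract lists multiple-hypothesis testing among the statistical tools. The Benjamini-Hochberg procedure is one standard way to turn per-protein p-values into false discovery rate adjusted q-values, and a minimal sketch of it follows. It is shown for illustration only and is not claimed to be how Perseus implements its tests. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` adjusts four protein-level p-values at once. Proteins whose q-value falls below the chosen FDR threshold would be reported as significant.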
Protein quantification without isotopic labels has been a long-standing interest in the proteomics field. However, accurate and robust proteome-wide quantification with label-free approaches remains a challenge. We developed a new intensity determination and normalization procedure called MaxLFQ that is fully compatible with any peptide or protein separation prior to LC-MS analysis. Protein abundance profiles are assembled using the maximum possible information from MS signals, given that the presence of quantifiable peptides varies from sample to sample. For a benchmark dataset with two proteomes mixed at known ratios, we accurately detected the mixing ratio over the entire protein expression range, with greater precision for abundant proteins. The significance of individual label-free quantifications was obtained via a t test approach. For a second benchmark dataset, we accurately quantified fold changes over several orders of magnitude, a task that is challenging with label-based methods. MaxLFQ is a generic label-free quantification technology that is readily applicable to many biological questions; it is compatible with standard statistical analysis workflows, and it has been validated in many diverse biological projects. Our algorithms can handle very large experiments of 500+ samples in a manageable computing time. MaxLFQ is implemented in the freely available MaxQuant computational proteomics platform and works seamlessly at the click of a button.
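The abstract says that protein profiles are assembled from whatever peptides each pair of samples happens to share. The sketch below illustrates that idea under stated assumptions, and it is not the MaxQuant implementation. For every sample pair it takes the median peptide log-ratio, provided at least `min_ratio_count` shared peptides exist (the value 2 is an assumption). It then finds the per-sample log intensities that best agree with those ratios in the least-squares sense. Finally it rescales the profile to the total peptide intensity across samples, which is also an assumption. The sketch also assumes that the pairs with enough shared peptides connect all samples, and the names `lfq_profile` and `_solve` are hypothetical.

```python
import math
import statistics
from itertools import combinations


def _solve(matrix, rhs):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    m = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, m + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def lfq_profile(peptides, min_ratio_count=2):
    """peptides: dict peptide -> per-sample intensities (None = not quantified).
    Returns one protein intensity per sample."""
    n = len(next(iter(peptides.values())))
    lap = [[0.0] * n for _ in range(n)]  # normal equations (a graph Laplacian)
    rhs = [0.0] * n
    for j, k in combinations(range(n), 2):
        # Median log-ratio over peptides quantified in both samples j and k.
        logs = [math.log(v[j] / v[k]) for v in peptides.values()
                if v[j] is not None and v[k] is not None]
        if len(logs) < min_ratio_count:
            continue  # too little shared evidence for this pair
        r = statistics.median(logs)
        # The term (x_j - x_k - r)^2 adds these entries to the normal equations.
        lap[j][j] += 1
        lap[k][k] += 1
        lap[j][k] -= 1
        lap[k][j] -= 1
        rhs[j] += r
        rhs[k] -= r
    # Pin x_0 = 0 to remove the free additive constant, then solve for the rest.
    x = [0.0] + _solve([row[1:] for row in lap[1:]], rhs[1:])
    # Rescale so the profile sums to the total peptide intensity across samples.
    total = sum(i for v in peptides.values() for i in v if i is not None)
    raw = [math.exp(xi) for xi in x]
    scale = total / sum(raw)
    return [scale * value for value in raw]
```

In a call such as `lfq_profile({"p1": [100, 200, 400], "p2": [10, 20, 40], "p3": [50, None, 200]})`, peptide p3 is missing in the second sample. It still contributes to the ratio between the first and third samples. Working with ratios in this way, instead of summing raw intensities, is what keeps missing peptides from distorting the profile.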
A key step in mass spectrometry (MS)-based proteomics is the identification of peptides in sequence databases by their fragmentation spectra. Here we describe Andromeda, a novel peptide search engine using a probabilistic scoring model. On proteome data, Andromeda performs as well as Mascot, a widely used commercial search engine, as judged by sensitivity and specificity analysis based on target decoy searches. Furthermore, it can handle data with arbitrarily high fragment mass accuracy, is able to assign and score complex patterns of post-translational modifications, such as highly phosphorylated peptides, and accommodates extremely large databases. The algorithms of Andromeda are provided. Andromeda can function independently or as an integrated search engine of the widely used MaxQuant computational proteomics platform, and both are freely available at www.maxquant.org. The combination enables analysis of large data sets in a simple analysis workflow on a desktop computer. For searching individual spectra, Andromeda is also accessible via a web server. We demonstrate the flexibility of the system by implementing the capability to identify cofragmented peptides, significantly improving the total number of identified peptides.
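The abstract describes Andromeda's scoring model as probabilistic. A common form of such a model is a binomial peak-matching score, sketched below as an illustration and not as Andromeda's actual code. Suppose a candidate peptide has n theoretical fragments and k of them match the q most intense peaks kept per 100 Th window. The score is -10·log10 of the probability of matching at least k fragments by chance. Assuming a per-fragment chance-match probability of p = q/100, that is -10·log10 P(X ≥ k) with X ~ Binomial(n, p). The best score over the tried q values is kept. The names `binomial_tail` and `peak_match_score` are hypothetical.

```python
import math


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def peak_match_score(n_fragments, matches_by_depth):
    """n_fragments: number of theoretical fragment ions of the candidate peptide.
    matches_by_depth: dict q -> fragments matched when only the q most intense
    peaks per 100 Th window are kept. Returns the best -10*log10 tail probability."""
    best = 0.0
    for q, k in matches_by_depth.items():
        p = q / 100.0  # assumed probability that a fragment matches by chance
        best = max(best, -10.0 * math.log10(binomial_tail(n_fragments, k, p)))
    return best
```

Matching all 10 fragments of a peptide at q = 10 yields a tail probability of 1e-10, which gives a score of about 100. Matching nothing scores about 0. Taking the maximum over q lets sparse and dense spectra both be scored at their most informative peak depth.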
Abstract
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world’s largest data repository of mass spectrometry-based proteomics data, and is one of the founding members of the global ProteomeXchange (PX) consortium. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2016. In the last 3 years, public data sharing through PRIDE (as part of PX) has definitely become the norm in the field. In parallel, re-use of public proteomics data has increased enormously, with multiple applications. We first describe the new architecture of PRIDE Archive, the archival component of PRIDE. PRIDE Archive and the related data submission framework have been further developed to support the increase in submitted data volumes and additional data types. A new scalable and fault-tolerant storage backend, Application Programming Interface and web interface have been implemented as part of an ongoing process. Additionally, we emphasize the improved support for quantitative proteomics data through the mzTab format. Finally, we outline key statistics on the current data contents and volume of downloads, and how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.
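mzTab, the format PRIDE uses for quantitative data, is tab-separated text. Each line starts with a section prefix, such as MTD for metadata key-value pairs and PRH/PRT for the protein table header and rows. The peptide, PSM and small-molecule tables follow the same header/row pattern. The minimal reader below assumes well-formed mzTab 1.0-style input. It ignores comment lines and does no validation or column typing. `parse_mztab` is a hypothetical helper, not part of any PRIDE tool or API.

```python
def parse_mztab(text):
    """Split mzTab-style text into a metadata dict and per-section row lists."""
    header_to_row = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}
    metadata, headers, tables = {}, {}, {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "MTD":
            # Metadata lines: MTD <key> <value>
            metadata[fields[1]] = fields[2] if len(fields) > 2 else ""
        elif prefix in header_to_row:
            # Header line defines the column names for the matching row section.
            headers[header_to_row[prefix]] = fields[1:]
        elif prefix in headers:
            tables.setdefault(prefix, []).append(dict(zip(headers[prefix], fields[1:])))
        # COM lines, and rows whose header has not been seen, are skipped.
    return metadata, tables
```

The small input below is made up for illustration: `parse_mztab("MTD\tmzTab-version\t1.0.0\nPRH\taccession\tdescription\nPRT\tP12345\tExample protein\n")`. It returns the version string as metadata and one protein row keyed by column name.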
Absolute protein quantification using mass spectrometry (MS)-based proteomics delivers protein concentrations or copy numbers per cell. Existing methodologies typically require a combination of isotope-labeled spike-in references, cell counting, and protein concentration measurements. Here we present a novel method that delivers similar quantitative results directly from deep eukaryotic proteome datasets without any additional experimental steps. We show that the MS signal of histones can be used as a “proteomic ruler” because it is proportional to the amount of DNA in the sample, which in turn depends on the number of cells. As a result, our proteomic ruler approach adds an absolute scale to the MS readout and allows estimation of the copy numbers of individual proteins per cell. We compare our protein quantifications with values derived via the use of stable isotope labeling by amino acids in cell culture and protein epitope signature tags in a method that combines spike-in protein fragment standards with precise isotope label quantification. The proteomic ruler approach yields quantitative readouts that are in remarkably good agreement with results from the precision method. We attribute this surprising result to the fact that the proteomic ruler approach omits error-prone steps such as cell counting or protein concentration measurements. The proteomic ruler approach is readily applicable to any deep eukaryotic proteome dataset, even in retrospective analysis, and we demonstrate its usefulness with a series of mouse organ proteomes.
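The ruler idea can be written as one line of arithmetic. The sketch assumes that a protein's summed MS signal scales with its mass in the sample and that histone mass per cell roughly equals DNA mass per cell. Under those assumptions, copies per cell = (protein signal / histone signal) × DNA mass per cell × N_A / molar mass of the protein. The DNA mass per cell is an input. About 6.5 pg for a diploid human cell is used below purely as an illustrative value. `copies_per_cell` is a hypothetical name.

```python
AVOGADRO = 6.02214076e23  # particles per mol


def copies_per_cell(protein_signal, histone_signal, molar_mass_g_per_mol,
                    dna_mass_pg_per_cell):
    """Histone-ruler copy number estimate, assuming MS signal is proportional to
    protein mass and histone mass per cell equals DNA mass per cell."""
    # Protein mass per cell in grams, scaled by the histone (ruler) signal.
    protein_mass_g = (protein_signal / histone_signal) * dna_mass_pg_per_cell * 1e-12
    # Convert mass to number of molecules.
    return protein_mass_g * AVOGADRO / molar_mass_g_per_mol
```

Consider a hypothetical 50 kDa protein whose summed signal equals the total histone signal, in a cell with 6.5 pg of DNA. `copies_per_cell(1.0, 1.0, 50_000, 6.5)` gives roughly 7.8 × 10^7 copies. No cell count or protein assay enters the calculation, which is the point made in the abstract.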
Subcellular localization critically influences protein function, and cells control protein localization to regulate biological processes. We have developed and applied Dynamic Organellar Maps, a proteomic method that allows global mapping of protein translocation events. We initially used maps statically to generate a database with localization and absolute copy number information for over 8700 proteins from HeLa cells, approaching comprehensive coverage. All major organelles were resolved, with exceptional prediction accuracy (estimated at >92%). Combining spatial and abundance information yielded an unprecedented quantitative view of HeLa cell anatomy and organellar composition, at the protein level. We subsequently demonstrated the dynamic capabilities of the approach by capturing translocation events following EGF stimulation, which we integrated into a quantitative model. Dynamic Organellar Maps enable the proteome-wide analysis of physiological protein movements, without requiring any reagents specific to the investigated process, and will thus be widely applicable in cell biology.
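Organellar maps assign each protein to an organelle by comparing its fractionation profile with the profiles of known marker proteins. A translocation shows up as a profile that shifts between conditions. The published method uses its own supervised classifier. This sketch plainly swaps in a simpler nearest-centroid rule with Euclidean distance, plus a profile-shift distance between two maps. The functions `centroid`, `classify` and `movement` and the marker data in the usage note are hypothetical.

```python
import math


def centroid(profiles):
    """Mean fractionation profile of a set of marker proteins."""
    return [sum(values) / len(profiles) for values in zip(*profiles)]


def classify(profile, markers):
    """Assign a profile to the organelle whose marker centroid is nearest."""
    centroids = {organelle: centroid(ps) for organelle, ps in markers.items()}
    return min(centroids, key=lambda organelle: math.dist(profile, centroids[organelle]))


def movement(profile_control, profile_treated):
    """How far a protein's profile shifts between two maps (e.g. before and after
    a stimulus); large values flag candidate translocations."""
    return math.dist(profile_control, profile_treated)
```

With made-up three-fraction markers such as `{"ER": [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]], "mitochondria": [[0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]}`, the profile `[0.75, 0.15, 0.1]` is classified as "ER". Because no reagent specific to the process is involved, the same comparison can be run for any stimulus once two maps exist.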
Skeletal muscle constitutes 40% of individual body mass and plays vital roles in locomotion and whole-body metabolism. Proteomics of skeletal muscle is challenging because of highly abundant contractile proteins that interfere with detection of regulatory proteins. Using a state-of-the-art MS workflow and a strategy to map identifications from the C2C12 cell line model to tissues, we identified a total of 10,218 proteins, including skeletal muscle specific transcription factors like myod1 and myogenin and circadian clock proteins. We obtain absolute abundances for proteins expressed in a muscle cell line and skeletal muscle, which should serve as a valuable resource. Quantitation of protein isoforms in glucose uptake signaling pathways and in glucose and lipid metabolic pathways provides a detailed metabolic map of the cell line compared with tissue. This revealed unexpectedly complex regulation of AMP-activated protein kinase and insulin signaling in muscle tissue at the level of enzyme isoforms.
Extra chromosome copies markedly alter the physiology of eukaryotic cells, but the underlying reasons are not well understood. We created human trisomic and tetrasomic cell lines and determined the quantitative changes in their transcriptome and proteome in comparison with their diploid counterparts. We found that whereas transcription levels reflect the chromosome copy number changes, the abundance of some proteins, such as subunits of protein complexes and protein kinases, is reduced toward diploid levels. Furthermore, using the quantitative data we investigated the changes of cellular pathways in response to aneuploidy. This analysis revealed specific and uniform alterations in pathway regulation in cells with extra chromosomes. For example, the DNA and RNA metabolism pathways were downregulated, whereas several pathways such as energy metabolism, membrane metabolism and lysosomal pathways were upregulated. In particular, we found that the p62‐dependent selective autophagy is activated in the human trisomic and tetrasomic cells. Our data present the first broad proteomic analysis of human cells with abnormal karyotypes and suggest a uniform cellular response to the presence of an extra chromosome.
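The comparison with diploid cells rests on simple expected ratios. With two baseline copies, a trisomy predicts a log2 fold change of log2(3/2) ≈ 0.585 for genes on the extra chromosome, and a tetrasomy predicts log2(4/2) = 1. The sketch below expresses each observed log2 ratio as a fraction of that expectation. A value near 1 means the abundance scales with copy number, as the mRNA levels do. A value near 0 means it has been brought back toward the diploid level, as for some complex subunits. The function `dosage_scaling` is hypothetical, and any cutoff for calling a protein "compensated" would be a separate assumption.

```python
import math


def dosage_scaling(log2_ratios, extra_copies=1, ploidy=2):
    """Observed/expected log2 ratio per gene on the gained chromosome.
    ~1: abundance follows copy number; ~0: compensated to the diploid level."""
    expected = math.log2((ploidy + extra_copies) / ploidy)
    return {gene: ratio / expected for gene, ratio in log2_ratios.items()}
```

For a trisomy, `dosage_scaling({"geneA": math.log2(1.5), "geneB": 0.0})` returns about 1.0 for geneA and 0.0 for geneB. For a tetrasomy, pass `extra_copies=2`, so that the expected log2 ratio becomes 1.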
Synopsis
Genomic, transcriptomic and proteomic profiles of human aneuploid cells reveal that mRNA levels increase with gene copy number, but protein levels are partially compensated. Aneuploid cells also exhibit common alterations in several pathways, including an activation of autophagy.
Comparative genomics, transcriptomics and proteomics of model human aneuploid cell lines reveal that whereas the mRNA levels increase proportionally to the chromosome copy numbers, the abundance of some proteins (e.g., subunits of complexes) is decreased to normal levels.
The pattern of up‐ and downregulated pathways was similar in all analyzed aneuploids, indicating that it might be possible to use aneuploidy as a cancer treatment target regardless of the exact chromosome composition of cancer cells.
Autophagy, in particular p62‐dependent selective autophagy, is activated in aneuploid human cell lines.