In mass-spectrometry-based proteomics, the identification and quantification of peptides and proteins heavily rely on sequence database searching or spectral library matching. The lack of accurate predictive models for fragment ion intensities impairs the realization of the full potential of these approaches. Here, we extended the ProteomeTools synthetic peptide library to 550,000 tryptic peptides and 21 million high-quality tandem mass spectra. We trained a deep neural network, termed Prosit, resulting in chromatographic retention time and fragment ion intensity predictions that exceed the quality of the experimental data. Integrating Prosit into database search pipelines led to more identifications at >10× lower false discovery rates. We show the general applicability of Prosit by predicting spectra for proteases other than trypsin, generating spectral libraries for data-independent acquisition and improving the analysis of metaproteomes. Prosit is integrated into ProteomicsDB, allowing search result re-scoring and custom spectral library generation for any organism on the basis of peptide sequence alone.
Nano-flow liquid chromatography tandem mass spectrometry (nano-flow LC-MS/MS) is the mainstay in proteome research because of its excellent sensitivity but often comes at the expense of robustness. Here we show that micro-flow LC-MS/MS using a 1 × 150 mm column shows excellent reproducibility of chromatographic retention time (<0.3% coefficient of variation, CV) and protein quantification (<7.5% CV) using data from >2000 samples of human cell lines, tissues and body fluids. Deep proteome analysis identifies >9000 proteins and >120,000 peptides in 16 h and sample multiplexing using tandem mass tags increases throughput to 11 proteomes in 16 h. The system identifies >30,000 phosphopeptides in 12 h and protein-protein or protein-drug interaction experiments can be analyzed in 20 min per sample. We show that the same column can be used to analyze >7500 samples without apparent loss of performance. This study demonstrates that micro-flow LC-MS/MS is suitable for a broad range of proteomic applications.
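The reproducibility figures above are coefficients of variation (CV), i.e. the standard deviation expressed as a percentage of the mean. A minimal sketch of that calculation follows; the function name and the retention-time values are hypothetical illustrations, not data from the study, and the sample standard deviation is an assumption since the abstract does not state which estimator was used.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV in percent: 100 * sample standard deviation / mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a mean of zero")
    # statistics.stdev uses the sample (n - 1) estimator; this is an assumption.
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical retention times (minutes) of one peptide across three runs.
retention_times_min = [30.00, 30.05, 29.95]
cv_percent = coefficient_of_variation(retention_times_min)
print(f"retention time CV: {cv_percent:.3f}%")
```

A CV computed this way can be compared directly against thresholds such as the <0.3% (retention time) and <7.5% (protein quantification) values quoted in the abstract.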
Single-cell proteomics by mass spectrometry (SCoPE-MS) is a recently introduced method to quantify multiplexed single-cell proteomes. While this technique has generated great excitement, the underlying technologies (isobaric labeling and mass spectrometry (MS)) have technical limitations with the potential to affect data quality and biological interpretation. These limitations are particularly relevant when a carrier proteome, a sample added at 25-500× the amount of a single-cell proteome, is used to enable peptide identifications. Here we perform controlled experiments with increasing carrier proteome amounts and evaluate quantitative accuracy, as it relates to mass analyzer dynamic range, multiplexing level and number of ions sampled. We demonstrate that an increase in carrier proteome level requires a concomitant increase in the number of ions sampled to maintain quantitative accuracy. Lastly, we introduce Single-Cell Proteomics Companion (SCPCompanion), a software tool that enables rapid evaluation of single-cell proteomic data and recommends instrument and data analysis parameters for improved data quality.
Calculating the number of confidently identified proteins and estimating false discovery rate (FDR) is a challenge when analyzing very large proteomic data sets such as entire human proteomes. Biological and technical heterogeneity in proteomic experiments further add to the challenge and there are strong differences in opinion regarding the conceptual validity of a protein FDR and no consensus regarding the methodology for protein FDR determination. There are also limitations inherent to the widely used classic target–decoy strategy that particularly show when analyzing very large data sets and that lead to a strong over-representation of decoy identifications. In this study, we investigated the merits of the classic, as well as a novel target–decoy-based protein FDR estimation approach, taking advantage of a heterogeneous data collection comprised of ∼19,000 LC-MS/MS runs deposited in ProteomicsDB (https://www.proteomicsdb.org). The “picked” protein FDR approach treats target and decoy sequences of the same protein as a pair rather than as individual entities and chooses either the target or the decoy sequence depending on which receives the highest score. We investigated the performance of this approach in combination with q-value based peptide scoring to normalize sample-, instrument-, and search engine-specific differences. The “picked” target–decoy strategy performed best when protein scoring was based on the best peptide q-value for each protein yielding a stable number of true positive protein identifications over a wide range of q-value thresholds. We show that this simple and unbiased strategy eliminates a conceptual issue in the commonly used “classic” protein FDR approach that causes overprediction of false-positive protein identification in large data sets.
The approach scales from small to very large data sets without losing performance, consistently increases the number of true-positive protein identifications and is readily implemented in proteomics analysis software.
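The pairing step of the “picked” approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `ProteinScore` type, the function name, the tie-breaking rule (first entry seen wins) and the FDR estimate as the running decoy/target ratio are all assumptions introduced here, and the scores are assumed to be "higher is better" (for example, a transformation of the best peptide q-value).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinScore:
    protein: str    # identifier shared by a target and its decoy counterpart
    is_decoy: bool  # True for the decoy sequence of that protein
    score: float    # higher is better (assumed convention)


def picked_protein_fdr(entries):
    """Keep the better-scoring member of each target/decoy pair, then
    estimate FDR down the ranked list as decoys / targets (assumed estimator)."""
    best = {}
    for entry in entries:
        current = best.get(entry.protein)
        # Strict comparison: on a tie the first entry seen is kept (assumption).
        if current is None or entry.score > current.score:
            best[entry.protein] = entry

    ranked = sorted(best.values(), key=lambda e: e.score, reverse=True)
    results = []
    targets = decoys = 0
    for entry in ranked:
        if entry.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr = decoys / max(targets, 1)  # avoid division by zero
        results.append((entry, fdr))
    return results


# Hypothetical scores for three proteins.
example = [
    ProteinScore("A", False, 10.0), ProteinScore("A", True, 3.0),
    ProteinScore("B", False, 2.0), ProteinScore("B", True, 5.0),
    ProteinScore("C", False, 8.0),
]
for entry, fdr in picked_protein_fdr(example):
    kind = "decoy" if entry.is_decoy else "target"
    print(entry.protein, kind, entry.score, round(fdr, 3))
```

Because each protein contributes only one entry (target or decoy) to the ranked list, decoys cannot accumulate independently of their targets, which is the property the abstract credits with removing the over-representation of decoys in very large data sets.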
Mass-spectrometry-based proteomics is continuing to make major contributions to the discovery of fundamental biological processes and, more recently, has also developed into an assay platform capable of measuring hundreds to thousands of proteins in any biological system. The field has progressed at an amazing rate over the past five years in terms of technology as well as the breadth and depth of applications in all areas of the life sciences. Some of the technical approaches that were at an experimental stage back then are considered the gold standard today, and the community is learning to come to grips with the volume and complexity of the data generated. The revolution in DNA/RNA sequencing technology extends the reach of proteomic research to practically any species, and the notion that mass spectrometry has the potential to eventually retire the western blot is no longer in the realm of science fiction. In this review, we focus on the major technical and conceptual developments since 2007 and illustrate these by important recent applications.
Characterizing the human leukocyte antigen (HLA) bound ligandome by mass spectrometry (MS) holds great promise for developing vaccines and drugs for immune-oncology. Still, the identification of non-tryptic peptides presents substantial computational challenges. To address these, we synthesized and analyzed >300,000 peptides by multi-modal LC-MS/MS within the ProteomeTools project representing HLA class I & II ligands and products of the proteases AspN and LysN. The resulting data enabled training of a single model using the deep learning framework Prosit, allowing the accurate prediction of fragment ion spectra for tryptic and non-tryptic peptides. Applying Prosit demonstrates that the identification of HLA peptides can be improved up to 7-fold, that 87% of the proposed proteasomally spliced HLA peptides may be incorrect and that dozens of additional immunogenic neo-epitopes can be identified from patient tumors in published data. Together, the provided peptides, spectra and computational tools substantially expand the analytical depth of immunopeptidomics workflows.
Proteome-wide measurements of protein turnover have largely ignored the impact of post-translational modifications (PTMs). To address this gap, we employ stable isotope labeling and mass spectrometry to measure the turnover of >120,000 peptidoforms including >33,000 phosphorylated, acetylated, and ubiquitinated peptides for >9,000 native proteins. This site-resolved protein turnover (SPOT) profiling discloses global and site-specific differences in turnover associated with the presence or absence of PTMs. While causal relationships may not always be immediately apparent, we speculate that PTMs with diverging turnover may distinguish states of differential protein stability, structure, localization, enzymatic activity, or protein-protein interactions. We show examples of how the turnover data may give insights into unknown functions of PTMs and provide a freely accessible online tool that allows interrogation and visualisation of all turnover data. The SPOT methodology is applicable to many cell types and modifications, offering the potential to prioritize PTMs for future functional investigations.
Isobaric labeling using tandem mass tags (TMTs) is increasingly applied for deep-scale proteomic studies in a multitude of organisms and addressing diverse research questions. The cost of labeling reagents represents a substantial proportion of the total expenses for conducting such experiments. Here, Zecha et al. present an economically optimized TMT labeling approach that reduces the quantity of required labeling reagent by a factor of eight and reproducibly achieves complete labeling.
Highlights
• TMT labeling protocol with excellent intra- and interlaboratory reproducibility.
• Complete in-solution labeling of peptides using 1/8 of recommended TMT quantities.
• Demonstration of utility for deep-scale (phospho)proteome analysis.
Isobaric stable isotope labeling using, for example, tandem mass tags (TMTs) is increasingly being applied for large-scale proteomic studies. Experiments focusing on proteoform analysis in drug time course or perturbation studies or in large patient cohorts greatly benefit from the reproducible quantification of single peptides across samples. However, such studies often require labeling of hundreds of micrograms of peptides such that the cost for labeling reagents represents a major contribution to the overall cost of an experiment. Here, we describe and evaluate a robust and cost-effective protocol for TMT labeling that reduces the quantity of required labeling reagent by a factor of eight and achieves complete labeling. Under- and overlabeling of peptides derived from complex digests of tissues and cell lines were systematically evaluated using peptide quantities of between 12.5 and 800 μg and TMT-to-peptide ratios (wt/wt) ranging from 8:1 to 1:2 at different TMT and peptide concentrations. When reaction volumes were reduced to maintain TMT and peptide concentrations of at least 10 mM and 2 g/l, respectively, TMT-to-peptide ratios as low as 1:1 (wt/wt) resulted in labeling efficiencies of >99% and excellent intra- and interlaboratory reproducibility. The utility of the optimized protocol was further demonstrated in a deep-scale proteome and phosphoproteome analysis of patient-derived xenograft tumor tissue benchmarked against the labeling procedure recommended by the TMT vendor. Finally, we discuss the impact of labeling reaction parameters for N-hydroxysuccinimide ester-based chemistry and provide guidance on adopting efficient labeling protocols for different peptide quantities.
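The concentration constraints in the abstract above (TMT at least 10 mM, peptide at least 2 g/l) translate into an upper bound on the reaction volume for a given peptide amount and TMT-to-peptide ratio. The sketch below works through that arithmetic. The function name and parameters are hypothetical, and the reagent molar mass must be supplied by the user; the value of 350 g/mol used in the example is an illustrative placeholder, not a vendor specification.

```python
def max_reaction_volume_ul(peptide_ug, tmt_to_peptide_ratio, tmt_molar_mass_g_per_mol,
                           min_tmt_mM=10.0, min_peptide_g_per_l=2.0):
    """Largest reaction volume (in microliters) that keeps both the TMT and the
    peptide concentration at or above the stated minimums."""
    tmt_ug = peptide_ug * tmt_to_peptide_ratio  # wt/wt ratio

    # Peptide limit: 1 g/l equals 1 ug/ul, so volume <= mass / concentration.
    peptide_limit_ul = peptide_ug / min_peptide_g_per_l

    # TMT limit: ug / (g/mol) = umol; umol / mM = ml; times 1000 gives ul.
    tmt_umol = tmt_ug / tmt_molar_mass_g_per_mol
    tmt_limit_ul = tmt_umol / min_tmt_mM * 1000.0

    return tmt_ug, min(peptide_limit_ul, tmt_limit_ul)


# 100 ug of peptide at a 1:1 (wt/wt) ratio; 350 g/mol is a placeholder molar mass.
tmt_ug, volume_ul = max_reaction_volume_ul(100.0, 1.0, 350.0)
print(f"TMT needed: {tmt_ug:.1f} ug, maximum volume: {volume_ul:.1f} ul")
```

With these placeholder inputs the TMT concentration, not the peptide concentration, is the limiting constraint, which illustrates why lowering the TMT-to-peptide ratio goes hand in hand with reducing the reaction volume.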
The quantification of differences between two or more physiological states of a biological system is among the most important but also most challenging technical tasks in proteomics. In addition to the classical methods of differential protein gel or blot staining by dyes and fluorophores, mass-spectrometry-based quantification methods have gained increasing popularity over the past five years. Most of these methods employ differential stable isotope labeling to create a specific mass tag that can be recognized by a mass spectrometer and at the same time provide the basis for quantification. These mass tags can be introduced into proteins or peptides (i) metabolically, (ii) by chemical means, (iii) enzymatically, or (iv) provided by spiked synthetic peptide standards. In contrast, label-free quantification approaches aim to correlate the mass spectrometric signal of intact proteolytic peptides or the number of peptide sequencing events with the relative or absolute protein quantity directly. In this review, we critically examine the more commonly used quantitative mass spectrometry methods for their individual merits and discuss challenges in arriving at meaningful interpretations of quantitative proteomic data.
The phosphatases PP1 and PP2A are responsible for the majority of dephosphorylation reactions on phosphoserine (pSer) and phosphothreonine (pThr), and are involved in virtually all cellular processes and numerous diseases. The catalytic subunits exist in cells in form of holoenzymes, which impart substrate specificity. The contribution of the catalytic subunits to the recognition of substrates is unclear. By developing a phosphopeptide library approach and a phosphoproteomic assay, we demonstrate that the specificity of PP1 and PP2A holoenzymes towards pThr and of PP1 for basic motifs adjacent to the phosphorylation site are due to intrinsic properties of the catalytic subunits. Thus, we dissect this amino acid specificity of the catalytic subunits from the contribution of regulatory proteins. Furthermore, our approach enables discovering a role for PP1 as regulator of the GRB-associated-binding protein 2 (GAB2)/14-3-3 complex. Beyond this, we expect that this approach is broadly applicable to detect enzyme-substrate recognition preferences.