Targeted mass spectrometry (MS) is becoming widely used in academia and in pharmaceutical and biotechnology industries for sensitive and quantitative detection of proteins, peptides and post-translational modifications. Here we describe the increasing importance of targeted MS technologies in clinical proteomics and the potential key roles these techniques will have in bridging biomedical discovery and clinical implementation.
We report a mass spectrometry-based method for the integrated analysis of protein expression, phosphorylation, ubiquitination and acetylation by serial enrichments of different post-translational modifications (SEPTM) from the same biological sample. This technology enabled quantitative analysis of nearly 8,000 proteins and more than 20,000 phosphorylation, 15,000 ubiquitination and 3,000 acetylation sites per experiment, generating a holistic view of cellular signal transduction pathways as exemplified by analysis of bortezomib-treated human leukemia cells.
Genomic analyses in cancer have been enormously impactful, leading to the identification of driver mutations and development of targeted therapies. But the functions of the vast majority of somatic mutations and copy number variants in tumours remain unknown, and the causes of resistance to targeted therapies and methods to overcome them are poorly defined. Recent improvements in mass spectrometry-based proteomics now enable direct examination of the consequences of genomic aberrations, providing deep and quantitative characterization of tumour tissues. Integration of proteins and their post-translational modifications with genomic, epigenomic and transcriptomic data constitutes the new field of proteogenomics, and is already leading to new biological and diagnostic knowledge with the potential to improve our understanding of malignant transformation and therapeutic outcomes. In this Review we describe recent developments in proteogenomics and key findings from the proteogenomic analysis of a wide range of cancers. Considerations relevant to the selection and use of samples for proteogenomics and the current technologies used to generate, analyse and integrate proteomic with genomic data are described. Applications of proteogenomics in translational studies and immuno-oncology are rapidly emerging, and the prospect for their full integration into therapeutic trials and clinical care seems bright.
Better biomarkers are urgently needed to improve diagnosis, guide molecularly targeted therapy and monitor activity and therapeutic response across a wide spectrum of disease. Proteomics methods based on mass spectrometry hold special promise for the discovery of novel biomarkers that might form the foundation for new clinical blood tests, but to date their contribution to the diagnostic armamentarium has been disappointing. This is due in part to the lack of a coherent pipeline connecting marker discovery with well-established methods for validation. Advances in methods and technology now enable construction of a comprehensive biomarker pipeline from six essential process components: candidate discovery, qualification, verification, research assay optimization, biomarker validation and commercialization. Better understanding of the overall process of biomarker discovery and validation and of the challenges and strategies inherent in each phase should improve experimental study design, in turn increasing the efficiency of biomarker development and facilitating the delivery and deployment of novel clinical tests.
Here we present an optimized workflow for global proteome and phosphoproteome analysis of tissues or cell lines that uses isobaric tags (TMT (tandem mass tags)-10) for multiplexed analysis and relative quantification, and provides 3× higher throughput than iTRAQ (isobaric tags for absolute and relative quantification)-4-based methods with high intra- and inter-laboratory reproducibility. The workflow was systematically characterized and benchmarked across three independent laboratories using two distinct breast cancer subtypes from patient-derived xenograft models to enable assessment of proteome and phosphoproteome depth and quantitative reproducibility. Each plex consisted of ten samples, each being 300 μg of peptide derived from <50 mg of wet-weight tissue. Of the 10,000 proteins quantified per sample, we could distinguish 7,700 human proteins derived from tumor cells and 3,100 mouse proteins derived from the surrounding stroma and blood. The maximum deviation across replicates and laboratories was <7%, and the inter-laboratory correlation for TMT ratio-based comparison of the two breast cancer subtypes was r > 0.88. The maximum deviation for the phosphoproteome coverage was <24% across laboratories, with an average of >37,000 quantified phosphosites per sample and differential quantification correlations of r > 0.72. The full procedure, including sample processing and data generation, can be completed within 10 d for ten tissue samples, and 100 samples can be analyzed in ~4 months using a single LC-MS/MS instrument. The high quality, depth, and reproducibility of the data obtained both within and across laboratories should enable new biological insights to be obtained from mass spectrometry-based proteomics analyses of cells and tissues together with proteogenomic data integration.
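The inter-laboratory correlation quoted above can be illustrated with a minimal sketch, assuming it is a Pearson correlation computed on log2-transformed TMT ratios of the two subtypes. The abstract does not state the exact procedure, so the function names and the intensity values below are hypothetical and serve only to show the arithmetic.

```python
import math


def log2_ratios(numerator, denominator):
    """Return per-protein log2 ratios of two intensity lists of equal length."""
    return [math.log2(a / b) for a, b in zip(numerator, denominator)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical reporter-ion intensities for five proteins (not data from the study).
# Each laboratory measures both subtypes; the subtype ratio is compared across labs.
lab1_subtype_a = [1200.0, 800.0, 450.0, 3000.0, 150.0]
lab1_subtype_b = [600.0, 820.0, 900.0, 1400.0, 160.0]
lab2_subtype_a = [1100.0, 760.0, 500.0, 2800.0, 140.0]
lab2_subtype_b = [580.0, 800.0, 950.0, 1500.0, 155.0]

lab1 = log2_ratios(lab1_subtype_a, lab1_subtype_b)
lab2 = log2_ratios(lab2_subtype_a, lab2_subtype_b)
print(f"inter-laboratory r = {pearson_r(lab1, lab2):.3f}")
```

Working in log2 space makes a twofold increase and a twofold decrease symmetric around zero, which is why ratio-based comparisons are usually correlated on that scale.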
Somatic mutations have been extensively characterized in breast cancer, but the effects of these genetic alterations on the proteomic landscape remain poorly understood. Here we describe quantitative mass-spectrometry-based proteomic and phosphoproteomic analyses of 105 genomically annotated breast cancers, of which 77 provided high-quality data. Integrated analyses provided insights into the somatic cancer genome including the consequences of chromosomal loss, such as the 5q deletion characteristic of basal-like breast cancer. Interrogation of the 5q trans-effects against the Library of Integrated Network-based Cellular Signatures connected loss of CETN3 and SKP1 to elevated expression of epidermal growth factor receptor (EGFR), and SKP1 loss also to increased SRC tyrosine kinase. Global proteomic data confirmed a stromal-enriched group of proteins in addition to basal and luminal clusters, and pathway analysis of the phosphoproteome identified a G-protein-coupled receptor cluster that was not readily identified at the mRNA level. In addition to ERBB2, other amplicon-associated highly phosphorylated kinases were identified, including CDK12, PAK1, PTK2, RIPK2 and TLK2. We demonstrate that proteogenomic analysis of breast cancer elucidates the functional consequences of somatic mutations, narrows candidate nominations for driver genes within large deletions and amplified regions, and identifies therapeutic targets.
Pathway analysis of PTM data sets is typically performed at a gene-centric level because of the lack of appropriately curated PTM signature databases. We have developed a PTM signatures database (PTMsigDB) providing curated phosphorylation signatures of kinases, perturbations and signaling pathways to enable site-specific PTM signature enrichment analysis (PTM-SEA). Application of PTM-SEA to phosphoproteomes of several cell lines perturbed with growth factors, cell cycle inhibitors, or a specific PI3K inhibitor demonstrated the potential of our site-centric approach to study dysregulated pathways in cancers.
Highlights
•Database of PTM site-specific phosphorylation signatures of kinases, perturbations and signaling pathways (PTMsigDB).
•PTM signature enrichment analysis (PTM-SEA) outperformed gene-centric analysis in detection of EGF induced phospho signaling events.
•PI3K perturbation signatures were readily detected in PI3Ka inhibited human breast cancer cells.
•PTMsigDB and PTM-SEA can be freely accessed at https://github.com/broadinstitute/ssGSEA2.0.
Signaling pathways are orchestrated by post-translational modifications (PTMs) such as phosphorylation. However, pathway analysis of PTM data sets generated by mass spectrometry (MS)-based proteomics is typically performed at a gene-centric level because of the lack of appropriately curated PTM signature databases and bioinformatic tools that leverage PTM site-specific information. Here we present the first version of PTMsigDB, a database of modification site-specific signatures of perturbations, kinase activities and signaling pathways curated from more than 2,500 publications. We adapted the widely used single sample Gene Set Enrichment Analysis approach to utilize PTMsigDB, enabling PTM Signature Enrichment Analysis (PTM-SEA) of quantitative MS data. We used a well-characterized data set of epidermal growth factor (EGF)-perturbed cancer cells to evaluate our approach and demonstrated better representation of signaling events compared with gene-centric methods. We then applied PTM-SEA to analyze the phosphoproteomes of cancer cells treated with cell-cycle inhibitors and detected mechanism-of-action specific signatures of cell cycle kinases. We also applied our methods to analyze the phosphoproteomes of PI3K-inhibited human breast cancer cells and detected signatures of compounds inhibiting PI3K as well as targets downstream of PI3K (AKT, MAPK/ERK) covering a substantial fraction of the PI3K pathway. PTMsigDB and PTM-SEA can be freely accessed at https://github.com/broadinstitute/ssGSEA2.0.
We have developed a novel plasma protein analysis platform with optimized sample preparation, chromatography, and MS analysis protocols. The workflow, which utilizes chemical isobaric mass tag labeling for relative quantification of plasma proteins, achieves far greater depth of proteome detection and quantification while simultaneously providing higher sample throughput than prior methods. We applied the new workflow to a time series of plasma samples from patients undergoing a therapeutic, “planned” myocardial infarction for hypertrophic cardiomyopathy, a unique human model in which each person serves as their own biologic control. Over 5300 proteins were confidently identified in our experiments with an average of 4600 proteins identified per sample (with two or more distinct peptides identified per protein) using iTRAQ four-plex labeling. Nearly 3400 proteins were quantified in common across all 16 patient samples. Compared with a previously published label-free approach, the new method quantified almost fivefold more proteins/sample and provided a six- to nine-fold increase in sample analysis throughput. Moreover, this study provides the largest high-confidence plasma proteome dataset available to date. The reliability of relative quantification was also greatly improved relative to the label-free approach, with measured iTRAQ ratios and temporal trends correlating well with results from a 23-plex immunoMRM (iMRM) assay containing a subset of the candidate proteins applied to the same patient samples. The functional importance of improved detection and quantification was reflected in a markedly expanded list of significantly regulated proteins that provided many new candidate biomarker proteins. Preliminary evaluation of plasma sample labeling with TMT six-plex and ten-plex reagents suggests that even further increases in multiplexing of plasma analysis are practically achievable without significant losses in depth of detection relative to iTRAQ four-plex.
These results obtained with our novel platform provide clear demonstration of the value of using isobaric mass tag reagents in plasma-based biomarker discovery experiments.
Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
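The gene-set idea described above can be sketched as a weighted running-sum enrichment score over a ranked gene list, which is the statistic commonly associated with GSEA. The following is a minimal illustration and not the published software: the function name, the weighting exponent `p`, and the toy gene names are assumptions, and the permutation-based significance testing and normalization steps are omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes: gene identifiers sorted by decreasing ranking metric.
    scores: the ranking metric for each gene, in the same order.
    gene_set: set of gene identifiers to test.
    p: exponent weighting hits by |score| (p=0 gives the unweighted form).
    Returns the maximum deviation of the running sum from zero (signed).
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_miss = in_set.count(False)
    hit_weights = [abs(s) ** p if hit else 0.0 for s, hit in zip(scores, in_set)]
    total_hit = sum(hit_weights)
    if total_hit == 0 or n_miss == 0:
        return 0.0  # set absent from the list, or list contains only set members

    running, best = 0.0, 0.0
    for weight, hit in zip(hit_weights, in_set):
        # Step up at set members (weighted), step down at all other genes.
        running += weight / total_hit if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy example with hypothetical gene names and scores.
genes = ["A", "B", "C", "D", "E", "F"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
print(enrichment_score(genes, metric, {"A", "B"}))  # set concentrated at the top
print(enrichment_score(genes, metric, {"E", "F"}))  # set concentrated at the bottom
```

A set whose members cluster at the top of the ranking yields a positive score, and one clustered at the bottom yields a negative score, so the statistic captures coordinated shifts that single-gene tests can miss.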
Proteomic characterization of blood plasma is of central importance to clinical proteomics and particularly to biomarker discovery studies. The vast dynamic range and high complexity of the plasma proteome have, however, proven to be serious challenges and have often led to unacceptable tradeoffs between depth of coverage and sample throughput. We present an optimized sample-processing pipeline for analysis of the human plasma proteome that provides greatly increased depth of detection, improved quantitative precision and much higher sample analysis throughput as compared with prior methods. The process includes abundant protein depletion, isobaric labeling at the peptide level for multiplexed relative quantification and ultra-high-performance liquid chromatography coupled to accurate-mass, high-resolution tandem mass spectrometry analysis of peptides fractionated off-line by basic pH reversed-phase (bRP) chromatography. The overall reproducibility of the process, including immunoaffinity depletion, is high, with a process replicate coefficient of variation (CV) of <12%. Using isobaric tags for relative and absolute quantitation (iTRAQ) 4-plex, >4,500 proteins are detected and quantified per patient sample on average, with two or more peptides per protein and starting from as little as 200 μl of plasma. The approach can be multiplexed up to 10-plex using tandem mass tags (TMT) reagents, further increasing throughput, albeit with some decrease in the number of proteins quantified. In addition, we provide a rapid protocol for analysis of nonfractionated depleted plasma samples analyzed in 10-plex. This provides ∼600 quantified proteins for each of the ten samples in ∼5 h of instrument time.
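The process-replicate CV reported above is, in its usual definition, the sample standard deviation divided by the mean, expressed as a percentage. The sketch below shows that calculation for one protein measured across replicates; the function name and the intensity values are hypothetical, and how per-protein CVs were summarized in the study is not specified in the abstract.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation (%) = sample standard deviation / mean * 100."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100.0


# Hypothetical intensities of one protein across four process replicates.
replicates = [10500.0, 9800.0, 11200.0, 10100.0]
print(f"process-replicate CV = {cv_percent(replicates):.1f}%")
```

Because the CV is scaled by the mean, it allows precision to be compared between proteins whose absolute intensities differ by orders of magnitude, which matters given the dynamic range of plasma.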