Clear cell renal cell carcinoma (ccRCC) is the most common type of kidney cancer, comprising approximately 75% of all kidney tumors. Recent studies by The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) have significantly advanced the molecular characterization of RCC and facilitated the development of targeted therapies. Such advances have improved the median survival of patients with advanced disease from less than 10 months prior to 2004 to 30 months by 2011. However, approximately 30% of localized ccRCC patients will develop recurrence or metastasis after surgical resection of their tumor. Therefore, it is critical to further analyze potential tumor-associated proteins and their profiles during disease progression. Over the past decade, tremendous effort has been focused on the study of molecular pathways, including genomics, transcriptomics, and proteomics, in order to identify potential molecular biomarkers, facilitate early detection, monitor tumor progression, and uncover potential therapeutic targets. In this review, we focus on recent advances in the proteomic analysis of ccRCC, current strategies and challenges, and perspectives in the field. This insight will highlight the discovery of tumor-associated proteins and their potential clinical impact on personalized precision-based care in ccRCC.
Somatic mutations have been extensively characterized in breast cancer, but the effects of these genetic alterations on the proteomic landscape remain poorly understood. Here we describe quantitative mass-spectrometry-based proteomic and phosphoproteomic analyses of 105 genomically annotated breast cancers, of which 77 provided high-quality data. Integrated analyses provided insights into the somatic cancer genome, including the consequences of chromosomal loss, such as the 5q deletion characteristic of basal-like breast cancer. Interrogation of the 5q trans-effects against the Library of Integrated Network-based Cellular Signatures connected loss of CETN3 and SKP1 to elevated expression of epidermal growth factor receptor (EGFR), and SKP1 loss also to increased SRC tyrosine kinase. Global proteomic data confirmed a stromal-enriched group of proteins in addition to basal and luminal clusters, and pathway analysis of the phosphoproteome identified a G-protein-coupled receptor cluster that was not readily identified at the mRNA level. In addition to ERBB2, other amplicon-associated highly phosphorylated kinases were identified, including CDK12, PAK1, PTK2, RIPK2 and TLK2. We demonstrate that proteogenomic analysis of breast cancer elucidates the functional consequences of somatic mutations, narrows candidate nominations for driver genes within large deletions and amplified regions, and identifies therapeutic targets.
Improvements in mass spectrometry (MS)-based peptide sequencing provide a new opportunity to determine whether polymorphisms, mutations, and splice variants identified in cancer cells are translated. Herein, we apply a proteogenomic data integration tool (QUILTS) to illustrate protein variant discovery using whole genome, whole transcriptome, and global proteome datasets generated from a pair of luminal and basal-like breast-cancer-patient-derived xenografts (PDX). The sensitivity of proteogenomic analysis for single nucleotide variant (SNV) expression and novel splice junction (NSJ) detection was probed using multiple MS/MS sample process replicates, each defined here as an independent tandem MS experiment using identical sample material. Despite analysis of over 30 sample process replicates, only about 10% of SNVs (somatic and germline) detected by both DNA and RNA sequencing were observed as peptides. An even smaller proportion of peptides corresponding to NSJ observed by RNA sequencing were detected (<0.1%). Peptides mapping to DNA-detected SNVs without a detectable mRNA transcript were also observed, suggesting that transcriptome coverage was incomplete (∼80%). In contrast to germline variants, somatic variants were less likely to be detected at the peptide level in the basal-like tumor than in the luminal tumor, raising the possibility of differential translation or protein degradation effects. In conclusion, this large-scale proteogenomic integration allowed us to determine the degree to which mutations are translated and identify gaps in sequence coverage, thereby benchmarking current technology and progress toward whole cancer proteome and transcriptome analysis.
The complexity of proteomic instrumentation for LC-MS/MS introduces many possible sources of variability. Data-dependent sampling of peptides constitutes a stochastic element at the heart of discovery proteomics. Although this variation impacts the identification of peptides, proteomic identifications are far from completely random. In this study, we analyzed interlaboratory data sets from the NCI Clinical Proteomic Technology Assessment for Cancer to examine repeatability and reproducibility in peptide and protein identifications. Included data spanned 144 LC-MS/MS experiments on four Thermo LTQ and four Orbitrap instruments. Samples included yeast lysate, the NCI-20 defined dynamic range protein mix, and the Sigma UPS 1 defined equimolar protein mix. Some of our findings reinforced conventional wisdom, such as repeatability and reproducibility being higher for proteins than for peptides. Most lessons from the data, however, were more subtle. Orbitraps proved capable of higher repeatability and reproducibility, but aberrant performance occasionally erased these gains. Even the simplest protein digestions yielded more peptide ions than LC-MS/MS could identify during a single experiment. We observed that peptide lists from pairs of technical replicates overlapped by 35–60%, giving a range for peptide-level repeatability in these experiments. Sample complexity did not appear to affect peptide identification repeatability, even as numbers of identified spectra changed by an order of magnitude. Statistical analysis of protein spectral counts revealed greater stability across technical replicates for Orbitraps, making them superior to LTQ instruments for biomarker candidate discovery. The most repeatable peptides were those corresponding to conventional tryptic cleavage sites, those that produced intense MS signals, and those that resulted from proteins generating many distinct peptides. Reproducibility among different instruments of the same type lagged behind repeatability of technical replicates on a single instrument by several percent. These findings reinforce the importance of evaluating repeatability as a fundamental characteristic of analytical technologies.
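The peptide-level repeatability figure quoted in the preceding abstract can be illustrated with a minimal sketch. The abstract does not say how overlap was computed, so the intersection-over-union definition below is an assumption, and the function names and toy peptide lists are hypothetical rather than taken from the study.

```python
from itertools import combinations


def peptide_overlap(run_a, run_b):
    """Shared fraction of two peptide lists (assumed: intersection over union)."""
    a, b = set(run_a), set(run_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlaps(runs):
    """Overlap for every pair of runs; `runs` maps a run name to its peptide list."""
    return {
        (r1, r2): peptide_overlap(runs[r1], runs[r2])
        for r1, r2 in combinations(sorted(runs), 2)
    }


# Toy identifications for three technical replicates (illustrative only).
replicates = {
    "rep1": ["AAK", "LVNELTEFAK", "YLYEIAR", "DLGEEHFK"],
    "rep2": ["LVNELTEFAK", "YLYEIAR", "HLVDEPQNLIK"],
    "rep3": ["LVNELTEFAK", "DLGEEHFK", "AEFVEVTK"],
}

for pair, fraction in pairwise_overlaps(replicates).items():
    print(pair, f"{fraction:.0%}")
```

Replicates from one instrument would give a repeatability estimate, and runs from different instruments of the same type a reproducibility estimate. A different overlap definition, such as shared peptides divided by the size of the smaller list, would give larger values.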
Coexpression of mRNAs under multiple conditions is commonly used to infer cofunctionality of their gene products despite well-known limitations of this “guilt-by-association” (GBA) approach. Recent advancements in mass spectrometry-based proteomic technologies have enabled global expression profiling at the protein level; however, whether proteome profiling data can outperform transcriptome profiling data for coexpression-based gene function prediction has not been systematically investigated. Here, we address this question by constructing and analyzing mRNA and protein coexpression networks for three cancer types with matched mRNA and protein profiling data from The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Our analyses revealed a marked difference in wiring between the mRNA and protein coexpression networks. Whereas protein coexpression was driven primarily by functional similarity between coexpressed genes, mRNA coexpression was driven by both cofunction and chromosomal colocalization of the genes. Functionally coherent mRNA modules were more likely to have their edges preserved in corresponding protein networks than functionally incoherent mRNA modules. Proteomic data strengthened the link between gene expression and function for at least 75% of Gene Ontology (GO) biological processes and 90% of KEGG pathways. A web application, Gene2Net (http://cptac.gene2net.org), developed based on the three protein coexpression networks, revealed novel gene-function relationships, such as linking ERBB2 (HER2) to lipid biosynthetic process in breast cancer, identifying PLG as a new gene involved in complement activation, and identifying AEBP1 as a new epithelial-mesenchymal transition (EMT) marker. Our results demonstrate that proteome profiling outperforms transcriptome profiling for coexpression-based gene function prediction. Proteomics should be integrated, if not preferred, in gene function and human disease studies.
Protein biomarkers are needed to deepen our understanding of cancer biology and to improve our ability to diagnose, monitor, and treat cancers. Important analytical and clinical hurdles must be overcome to allow the most promising protein biomarker candidates to advance into clinical validation studies. Although contemporary proteomics technologies support the measurement of large numbers of proteins in individual clinical specimens, sample throughput remains comparatively low. This problem is amplified in typical clinical proteomics research studies, which routinely suffer from a lack of proper experimental design, resulting in analysis of too few biospecimens to achieve adequate statistical power at each stage of a biomarker pipeline. To address this critical shortcoming, a joint workshop was held by the National Cancer Institute (NCI), National Heart, Lung, and Blood Institute (NHLBI), and American Association for Clinical Chemistry (AACC) with participation from the U.S. Food and Drug Administration (FDA). An important output from the workshop was a statistical framework for the design of biomarker discovery and verification studies. Herein, we describe the use of quantitative clinical judgments to set statistical criteria for clinical relevance and the development of an approach to calculate biospecimen sample size for proteomic studies in the discovery and verification stages prior to the clinical validation stage. This represents a first step toward building a consensus on quantitative criteria for statistical design of proteomics biomarker discovery and verification research.
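The link between statistical power and biospecimen count raised in the preceding abstract can be illustrated with a textbook calculation. This is not the workshop's framework, which the abstract does not specify. It is a minimal sketch assuming a two-group comparison of a single protein, a two-sided test, and the normal approximation; the function name and parameters are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate specimens needed per group for a two-sided, two-group comparison.

    effect_size is the standardized mean difference (difference / standard deviation).
    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / effect_size)^2.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# One protein, difference of one standard deviation between cases and controls.
print(samples_per_group(1.0))

# Screening many proteins needs a stricter per-test alpha (here a Bonferroni
# correction for an assumed 1000 proteins), which raises the required count.
print(samples_per_group(1.0, alpha=0.05 / 1000))
```

The second call shows why discovery studies that measure many proteins in only a handful of specimens tend to be underpowered: the required sample size grows as the per-test significance threshold tightens.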
Optimal performance of LC-MS/MS platforms is critical to generating high quality proteomics data. Although individual laboratories have developed quality control samples, there is no widely available performance standard of biological complexity (and associated reference data sets) for benchmarking of platform performance for analysis of complex biological proteomes across different laboratories in the community. Individual preparations of the yeast Saccharomyces cerevisiae proteome have been used extensively by laboratories in the proteomics community to characterize LC-MS platform performance. The yeast proteome is uniquely attractive as a performance standard because it is the most extensively characterized complex biological proteome and the only one associated with several large scale studies estimating the abundance of all detectable proteins. In this study, we describe a standard operating protocol for large scale production of the yeast performance standard and offer aliquots to the community through the National Institute of Standards and Technology where the yeast proteome is under development as a certified reference material to meet the long term needs of the community. Using a series of metrics that characterize LC-MS performance, we provide a reference data set demonstrating typical performance of commonly used ion trap instrument platforms in expert laboratories; the results provide a basis for laboratories to benchmark their own performance, to improve upon current methods, and to evaluate new technologies. Additionally, we demonstrate how the yeast reference, spiked with human proteins, can be used to benchmark the power of proteomics platforms for detection of differentially expressed proteins at different levels of concentration in a complex matrix, thereby providing a metric to evaluate and minimize preanalytical and analytical variation in comparative proteomics experiments.
In the absence of a dominant driving mutation other than uniformly present TP53 mutations, deeper understanding of the biology driving ovarian high-grade serous cancer (HGSC) requires analysis at a functional level, including post-translational modifications. Comprehensive proteogenomic and phosphoproteomic characterization of 83 prospectively collected ovarian HGSC and appropriate normal precursor tissue samples (fallopian tube) under strict control of ischemia time reveals pathways that significantly differentiate between HGSC and relevant normal tissues in the context of homologous repair deficiency (HRD) status. In addition to confirming key features of HGSC from previous studies, including a potential survival-associated signature and histone acetylation as a marker of HRD, deep phosphoproteomics provides insights regarding the potential role of proliferation-induced replication stress in promoting the characteristic chromosomal instability of HGSC and suggests potential therapeutic targets for use in precision medicine trials.
Comparison of ovarian cancer and normal precursors identifies key signaling pathways. Mitotic and cyclin-dependent kinases emerge as potential therapeutic targets. Previously identified hallmarks of homologous repair status and survival are confirmed. Replication stress appears to drive increased chromosomal instability.
McDermott et al. present the proteogenomic analysis of prospectively collected ovarian high-grade serous cancer samples and appropriate normal precursor samples under tight ischemic control. They identify tumor-associated signaling pathways and mitotic and cyclin-dependent kinases as key oncogenic drivers potentially related to chromosomal instability.
The Human Cancer Proteome Project (Cancer-HPP) is an international initiative organized by HUPO whose key objective is to decipher the human cancer proteome through a coordinated effort by cancer proteome researchers around the world. The ultimate goal is to map the entire human cancer proteome to disclose tumor biology and drive improved diagnostics, treatment and management of cancer. Here we report the progress in the cancer proteomics field to date, and discuss future proteomic developments that will be needed to optimally delineate cancer phenotypes and advance the molecular characterization of this significant disease that is one of the leading causes of death worldwide.
In the past two decades, our ability to study cellular and molecular systems has been transformed through the development of omics sciences. While unlimited potential lies within massive omics datasets, success in using omics sciences to further our understanding of human disease and/or translate these findings into clinical utility remains elusive due to a number of factors. A significant limiting factor is the integration of different omics datasets (i.e., integromics) for extraction of biological and clinical insights. To this end, the National Cancer Institute (NCI) and the National Heart, Lung and Blood Institute (NHLBI) organized a joint workshop in June 2012 focused on the integration issues in multi-omics technologies that need to be resolved in order to realize the full utility of integrating omics datasets by providing a glimpse into the disease as an integrated "system". The overarching goals were to (1) identify challenges and roadblocks in omics integration, and (2) facilitate the full maturation of 'integromics' in biology and medicine. Participants reached a consensus on the most significant barriers for integrating omics sciences and provided recommendations on viable approaches to overcome each of these barriers within the areas of technology, bioinformatics and clinical medicine.
Keywords: Omics integration, Omics science, Clinical application, Risk prediction, Proteomics, Metabolomics, Genomics