We performed the first proteogenomic study on a prospectively collected colon cancer cohort. Comparative proteomic and phosphoproteomic analysis of paired tumor and normal adjacent tissues produced a catalog of colon cancer-associated proteins and phosphosites, including known and putative new biomarkers, drug targets, and cancer/testis antigens. Proteogenomic integration not only prioritized genomically inferred targets, such as copy-number drivers and mutation-derived neoantigens, but also yielded novel findings. Phosphoproteomics data associated Rb phosphorylation with increased proliferation and decreased apoptosis in colon cancer, which explains why this classical tumor suppressor is amplified in colon tumors and suggests a rationale for targeting Rb phosphorylation in colon cancer. Proteomics identified an association between decreased CD8 T cell infiltration and increased glycolysis in microsatellite instability-high (MSI-H) tumors, suggesting glycolysis as a potential target to overcome the resistance of MSI-H tumors to immune checkpoint blockade. Proteogenomics presents new avenues for biological discoveries and therapeutic development.
•Systematic identification of colon cancer-associated proteins and phosphosites
•Proteomics-supported neoantigens and cancer/testis antigens in 78% of the tumors
•Rb phosphorylation is an oncogenic driver and a putative target in colon cancer
•Glycolysis inhibition may render MSI tumors more sensitive to checkpoint blockade
A systematic proteogenomic analysis of colon cancer reveals vulnerabilities of potential clinical value inaccessible from genomic assessment alone.
The Clinical Proteomic Tumor Analysis Consortium (CPTAC), under the auspices of the National Cancer Institute’s Office of Cancer Clinical Proteomics Research, is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of proteomic technologies and workflows to clinical tumor samples with characterized genomic and transcript profiles. The consortium analyzes cancer biospecimens using mass spectrometry, identifying and quantifying the constituent proteins and characterizing each tumor sample’s proteome. Mass spectrometry enables highly specific identification of proteins and their isoforms, accurate relative quantitation of protein abundance in contrasting biospecimens, and localization of post-translational protein modifications, such as phosphorylation, on a protein’s sequence. The combination of proteomics, transcriptomics, and genomics data from the same clinical tumor samples provides an unprecedented opportunity for tumor proteogenomics. The CPTAC Data Portal is the centralized data repository for the dissemination of proteomic data collected by Proteome Characterization Centers (PCCs) in the consortium. The portal currently hosts 6.3 TB of data and includes proteomic investigations of breast, colorectal, and ovarian tumor tissues from The Cancer Genome Atlas (TCGA). The data collected by the consortium is made freely available to the public through the data portal.
The Rembrandt brain cancer dataset includes 671 patients collected from 14 contributing institutions from 2004 to 2006. It is accessible for conducting clinical translational research using the open access Georgetown Database of Cancer (G-DOC) platform. In addition, the raw and processed genomics and transcriptomics data have been made available via the public NCBI GEO repository as super series GSE108476. Such combined datasets provide researchers with a unique opportunity to conduct integrative analysis of gene expression and copy number changes in patients alongside clinical outcomes (overall survival) using this large brain cancer study.
Precision oncology relies on accurate discovery and interpretation of genomic variants, enabling individualized diagnosis, prognosis and therapy selection. We found that six prominent somatic cancer variant knowledgebases were highly disparate in content, structure and supporting primary literature, impeding consensus when evaluating variants and their relevance in a clinical setting. We developed a framework for harmonizing variant interpretations to produce a meta-knowledgebase of 12,856 aggregate interpretations. We demonstrated large gains in overlap between resources across variants, diseases and drugs as a result of this harmonization. We subsequently demonstrated improved matching between a patient cohort and harmonized interpretations of potential clinical significance, observing an increase from an average of 33% per individual knowledgebase to 57% in aggregate. Our analyses illuminate the need for open, interoperable sharing of variant interpretation data. We also provide a freely available web interface (search.cancervariants.org) for exploring the harmonized interpretations from these six knowledgebases.
Finding better therapies for the treatment of brain tumors is hampered by the lack of consistently obtained molecular data in a large sample set and the ability to integrate biomedical data from disparate sources enabling translation of therapies from bench to bedside. Hence, a critical factor in the advancement of biomedical research and clinical translation is the ease with which data can be integrated, redistributed, and analyzed both within and across functional domains. Novel biomedical informatics infrastructure and tools are essential for developing individualized patient treatment based on the specific genomic signatures in each patient's tumor. Here, we present Repository of Molecular Brain Neoplasia Data (Rembrandt), a cancer clinical genomics database and a Web-based data mining and analysis platform aimed at facilitating discovery by connecting the dots between clinical information and genomic characterization data. To date, Rembrandt contains data generated through the Glioma Molecular Diagnostic Initiative from 874 glioma specimens comprising approximately 566 gene expression arrays, 834 copy number arrays, and 13,472 clinical phenotype data points. Data can be queried and visualized for a selected gene across all data platforms or for multiple genes in a selected platform. Additionally, gene sets can be limited to clinically important annotations including secreted, kinase, membrane, and known gene-anomaly pairs to facilitate the discovery of novel biomarkers and therapeutic targets. We believe that Rembrandt represents a prototype of how high-throughput genomic and clinical data can be integrated in a way that will allow expeditious and efficient translation of laboratory discoveries to the clinic.
Stem cell antigen-1 (Sca-1) is used to isolate and characterize tumor-initiating cell populations from tumors of various murine models 1. Sca-1-induced disruption of TGF-β signaling is required for in vivo tumorigenesis in breast cancer models 2, 3-5. The role of the human Ly6 gene family is only beginning to be appreciated in recent literature 6-9. To study the significance of Ly6 gene family members, we visualized 130 Gene Expression Omnibus (GEO) datasets using Oncomine (Invitrogen) and the Georgetown Database of Cancer (G-DOC). This analysis showed that four different members, Ly6D, Ly6E, Ly6H or Ly6K, have increased gene expression in bladder, brain and CNS, breast, colorectal, cervical, ovarian, lung, head and neck, pancreatic and prostate cancer compared with their normal counterpart tissues. Increased expression of Ly6D, Ly6E, Ly6H or Ly6K was observed in a subset of cancer types. The increased expression of Ly6D, Ly6E, Ly6H and Ly6K was found to be associated with poor outcome in ovarian, colorectal, gastric, breast, lung, bladder or brain and CNS cancers, as observed using the KM plotter and PROGgeneV2 platforms. The remarkable findings of increased expression of Ly6 family members and its positive correlation with poor patient survival in multiple cancer types indicate that Ly6D, Ly6E, Ly6K and Ly6H will be important targets in clinical practice as markers of poor prognosis and for developing novel therapeutics in multiple cancer types.
The coronavirus disease 2019 (COVID-19) pandemic caused by the SARS-CoV-2 virus has affected over 700 million people and caused over 7 million deaths throughout the world as of April 2024, and it continues to affect people through seasonal waves. While over 675 million people have recovered from this disease globally, the lingering effects of the disease are still under study. Long-term effects of SARS-CoV-2 infection, known as 'long COVID,' encompass a wide range of symptoms, including fatigue, chest pain and cellular damage, along with a strong innate immune response characterized by inflammatory cytokine production. Three years after the pandemic, data from long COVID studies are finally emerging. More clinical studies and clinical trials are needed to understand and determine the factors that predispose individuals to these long-term side effects.
In this methodology paper, our goal was to apply data-driven approaches to explore the multidimensional landscape of the infected lung tissue microenvironment and to better understand the complex interactions between viral infection, immune response and the lung microbiome of patients with (a) SARS-CoV-2 virus and (b) NL63 coronavirus. The samples were analyzed with several machine learning tools allowing simultaneous detection and quantification of viral RNA at genome and gene level, human gene expression and fractions of major types of immune cells, as well as metagenomic analysis of bacterial and viral abundance. To compare and contrast the specific viral response to SARS-CoV-2, we analyzed deep sequencing data from an additional cohort of patients infected with the NL63 strain of coronavirus.
Our correlation analysis of three types of RNA-seq-based measurements in patients, namely the fraction of viral RNA (at genome and gene level), human RNA (at transcript and gene level) and bacterial RNA (metagenomic analysis), showed that viral load and the expression levels of specific viral genes correlated significantly with the fractions of immune cells present in lung lavage and with the abundance of major fractions of the lung microbiome in COVID-19 patients.
Our methodology-based proof-of-concept study has provided novel insights into complex regulatory signaling interactions and correlative patterns between the viral infection, inhibition of innate and adaptive immune response, and the microbiome landscape of the lung tissue. These initial findings could provide a better understanding of the diverse dynamics of the immune response and the side effects of SARS-CoV-2 infection, and they demonstrate the range of analyses that could be performed with this type of data.
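As a rough illustration of the kind of correlation analysis described in this abstract, the sketch below computes a rank correlation between a per-patient viral load and an immune cell fraction. The abstract does not state which correlation statistic was used, so the choice of Spearman correlation is an assumption, and the function names (`ranks`, `pearson`, `spearman`) are hypothetical helpers written only for this example.

```python
import math


def ranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))
```

In use, `x` would hold one measurement per patient (for example, the fraction of viral reads) and `y` the matching value of a second measurement (for example, an estimated immune cell fraction); a significance test and multiple-testing correction would still be needed on top of the coefficient.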
Many gene products exhibit great structural heterogeneity because of an array of modifications. These modifications are not directly encoded in the genomic template but often affect the functionality of proteins. Protein glycosylation plays a vital role in proper protein functions. However, the analysis of glycoproteins has been challenging compared with other protein modifications, such as phosphorylation. Here, we perform an integrated proteomic and glycoproteomic analysis of 83 prospectively collected high-grade serous ovarian carcinoma (HGSC) and 23 non-tumor tissues. Integration of the expression data from global proteomics and glycoproteomics reveals tumor-specific glycosylation, uncovers different glycosylation associated with three tumor clusters, and identifies glycosylation enzymes that were correlated with the altered glycosylation. In addition to providing a valuable resource, these results provide insights into the potential roles of glycosylation in the pathogenesis of HGSC, with the possibility of distinguishing pathological outcomes of ovarian tumors from non-tumors, as well as classifying tumor clusters.
•Proteomics and glycoproteomics of 83 ovarian cancer and 23 relevant non-tumor tissues
•Glycosylation is associated with three tumor clusters
•Tumor-specific changes of glycoproteins and glycosites are apparent
•Enzymes responsible for the glycosylation alterations are identified
Hu et al. provide an integrated proteomic and glycoproteomic characterization of high-grade serous ovarian carcinomas and relevant non-tumor tissues, which reveals tumor-specific glycosylation, uncovers different glycosylation associated with three tumor clusters, and identifies glycosylation enzymes correlated with glycosylation alterations.
An estimated 17% of cancers worldwide are associated with infectious causes. The extent and biological significance of viral presence/infection in actual tumor samples are generally unknown but could be measured using human transcriptome (RNA-seq) data from tumor samples. We present an open source bioinformatics pipeline, viGEN, which allows for not only the detection and quantification of viral RNA, but also the detection of variants in the viral transcripts. The pipeline includes 4 major modules: the first module aligns and filters out human RNA sequences; the second module maps and counts the remaining unaligned reads against reference genomes of all known and sequenced human viruses; the third module quantifies read counts at the individual viral-gene level, thus allowing for downstream differential expression analysis of viral genes between case and control groups; the fourth module calls variants in these viruses. To the best of our knowledge, there are no publicly available pipelines or packages that would provide this type of complete analysis in one open source package. In this paper, we applied the viGEN pipeline to two case studies. We first demonstrate the working of our pipeline on a large public dataset, the TCGA cervical cancer cohort. In the second case study, we performed an in-depth analysis on a small focused study of TCGA liver cancer patients. In the latter cohort, we performed viral-gene quantification, viral-variant extraction and survival analysis. This allowed us to find differentially expressed viral transcripts and viral variants between the groups of patients, and connect them to clinical outcome. From our analyses, we show that we were able to successfully detect the human papilloma virus among the TCGA cervical cancer patients. We compared the viGEN pipeline with two metagenomics tools and demonstrated similar sensitivity/specificity. We were also able to quantify viral transcripts and extract viral variants using the liver cancer dataset.
The results presented corresponded with published literature in terms of rate of detection and the impact of several known variants of the HBV genome. This pipeline is generalizable and can be used to provide novel biological insights into microbial infections in complex diseases and tumorigenesis. Our viral pipeline could be used in conjunction with additional types of immuno-oncology analysis based on RNA-seq data of host RNA for cancer immunology applications. The source code, with example data and a tutorial, is available at: https://github.com/ICBI/viGEN/.
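The logic of the first three modules in the abstract above can be illustrated with a toy sketch. This is not the viGEN code or its interface: the `Read` record, the `GENES` annotation, the virus and gene names, and all function names are hypothetical, and alignment results are assumed to be already available rather than computed. The fourth module (variant calling) is left out.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Read(NamedTuple):
    """Hypothetical record holding precomputed alignment results for one read."""
    read_id: str
    maps_to_human: bool              # outcome of a human-genome alignment
    virus: Optional[str] = None      # viral reference the read maps to, if any
    position: Optional[int] = None   # 1-based position on that viral genome


# Hypothetical annotation: virus -> list of (gene, start, end), ends inclusive.
GENES = {"virusA": [("gene1", 100, 300), ("gene2", 400, 600)]}


def remove_human_reads(reads):
    """Module 1 (sketch): keep only reads that did not align to the human genome."""
    return [r for r in reads if not r.maps_to_human]


def count_per_virus(reads):
    """Module 2 (sketch): count the remaining reads per viral reference genome."""
    return Counter(r.virus for r in reads if r.virus is not None)


def count_per_gene(reads, genes=GENES):
    """Module 3 (sketch): count reads falling inside each annotated viral gene."""
    counts = Counter()
    for r in reads:
        if r.virus is None or r.position is None:
            continue
        for gene, start, end in genes.get(r.virus, []):
            if start <= r.position <= end:
                counts[(r.virus, gene)] += 1
    return counts
```

A typical call chain would be `count_per_gene(remove_human_reads(reads))`, whose per-gene counts could then feed a differential expression comparison between case and control groups. A real run works on aligner output files rather than in-memory records.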