WebGestalt is a popular tool for the interpretation of gene lists derived from large scale -omics studies. In the 2019 update, WebGestalt supports 12 organisms, 342 gene identifiers and 155,175 functional categories, as well as user-uploaded functional databases. To address the growing and unique need for phosphoproteomics data interpretation, we have implemented phosphosite set analysis to identify important kinases from phosphoproteomics data. We have completely redesigned result visualizations and user interfaces to improve user-friendliness and to provide multiple types of interactive and publication-ready figures. To facilitate comprehension of the enrichment results, we have implemented two methods to reduce redundancy between enriched gene sets. We introduced a web API for other applications to get data programmatically from the WebGestalt server or pass data to WebGestalt for analysis. We also wrapped the core computation into an R package called WebGestaltR for users to perform analysis locally or in third party workflows. WebGestalt can be freely accessed at http://www.webgestalt.org.
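As a rough illustration of what an enrichment result for a gene list means, the sketch below computes an over-representation p-value with the hypergeometric test, a common choice for this kind of analysis. It is not WebGestalt's implementation; the function name and parameters are assumptions made for this example only.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, list_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    Hypothetical helper: X is the number of gene-set members found in a
    gene list of `list_size` genes drawn without replacement from a
    reference universe of `universe_size` genes, of which `set_size`
    belong to the gene set.
    """
    max_overlap = min(set_size, list_size)
    total = comb(universe_size, list_size)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total
```

A small p-value suggests that the gene list contains more members of the gene set than expected by chance; in practice such p-values are corrected for multiple testing across all gene sets examined.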
RNA-Seq and mass spectrometry-based studies generate omics data tables with measurements for tens of thousands of genes across all samples in a study. The success of a study relies on the quality of these data tables, which is determined by both experimental data generation and computational methods used to process raw experimental data into quantitative data tables. We present OmicsEV, an R package for the quality evaluation of omics data tables. For each data table, OmicsEV uses a series of methods to evaluate data depth, data normalization, batch effect, biological signal, platform reproducibility and multi-omics concordance, producing comprehensive visual and quantitative evaluation results that help assess the data quality of individual data tables and facilitate the identification of the optimal data processing method and parameters for the omics study under investigation.
The source code and the user manual of OmicsEV are available at https://github.com/bzhanglab/OmicsEV, and the source code is released under the GPL-3 license.
Although cellular behaviors are dynamic, the networks that govern these behaviors have been mapped primarily as static snapshots. Using an approach called differential epistasis mapping, we have discovered widespread changes in genetic interaction among yeast kinases, phosphatases, and transcription factors as the cell responds to DNA damage. Differential interactions uncover many gene functions that go undetected in static conditions. They are very effective at identifying DNA repair pathways, highlighting new damage-dependent roles for the Slt2 kinase, Pph3 phosphatase, and histone variant Htz1. The data also reveal that protein complexes are generally stable in response to perturbation, but the functional relations between these complexes are substantially reorganized. Differential networks chart a new type of genetic landscape that is invaluable for mapping cellular responses to stimuli.
Shotgun phosphoproteomics enables high-throughput analysis of phosphopeptides in biological samples. One of the primary challenges associated with this technology is the relatively low rate of phosphopeptide identification during data analysis. This limitation hampers the full realization of the potential offered by shotgun phosphoproteomics. Here we present DeepRescore2, a computational workflow that leverages deep learning-based retention time and fragment ion intensity predictions to improve phosphopeptide identification and phosphosite localization. Using a state-of-the-art computational workflow as a benchmark, DeepRescore2 increases the number of correctly identified peptide-spectrum matches by 17% in a synthetic dataset and identifies 19% to 46% more phosphopeptides in biological datasets. In a liver cancer dataset, 30% of the significantly altered phosphosites between tumor and normal tissues and 60% of the prognosis-associated phosphosites identified from DeepRescore2-processed data could not be identified based on the state-of-the-art workflow. Notably, DeepRescore2-processed data uniquely identifies EGFR hyperactivation as a new target in poor-prognosis liver cancer, which is validated experimentally. Integration of deep learning prediction in DeepRescore2 improves phosphopeptide identification and facilitates biological discoveries.
• DeepRescore2 leverages deep learning to improve phosphopeptide identification.
• Demonstrated sensitivity and accuracy in synthetic and biological datasets.
• Increased identification of prognostic phosphosites in liver cancer.
• EGFR hyperactivation as a new target in poor-prognosis liver cancer.
Shotgun phosphoproteomics enables high-throughput analysis of phosphopeptides in biological samples, but low phosphopeptide identification rates in data analysis limit its potential. We introduce DeepRescore2, a computational workflow using deep learning-based retention time and fragment ion intensity predictions to improve phosphopeptide identification and phosphosite localization. We benchmark DeepRescore2 against existing workflows on a synthetic phosphopeptide dataset and apply it to real-world biological datasets, revealing improved sensitivity, fewer missing values, and enhanced phosphoproteomics-based biological discoveries.
Global phosphoproteomics experiments quantify tens of thousands of phosphorylation sites. However, data interpretation is hampered by our limited knowledge on functions, biological contexts, or precipitating enzymes of the phosphosites. This study establishes a repository of phosphosites with associated evidence in biomedical abstracts, using deep learning–based natural language processing techniques. Our model for illuminating the dark phosphoproteome through PubMed mining (IDPpub) was generated by fine-tuning BioBERT, a deep learning tool for biomedical text mining. Trained using sentences containing protein substrates and phosphorylation site positions from 3000 abstracts, the IDPpub model was then used to extract phosphorylation sites from all MEDLINE abstracts. The extracted proteins were normalized to gene symbols using the National Center for Biotechnology Information gene query, and sites were mapped to human UniProt sequences using ProtMapper and mouse UniProt sequences by direct match. Precision and recall were calculated using 150 curated abstracts, and utility was assessed by analyzing the CPTAC (Clinical Proteomics Tumor Analysis Consortium) pan-cancer phosphoproteomics datasets and the PhosphoSitePlus database. Using 10-fold cross validation, pairs of correct substrates and phosphosite positions were extracted with an average precision of 0.93 and recall of 0.94. After entity normalization and site mapping to human reference sequences, an independent validation achieved a precision of 0.91 and recall of 0.77. The IDPpub repository contains 18,458 unique human phosphorylation sites with evidence sentences from 58,227 abstracts and 5918 mouse sites in 14,610 abstracts. This included evidence sentences for 1803 sites identified in CPTAC studies that are not covered by manually curated functional information in PhosphoSitePlus.
Evaluation results demonstrate the potential of IDPpub as an effective biomedical text mining tool for collecting phosphosites. Moreover, the repository (http://idppub.ptmax.org), which can be automatically updated, can serve as a powerful complement to existing resources.
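For readers less familiar with the precision and recall figures quoted for the extraction of substrate and phosphosite pairs, the following minimal sketch shows how such metrics are typically computed from a set of extracted pairs and a curated gold standard. The function name and the toy pairs are hypothetical placeholders, not data or code from IDPpub.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for extracted (substrate, site) pairs.

    precision = correct extractions / all extractions
    recall    = correct extractions / all curated pairs
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy, made-up pairs for illustration only.
gold_pairs = {("GENE_A", "S15"), ("GENE_B", "S473"), ("GENE_C", "T185"), ("GENE_D", "Y1068")}
extracted_pairs = {("GENE_A", "S15"), ("GENE_B", "S473"), ("GENE_C", "Y187")}

precision, recall = precision_recall(extracted_pairs, gold_pairs)
```

In this toy case two of the three extracted pairs are correct and two of the four curated pairs are recovered, so precision exceeds recall, the same direction as the independent validation reported above.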
• IDPpub utilizes BioBERT to extract phosphorylation sites.
• Alignment of extracted phosphorylation sites to reference sequences.
• Identified evidence sentences for tumor-associated sites.
• Web portal to explore evidence sentences for phosphorylation sites.
Phosphorylation sites have important functions but are poorly annotated in public databases. We developed IDPpub, a pipeline that uses BioBERT to extract phosphorylation sites from biomedical abstracts. We align the sites to human and mouse reference sequences to facilitate computational applications and intersection with mass spectrometry experiments. The evidence sentences can be used to identify regulating enzymes and biological functions. All data are available in a web portal for easy search.
DNA damage activates checkpoint kinases that induce several downstream events, including widespread changes in transcription. However, the specific connections between the checkpoint kinases and downstream transcription factors (TFs) are not well understood. Here, we integrate kinase mutant expression profiles, transcriptional regulatory interactions, and phosphoproteomics to map kinases and downstream TFs to transcriptional regulatory networks. Specifically, we investigate the role of the Saccharomyces cerevisiae checkpoint kinases (Mec1, Tel1, Chk1, Rad53, and Dun1) in the transcriptional response to DNA damage caused by methyl methanesulfonate. The result is a global kinase-TF regulatory network in which Mec1 and Tel1 signal through Rad53 to synergistically regulate the expression of more than 600 genes. This network involves at least nine TFs, many of which have Rad53-dependent phosphorylation sites, as regulators of checkpoint-kinase-dependent genes. We also identify a major DNA damage-induced transcriptional network that regulates stress response genes independently of the checkpoint kinases.
• Rad53 regulates a transcriptional response to DNA damage involving more than 600 genes
• Both Mec1 and Tel1 are required for activation of the complete Rad53-dependent response
• The Rad53-dependent network involves targets of nine TFs
• The checkpoint-kinase-independent response involves seven distinct TFs
DNA damage checkpoint kinases activate different cellular pathways, including widespread changes in gene expression. Here, Kolodner and colleagues integrate kinase mutant expression profiles, transcription factor regulatory networks, and phosphoproteomics to investigate the role of yeast checkpoint kinases in the transcriptional response to DNA damage induced by methyl methanesulfonate. The result is a global kinase-transcription factor regulatory network in which the Rad53/Chk2 kinase regulates at least nine downstream transcription factors and a kinase-independent network involving at least eight different transcription factors.
Significance: The function of brown adipose tissue (BAT), which converts chemical energy into heat, has been widely characterized, but how BAT forms and what signaling molecules regulate its formation are largely unknown. In this paper, we report that Hedgehog (Hh) signaling inhibits the formation of BAT during development. Activation of Hh signaling, specifically in the BAT of mice during development, resulted in the loss of interscapular BAT due to the impairment of brown-preadipocyte differentiation. Remarkably, the majority of the BAT cells in the neck were replaced by skeletal muscle-like cells in embryos with elevated Hh activity. These findings indicate that Hh is an essential regulator of BAT development and that developing BAT depots respond differentially to Hh signaling.
Although recent studies have shown that brown adipose tissue (BAT) arises from progenitor cells that also give rise to skeletal muscle, the developmental signals that control the formation of BAT remain largely unknown. Here, we show that brown preadipocytes possess primary cilia and can respond to Hedgehog (Hh) signaling. Furthermore, cell-autonomous activation of Hh signaling blocks early brown-preadipocyte differentiation, inhibits BAT formation in vivo, and results in replacement of neck BAT with poorly differentiated skeletal muscle. Finally, we show that Hh signaling inhibits BAT formation partially through up-regulation of chicken ovalbumin upstream promoter transcription factor II (COUP-TFII). Taken together, our studies uncover a previously unidentified role for Hh as an inhibitor of BAT development.
Fewer than 200 proteins are targeted by cancer drugs approved by the Food and Drug Administration (FDA). We integrate Clinical Proteomic Tumor Analysis Consortium (CPTAC) proteogenomics data from 1,043 patients across 10 cancer types with additional public datasets to identify potential therapeutic targets. Pan-cancer analysis of 2,863 druggable proteins reveals a wide abundance range and identifies biological factors that affect mRNA-protein correlation. Integration of proteomic data from tumors and genetic screen data from cell lines identifies protein overexpression- or hyperactivation-driven druggable dependencies, enabling accurate predictions of effective drug targets. Proteogenomic identification of synthetic lethality provides a strategy to target tumor suppressor gene loss. Combining proteogenomic analysis and MHC binding prediction prioritizes mutant KRAS peptides as promising public neoantigens. Computational identification of shared tumor-associated antigens followed by experimental confirmation nominates peptides as immunotherapy targets. These analyses, summarized at https://targets.linkedomics.org, form a comprehensive landscape of protein and peptide targets for companion diagnostics, drug repurposing, and therapy development.
• Integrating tumor proteogenomics with cell line data reveals pan-cancer druggable targets
• Proteogenomic discovery of synthetic lethality facilitates targeting tumor suppressor loss
• Computational workflows enable effective tumor antigen identification
• Web portal provides access to identified targets and their supporting data
Integrating pan-cancer proteogenomic data from 1,043 patients across 10 cancer types, genetic screen data from cell lines, and tumor antigen predictions unveils a comprehensive landscape of protein and peptide targets for drug repurposing and therapy development.
Microscaled proteogenomics was deployed to probe the molecular basis for differential response to neoadjuvant carboplatin and docetaxel combination chemotherapy for triple-negative breast cancer (TNBC). Proteomic analyses of pretreatment patient biopsies uniquely revealed metabolic pathways, including oxidative phosphorylation, adipogenesis, and fatty acid metabolism, that were associated with resistance. Both proteomics and transcriptomics revealed that sensitivity was marked by elevation of DNA repair, E2F targets, G2-M checkpoint, interferon-gamma signaling, and immune-checkpoint components. Proteogenomic analyses of somatic copy-number aberrations identified a resistance-associated 19q13.31-33 deletion where LIG1, POLD1, and XRCC1 are located. In orthogonal datasets, LIG1 (DNA ligase I) gene deletion and/or low mRNA expression levels were associated with lack of pathologic complete response, higher chromosomal instability index (CIN), and poor prognosis in TNBC, as well as carboplatin-selective resistance in TNBC preclinical models. Hemizygous loss of LIG1 was also associated with higher CIN and poor prognosis in other cancer types, demonstrating broader clinical implications.
Proteogenomic analysis of triple-negative breast tumors revealed a complex landscape of chemotherapy response associations, including a 19q13.31-33 somatic deletion encoding genes serving lagging-strand DNA synthesis (LIG1, POLD1, and XRCC1), that correlate with lack of pathologic response, carboplatin-selective resistance, and, in pan-cancer studies, poor prognosis and CIN. This article is highlighted in the In This Issue feature, p. 2483.
The DNA damage and replication checkpoints are believed to primarily slow the progression of the cell cycle to allow DNA repair to occur. Here we summarize known aspects of the Saccharomyces cerevisiae checkpoints including how these responses are integrated into downstream effects on the cell cycle, chromatin, DNA repair, and cytoplasmic targets. Analysis of the transcriptional response demonstrates that it is far more complex and less relevant to the repair of DNA damage than the bacterial SOS response. We also address more speculative questions regarding potential roles of the checkpoint during the normal S-phase and how current evidence hints at a checkpoint activation mechanism mediated by positive feedback that amplifies initial damage signals above a minimum threshold.