Deep Learning in Proteomics
Wen, Bo; Zeng, Wen‐Feng; Liao, Yuxing ...
Proteomics (Weinheim), November 2020, Volume 20, Issue 21-22
Journal Article
Peer reviewed
Open access
Proteomics, the study of all the proteins in biological systems, is becoming a data‐rich science. Protein sequences and structures are comprehensively catalogued in online databases. With recent advancements in tandem mass spectrometry (MS) technology, protein expression and post‐translational modifications (PTMs) can be studied in a variety of biological systems at the global scale. Sophisticated computational algorithms are needed to translate the vast amount of data into novel biological insights. Deep learning automatically extracts data representations at high levels of abstraction from data, and it thrives in data‐rich scientific research domains. Here, a comprehensive overview of deep learning applications in proteomics, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex‐peptide binding prediction, and protein structure prediction, is provided. Limitations and the future directions of deep learning in proteomics are also discussed. This review will provide readers with an overview of deep learning and how it can be used to analyze proteomics data.
Mass spectrometry-based phosphoproteomics is becoming an essential methodology for the study of global cellular signaling. Numerous bioinformatics resources are available to facilitate the translation of phosphopeptide identification and quantification results into novel biological and clinical insights, a critical step in phosphoproteomics data analysis. These resources include knowledge bases of kinases and phosphatases, phosphorylation sites, kinase inhibitors, and sequence variants affecting kinase function, and bioinformatics tools that can predict phosphorylation sites in addition to the kinase that phosphorylates them, infer kinase activity, and predict the effect of mutations on kinase signaling. However, these resources exist in silos and it is challenging to select among multiple resources with similar functions. Therefore, we put together a comprehensive collection of resources related to phosphoproteomics data interpretation, compared the use of tools with similar functions, and assessed the usability from the standpoint of typical biologists or clinicians. Overall, tools could be improved by standardization of enzyme names, flexibility of data input and output format, consistent maintenance, and detailed manuals.
Platinum-based chemotherapy, including cisplatin, carboplatin, and oxaliplatin, is prescribed to 10-20% of all cancer patients. Unfortunately, platinum resistance develops in a significant number of patients and is a determinant of clinical outcome. Extensive research has been conducted to understand and overcome platinum resistance, and mechanisms of resistance can be categorized into several broad biological processes, including (1) regulation of drug entry, exit, accumulation, sequestration, and detoxification, (2) enhanced repair and tolerance of platinum-induced DNA damage, (3) alterations in cell survival pathways, (4) alterations in pleiotropic processes and pathways, and (5) changes in the tumor microenvironment. As a resource to the cancer research community, we provide a comprehensive overview accompanied by a manually curated database of the >900 genes/proteins that have been associated with platinum resistance over the last 30 years of literature. The database is annotated with possible pathways through which the curated genes are related to platinum resistance, types of evidence, and hyperlinks to literature sources. The searchable, downloadable database is available online at http://ptrc-ddr.cptac-data-view.org.
Weighted set cover and affinity propagation algorithms are used to combine results from multiple enrichment analyses. Weighted set cover first condenses the enriched gene sets to the fewest gene sets that cover all relevant genes. Affinity propagation then clusters the enriched pathways and selects the most representative gene set for each cluster. Together they facilitate interpretation of multiple enrichment analysis results. A demonstration of their utility highlights both general and unique pathways associated with cancer survival across seven cancer types.
Highlights
• Weighted set cover significantly condenses gene sets after enrichment analysis.
• Affinity propagation clusters gene sets from multiple enrichment analyses.
• Clustering pathways using selected genes is more biologically relevant.
• Pathways associated with poor or good survival from seven cancer types.
Gene set analysis plays a critical role in the functional interpretation of omics data. Although this is typically done for one omics experiment at a time, there is an increasing need to combine gene set analysis results from multiple experiments performed on the same or different omics platforms, such as in multi-omics studies. Integrating results from multiple experiments is challenging, and annotation redundancy between gene sets further obscures clear conclusions. We propose to use a weighted set cover algorithm to reduce redundancy of gene sets identified in a single experiment. Next, we use affinity propagation to consolidate similar gene sets identified from multiple experiments into clusters and to automatically determine the most representative gene set for each cluster. Using three examples from over-representation analysis and gene set enrichment analysis, we showed that weighted set cover outperformed a previously published set cover method and reduced the number of gene sets by 52–77%. Focusing on overlapping genes between the list of input genes and the enriched gene sets in over-representation analysis and leading-edge genes in gene set enrichment analysis further reduced the number of gene sets. A use case combining enrichment analysis results from RNA-Seq and proteomics data comparing basal and luminal A breast cancer samples highlighted the known difference in proliferation and DNA damage response. Finally, we used these algorithms for a pan-cancer survival analysis. Our analysis clearly revealed prognosis-related pathways common to multiple cancer types or specific to individual cancer types, as well as pathways associated with prognosis in different directions in different cancer types. We implemented these two algorithms in an R package, Sumer, which generates tables and static and interactive plots for exploration and publication. Sumer is publicly available at https://github.com/bzhanglab/sumer.
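To make the redundancy-reduction step concrete, the following is a minimal sketch of a greedy weighted set cover in Python. It is not the Sumer implementation (Sumer is an R package), and several things here are assumptions made for illustration: the function name weighted_set_cover, the toy gene sets, and the use of a per-set cost where a lower cost means a more preferred set (for example, a cost derived from enrichment significance). At each step the sketch picks the gene set with the lowest cost per newly covered gene, until every gene is covered.

```python
def weighted_set_cover(gene_sets, costs):
    """Greedy weighted set cover (illustrative sketch, not Sumer's code).

    gene_sets: dict mapping a gene set name to a set of gene symbols
    costs:     dict mapping a gene set name to a positive cost
               (assumption: lower cost = more desirable set)
    Returns the chosen gene set names in selection order.
    """
    uncovered = set().union(*gene_sets.values())
    remaining = dict(gene_sets)
    chosen = []
    while uncovered:
        # Only sets that still add at least one uncovered gene are candidates.
        candidates = [n for n in remaining if remaining[n] & uncovered]
        if not candidates:
            break
        # Lowest cost per newly covered gene wins; the name breaks ties.
        best = min(
            candidates,
            key=lambda n: (costs[n] / len(remaining[n] & uncovered), n),
        )
        chosen.append(best)
        uncovered -= remaining.pop(best)
    return chosen


# Hypothetical toy input: four overlapping gene sets with equal cost.
gene_sets = {
    "cell_cycle": {"CDK1", "CCNB1", "MKI67", "TOP2A"},
    "mitosis": {"CDK1", "CCNB1"},
    "dna_repair": {"BRCA1", "RAD51"},
    "dna_damage_response": {"BRCA1", "RAD51", "ATM"},
}
costs = {name: 1.0 for name in gene_sets}

selected = weighted_set_cover(gene_sets, costs)
print(selected)
```

With equal costs, the two broader sets cover all seven genes, so the two sets nested inside them are dropped. Changing the costs shifts the choice toward whichever sets are cheaper per newly covered gene.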
Global phosphoproteomics experiments quantify tens of thousands of phosphorylation sites. However, data interpretation is hampered by our limited knowledge on functions, biological contexts, or precipitating enzymes of the phosphosites. This study establishes a repository of phosphosites with associated evidence in biomedical abstracts, using deep learning–based natural language processing techniques. Our model for illuminating the dark phosphoproteome through PubMed mining (IDPpub) was generated by fine-tuning BioBERT, a deep learning tool for biomedical text mining. Trained using sentences containing protein substrates and phosphorylation site positions from 3000 abstracts, the IDPpub model was then used to extract phosphorylation sites from all MEDLINE abstracts. The extracted proteins were normalized to gene symbols using the National Center for Biotechnology Information gene query, and sites were mapped to human UniProt sequences using ProtMapper and mouse UniProt sequences by direct match. Precision and recall were calculated using 150 curated abstracts, and utility was assessed by analyzing the CPTAC (Clinical Proteomics Tumor Analysis Consortium) pan-cancer phosphoproteomics datasets and the PhosphoSitePlus database. Using 10-fold cross validation, pairs of correct substrates and phosphosite positions were extracted with an average precision of 0.93 and recall of 0.94. After entity normalization and site mapping to human reference sequences, an independent validation achieved a precision of 0.91 and recall of 0.77. The IDPpub repository contains 18,458 unique human phosphorylation sites with evidence sentences from 58,227 abstracts and 5918 mouse sites in 14,610 abstracts. This included evidence sentences for 1803 sites identified in CPTAC studies that are not covered by manually curated functional information in PhosphoSitePlus.
Evaluation results demonstrate the potential of IDPpub as an effective biomedical text mining tool for collecting phosphosites. Moreover, the repository (http://idppub.ptmax.org), which can be automatically updated, can serve as a powerful complement to existing resources.
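As a reminder of how the precision and recall figures quoted above are defined, here is a minimal Python sketch that scores extracted (substrate, site) pairs against a curated set. The function name precision_recall and the example pairs are hypothetical illustrations, not IDPpub code or data, and exact matching of pairs is an assumption about how a hit is counted.

```python
def precision_recall(predicted, gold):
    """Precision and recall for extracted (substrate, site) pairs.

    A prediction counts as correct only if the exact pair appears in
    the curated set (assumption: exact-match scoring).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data, chosen only to illustrate the arithmetic.
gold_pairs = {("AKT1", "S473"), ("AKT1", "T308"), ("TP53", "S15"), ("MAPK1", "Y187")}
predicted_pairs = {("AKT1", "S473"), ("TP53", "S15"), ("TP53", "S20")}

precision, recall = precision_recall(predicted_pairs, gold_pairs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three predicted pairs are correct (precision 2/3) and two of the four curated pairs are recovered (recall 1/2). A drop in recall after normalization and site mapping, as reported above, would correspond to correct extractions that fail to map onto a reference sequence.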
• IDPpub utilizes BioBERT to extract phosphorylation sites.
• Alignment of extracted phosphorylation sites to reference sequences.
• Identified evidence sentences for tumor-associated sites.
• Web portal to explore evidence sentences for phosphorylation sites.
Phosphorylation sites have important functions but are poorly annotated in public databases. We developed IDPpub, a pipeline that uses BioBERT to extract phosphorylation sites from biomedical abstracts. We align the sites to human and mouse reference sequences to facilitate computational applications and intersection with mass spectrometry experiments. The evidence sentences can be used to identify regulating enzymes and biological functions. All data are available in a web portal for easy search.
Ferroptosis is a caspase-independent, iron-dependent form of regulated necrosis extant in traumatic brain injury, Huntington disease, and hemorrhagic stroke. It can be activated by cystine deprivation leading to glutathione depletion, the insufficiency of the antioxidant glutathione peroxidase-4, and the hemolysis products hemoglobin and hemin. A cardinal feature of ferroptosis is extracellular signal-regulated kinase (ERK)1/2 activation culminating in its translocation to the nucleus. We have previously confirmed that the mitogen-activated protein (MAP) kinase kinase (MEK) inhibitor U0126 inhibits persistent ERK1/2 phosphorylation and ferroptosis. Here, we show that hemin exposure, a model of secondary injury in brain hemorrhage and ferroptosis, activated ERK1/2 in mouse neurons. Accordingly, MEK inhibitor U0126 protected against hemin-induced ferroptosis. Unexpectedly, U0126 prevented hemin-induced ferroptosis independent of its ability to inhibit ERK1/2 signaling. In contrast to classical ferroptosis in neurons or cancer cells, chemically diverse inhibitors of MEK did not block hemin-induced ferroptosis, nor did the forced expression of the ERK-selective MAP kinase phosphatase (MKP)3. We conclude that hemin or hemoglobin-induced ferroptosis, unlike glutathione depletion, is ERK1/2-independent. Together with recent studies, our findings suggest the existence of a novel subtype of neuronal ferroptosis relevant to bleeding in the brain that is 5-lipoxygenase-dependent, ERK-independent, and transcription-independent. Remarkably, our unbiased phosphoproteome analysis revealed dramatic differences in phosphorylation induced by two ferroptosis subtypes. As U0126 also reduced cell death and improved functional recovery after hemorrhagic stroke in male mice, our analysis also provides a template on which to build a search for U0126's effects in a variant of neuronal ferroptosis.
Ferroptosis is an iron-dependent mechanism of regulated necrosis that has been linked to hemorrhagic stroke. Common features of ferroptotic death induced by diverse stimuli are the depletion of the antioxidant glutathione, production of lipoxygenase-dependent reactive lipids, sensitivity to iron chelation, and persistent activation of extracellular signal-regulated kinase (ERK) signaling. Unlike classical ferroptosis induced in neurons or cancer cells, here we show that ferroptosis induced by hemin is ERK-independent. Paradoxically, the canonical MAP kinase kinase (MEK) inhibitor U0126 blocks brain hemorrhage-induced death. Altogether, these data suggest that a variant of ferroptosis is unleashed in hemorrhagic stroke. We present the first, unbiased phosphoproteomic analysis of ferroptosis as a template on which to understand distinct paths to cell death that meet the definition of ferroptosis.
Comprehensive characterization of tumor antigens is essential for the design of cancer immunotherapies, and mass spectrometry (MS)-based immunopeptidomics enables high-throughput identification of major histocompatibility complex (MHC)-bound peptide antigens in vivo. Here we construct an immunopeptidome atlas of human cancer through an extensive collection of 43 published immunopeptidomic datasets and standardized analysis of 81.6 million MS/MS spectra using an open search engine. Our analysis greatly expands the current knowledge of MHC-bound antigens, including an unprecedented characterization of post-translationally modified antigens and their cancer-association. We also perform systematic analysis of cancer-testis antigens, cancer-associated antigens, and neoantigens. We make all these data, together with annotated MS/MS spectra supporting identification of each antigen, available in an easily browsable web portal named cancer antigen atlas (caAtlas). caAtlas provides a central resource for the selection and prioritization of MHC-bound peptides for in vitro HLA binding assay and immunogenicity testing, which will pave the way to eventual development of cancer immunotherapies.
• Extensive collection of 43 immunopeptidomic datasets with 1018 samples
• Standardized and rigorous identification of HLA-bound peptides, including PTM peptides
• Comprehensive annotation of CT antigens and cancer-associated antigens
• User-friendly data dissemination through the caAtlas web portal
Immunology; Proteomics; Cancer
TNFα has been identified as playing an important role in pathologic complications associated with diabetic retinopathy and retinal inflammation, such as retinal leukostasis. However, the transcriptional effects of TNFα on retinal microvascular endothelial cells and the different signaling pathways involved are not yet fully understood. In the present study, RNA-seq was used to profile the transcriptome of human retinal microvascular endothelial cells (HRMEC) treated for 4 hours with TNFα in the presence or absence of the NFAT-specific inhibitor INCA-6, in order to gain insight into the specific effects of TNFα on RMEC and identify any involvement of NFAT signaling. Differential expression analysis revealed that TNFα treatment significantly upregulated the expression of 579 genes when compared to vehicle-treated controls, and subsequent pathway analysis revealed a TNFα-induced enrichment of transcripts associated with cytokine-cytokine receptor interactions, cell adhesion molecules, and leukocyte transendothelial migration. Differential expression analysis comparing TNFα-treated cells to those co-treated with INCA-6 revealed 10 genes whose expression was significantly reduced by the NFAT inhibitor, including those encoding the proteins VCAM1 and CX3CL1 and cytokines CXCL10 and CXCL11. This study identifies the transcriptional effects of TNFα on HRMEC, highlighting its involvement in multiple pathways that contribute to retinal leukostasis, and identifying a previously unknown role for NFAT-signaling downstream of TNFα.