Data-independent acquisition (DIA) mass spectrometry is a powerful technique that is improving the reproducibility and throughput of proteomics studies. Here, we introduce an experimental workflow that uses this technique to construct chromatogram libraries that capture fragment ion chromatographic peak shape and retention time for every detectable peptide in a proteomics experiment. These coordinates calibrate protein databases or spectrum libraries to a specific mass spectrometer and chromatography setup, facilitating DIA-only pipelines and the reuse of global resource libraries. We also present EncyclopeDIA, a software tool for generating and searching chromatogram libraries, and demonstrate the performance of our workflow by quantifying proteins in human and yeast cells. We find that by exploiting calibrated retention time and fragmentation specificity in chromatogram libraries, EncyclopeDIA can detect 20-25% more peptides from DIA experiments than with data-dependent acquisition-based spectrum libraries alone.
MSstats is an R package for statistical relative quantification of proteins and peptides in mass spectrometry-based proteomics. Version 2.0 of MSstats supports label-free and label-based experimental workflows and data-dependent, targeted and data-independent spectral acquisition. It takes as input identified and quantified spectral peaks, and outputs a list of differentially abundant peptides or proteins, or summaries of peptide or protein relative abundance. MSstats relies on a flexible family of linear mixed models.
The code, the documentation and example datasets are available open-source at www.msstats.org under the Artistic-2.0 license. The package can be downloaded from www.msstats.org or from Bioconductor (www.bioconductor.org) and used in an R command line workflow. The package can also be accessed as an external tool in Skyline (Broudy et al., 2014) and used via a graphical user interface.
Skyline is a freely available, open‐source Windows client application for accelerating targeted proteomics experimentation, with an emphasis on the proteomics and mass spectrometry community as users and as contributors. This review covers the informatics encompassed by the Skyline ecosystem, from computationally assisted targeted mass spectrometry method development, to raw acquisition file data processing, and quantitative analysis and results sharing.
A major goal of proteomics research is the accurate and sensitive identification and quantification of a broad range of proteins within a sample. Data-independent acquisition (DIA) approaches that acquire MS/MS spectra independently of precursor information have been developed to overcome the reproducibility challenges of data-dependent acquisition and the limited breadth of targeted proteomics strategies. Typical DIA implementations use wide MS/MS isolation windows to acquire comprehensive fragment ion data. However, wide isolation windows produce highly chimeric spectra, limiting the achievable sensitivity and accuracy of quantification and identification. Here, we present a DIA strategy in which spectra are collected with overlapping (rather than adjacent or random) windows and then computationally demultiplexed. This approach improves precursor selectivity by nearly a factor of 2, without incurring any loss in mass range, mass resolution, chromatographic resolution, scan speed, or other key acquisition parameters. We demonstrate a 64% improvement in sensitivity and a 17% improvement in peptides detected in a 6-protein bovine mix spiked into a yeast background. To confirm the method’s applicability to a realistic biological experiment, we also analyze the regulation of the proteasome in yeast grown in rapamycin and show that DIA experiments with overlapping windows can help elucidate its adaptation toward the degradation of oxidatively damaged proteins. Our integrated computational and experimental DIA strategy is compatible with any DIA-capable instrument. The computational demultiplexing algorithm required to analyze the data has been made available as part of the open-source proteomics software tools Skyline and msconvert (Proteowizard), making it easy to apply as part of standard proteomics workflows.
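The "nearly a factor of 2" gain in precursor selectivity can be pictured from the window geometry alone. The sketch below is only an illustration under stated assumptions: it takes two hypothetical acquisition cycles whose isolation windows are offset by half a window width (the m/z values and the function name `demux_regions` are invented for this example) and lists the narrower regions that their edges define. It does not implement the demultiplexing algorithm itself, which the abstract does not specify.

```python
def demux_regions(cycle_a, cycle_b):
    """Return the m/z regions delimited by all window edges of two cycles.

    Each cycle is a list of (low_mz, high_mz) isolation windows. When the
    second cycle is offset by half a window width, every interior region
    is half as wide as an acquired window.
    """
    edges = sorted({edge for low, high in cycle_a + cycle_b for edge in (low, high)})
    return list(zip(edges, edges[1:]))


def covering_windows(region, windows):
    """List the acquired windows that fully contain a given region."""
    low, high = region
    return [w for w in windows if w[0] <= low and high <= w[1]]


# Hypothetical 20 m/z windows; the second cycle is shifted by 10 m/z.
cycle_a = [(400.0, 420.0), (420.0, 440.0), (440.0, 460.0)]
cycle_b = [(410.0, 430.0), (430.0, 450.0), (450.0, 470.0)]

regions = demux_regions(cycle_a, cycle_b)
for region in regions:
    print(region, "covered by", covering_windows(region, cycle_a + cycle_b))
```

Every interior region is covered by exactly one window from each cycle, which is what allows a fragment signal to be assigned to a region narrower than any window that was actually acquired. The two outermost regions are covered by a single window only, so they gain no selectivity in this toy layout.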
Vendor-independent software tools for quantification of small molecules and metabolites are lacking, especially for targeted analysis workflows. Skyline is a freely available, open-source software tool for targeted quantitative mass spectrometry method development and data processing with a 10-year history supporting six major instrument vendors. Skyline was designed initially for proteomics analysis; here we describe its expansion to data for small molecule analysis, including selected reaction monitoring, high-resolution mass spectrometry, and calibrated quantification. This fundamental expansion of Skyline from a peptide-sequence-centric tool to a molecule-centric tool makes it agnostic to the source of the molecule while retaining Skyline features critical for workflows in both peptide and more general biomolecular research. The data visualization and interrogation features already available in Skyline, such as peak picking, chromatographic alignment, and transition selection, have been adapted to support small molecule data, including metabolomics. Herein, we explain the conceptual workflow for small molecule analysis using Skyline, demonstrate Skyline performance benchmarked against a comparable instrument vendor software tool, and present additional real-world applications. Further, we include step-by-step instructions on using Skyline for small molecule quantitative method development and data analysis on data acquired with a variety of mass spectrometers from multiple instrument vendors.
The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) has standardized data submission and dissemination of mass spectrometry proteomics data worldwide since 2012. In this paper, we describe the main developments since the previous update manuscript was published in Nucleic Acids Research in 2017. Since then, in addition to the four PX existing members at the time (PRIDE, PeptideAtlas including the PASSEL resource, MassIVE and jPOST), two new resources have joined PX: iProX (China) and Panorama Public (USA). We first describe the updated submission guidelines, now expanded to include six members. Next, with current data submission statistics, we demonstrate that the proteomics field is now actively embracing public open data policies. At the end of June 2019, more than 14 100 datasets had been submitted to PX resources since 2012, of which more than 9 500 were submitted in just the last three years. In parallel, an unprecedented increase of data re-use activities in the field, including ‘big data’ approaches, is enabling novel research and new data resources. Finally, we outline some of our future plans for the coming years.
Multiple reaction monitoring (MRM) has recently become the method of choice for targeted quantitative measurement of proteins using mass spectrometry. The method, however, is limited in the number of peptides that can be measured in one run. This number can be markedly increased by scheduling the acquisition if the accurate retention time (RT) of each peptide is known. Here we present iRT, an empirically derived dimensionless peptide‐specific value that allows for highly accurate RT prediction. The iRT of a peptide is a fixed number relative to a standard set of reference iRT‐peptides that can be transferred across laboratories and chromatographic systems. We show that iRT facilitates the setup of multiplexed experiments with acquisition windows more than four times smaller compared to in silico RT predictions, resulting in improved quantification accuracy. iRTs can be determined by any laboratory and shared transparently. The iRT concept has been implemented in Skyline, the most widely used software for MRM experiments.
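As a rough illustration of how a fixed iRT scale can be transferred to a particular chromatographic system, the sketch below fits a straight line between the iRT values of reference peptides and their retention times measured in one run, then uses that line to place a scheduling window for a target peptide. A linear calibration is an assumption made here for simplicity, and all numbers, variable names, and function names are hypothetical; they are not taken from the paper or from Skyline.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def scheduling_window(irt, slope, intercept, width_min):
    """Return (start, end) in minutes, centered on the predicted RT."""
    center = slope * irt + intercept
    return center - width_min / 2, center + width_min / 2


# Hypothetical reference peptides: fixed iRT values and the retention
# times (minutes) observed for them on the current LC setup.
reference_irt = [0.0, 25.0, 50.0, 75.0, 100.0]
measured_rt_min = [10.0, 17.5, 25.0, 32.5, 40.0]

slope, intercept = fit_line(reference_irt, measured_rt_min)

# A target peptide with a (hypothetical) library iRT of 60 and a 2 min window.
start, end = scheduling_window(60.0, slope, intercept, width_min=2.0)
print(f"RT = {slope:.3f} * iRT + {intercept:.3f}; window {start:.2f}-{end:.2f} min")
```

Because only the reference peptides need to be re-measured when the gradient or column changes, the same library iRT values can be reused across setups; the narrower the prediction error, the smaller `width_min` can be and the more peptides fit into one scheduled run.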
To address the growing need for a centralized, community resource of published results processed with Skyline, and to provide reviewers and readers immediate visual access to the data behind published conclusions, we present Panorama Public (https://panoramaweb.org/public.url), a repository of Skyline documents supporting published results. Panorama Public is built on Panorama, an open source data management system for mass spectrometry data processed with the Skyline targeted mass spectrometry environment. The Panorama web application facilitates viewing, sharing, and disseminating results contained in Skyline documents via a web browser. Skyline users can easily upload their documents to a Panorama server and allow other researchers to explore uploaded results in the Panorama web interface through a variety of familiar summary graphs as well as annotated views of the chromatographic peaks processed with Skyline. This makes Panorama ideal for sharing targeted, quantitative results contained in Skyline documents with collaborators, reviewers, and the larger proteomics community. The Panorama Public repository employs the full data visualization capabilities of Panorama, which facilitates sharing results with reviewers during manuscript review.
Porous graphitized carbon (PGC) based chromatography achieves high-resolution separation of glycan structures released from glycoproteins. This approach is especially valuable when resolving structurally similar isomers and for discovery of novel and/or sample-specific glycan structures. However, the implementation of PGC-based separations in glycomics studies has been limited because system-independent retention values have not been established to normalize technical variation. To address this limitation, this study combined the use of hydrolyzed dextran as an internal standard and Skyline software for post-acquisition normalization to reduce retention time and peak area technical variation in PGC-based glycan analyses. This approach allowed assignment of system-independent retention values that are applicable to typical PGC-based glycan separations and supported the construction of a library containing >300 PGC-separated glycan structures with normalized glucose unit (GU) retention values. To enable the automation of this normalization method, a spectral MS/MS library was developed of the dextran ladder, achieving confident discrimination against isomeric glycans. The utility of this approach is demonstrated in two ways. First, to inform the search space for bioinformatically predicted but unobserved glycan structures, predictive models for two structural modifications, core-fucosylation and bisecting GlcNAc, were developed based on the GU library. Second, the applicability of this method for the analysis of complex biological samples is evidenced by the ability to discriminate between cell culture and tissue sample types by the normalized intensity of N-glycan structures alone. Overall, the methods and data described here are expected to support the future development of more automated approaches to glycan identification and quantitation.
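To make the idea of a glucose unit (GU) retention value concrete, the following sketch converts an observed retention time into a GU value by interpolating between the retention times of dextran ladder peaks measured in the same run. Piecewise-linear interpolation is an assumption chosen for brevity (the abstract does not state which fitting method was used), and the ladder values and the function name `rt_to_gu` are hypothetical.

```python
from bisect import bisect_right


def rt_to_gu(rt, ladder):
    """Interpolate a glucose-unit value from a dextran ladder.

    ladder is a list of (gu, retention_time) pairs sorted by strictly
    increasing retention time. Values outside the ladder are rejected
    rather than extrapolated.
    """
    gus = [gu for gu, _ in ladder]
    rts = [t for _, t in ladder]
    if not rts[0] <= rt <= rts[-1]:
        raise ValueError("retention time outside the dextran ladder range")
    i = bisect_right(rts, rt) - 1
    if i == len(rts) - 1:  # exactly on the last ladder peak
        return float(gus[-1])
    fraction = (rt - rts[i]) / (rts[i + 1] - rts[i])
    return gus[i] + fraction * (gus[i + 1] - gus[i])


# Hypothetical ladder: dextran oligomers of 3 to 6 glucose units and the
# retention times (minutes) at which they eluted in this run.
ladder = [(3, 12.0), (4, 18.0), (5, 23.0), (6, 27.0)]

glycan_gu = rt_to_gu(20.5, ladder)
print(f"glycan eluting at 20.5 min -> GU {glycan_gu:.2f}")
```

Because the ladder is co-analyzed with the sample, a shift in absolute retention time moves the ladder peaks and the glycan peaks together, so the GU value stays comparable between runs and systems even when raw retention times do not.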
Mass spectrometry (MS) is by far the most used experimental approach in high-throughput proteomics. The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) was originally set up to standardize data submission and dissemination of public MS proteomics data. It is now 10 years since the initial data workflow was implemented. In this manuscript, we describe the main developments in PX since the previous update manuscript in Nucleic Acids Research was published in 2020. The six members of the Consortium are PRIDE, PeptideAtlas (including PASSEL), MassIVE, jPOST, iProX and Panorama Public. We report the current data submission statistics, showcasing that the number of datasets submitted to PX resources has continued to increase every year. As of June 2022, more than 34 233 datasets had been submitted to PX resources, of which 20 062 (58.6%) were submitted in just the last three years. We also report the development of the Universal Spectrum Identifiers and the improvements in capturing the experimental metadata annotations. In parallel, we highlight that data re-use activities of public datasets continue to increase, enabling connections between PX resources and other popular bioinformatics resources, novel research and also new data resources. Finally, we summarise the current state-of-the-art in data management practices for sensitive human (clinical) proteomics data.