MSstats is an R package for statistical relative quantification of proteins and peptides in mass spectrometry-based proteomics. Version 2.0 of MSstats supports label-free and label-based experimental workflows and data-dependent, targeted and data-independent spectral acquisition. It takes as input identified and quantified spectral peaks, and outputs a list of differentially abundant peptides or proteins, or summaries of peptide or protein relative abundance. MSstats relies on a flexible family of linear mixed models.
The code, the documentation and example datasets are available open-source at www.msstats.org under the Artistic-2.0 license. The package can be downloaded from www.msstats.org or from Bioconductor (www.bioconductor.org) and used in an R command line workflow. The package can also be accessed as an external tool in Skyline (Broudy et al., 2014) and used via a graphical user interface.
The analysis of the large amount of data generated in mass spectrometry-based proteomics experiments represents a significant challenge and is currently a bottleneck in many proteomics projects. In this review we discuss critical issues related to data processing and analysis in proteomics and describe available methods and tools. We place special emphasis on the elaboration of results that are supported by sound statistical arguments.
Cardinal is an R package for statistical analysis of mass spectrometry-based imaging (MSI) experiments of biological samples such as tissues. Cardinal supports both Matrix-Assisted Laser Desorption/Ionization (MALDI) and Desorption Electrospray Ionization-based MSI workflows, and experiments with multiple tissues and complex designs. The main analytical functionalities include (1) image segmentation, which partitions a tissue into regions of homogeneous chemical composition, selects the number of segments and the subset of informative ions, and characterizes the associated uncertainty and (2) image classification, which assigns locations on the tissue to pre-defined classes, selects the subset of informative ions, and estimates the resulting classification error by (cross-)validation. The statistical methods are based on mixture modeling and regularization.
A major goal of proteomics research is the accurate and sensitive identification and quantification of a broad range of proteins within a sample. Data-independent acquisition (DIA) approaches that acquire MS/MS spectra independently of precursor information have been developed to overcome the reproducibility challenges of data-dependent acquisition and the limited breadth of targeted proteomics strategies. Typical DIA implementations use wide MS/MS isolation windows to acquire comprehensive fragment ion data. However, wide isolation windows produce highly chimeric spectra, limiting the achievable sensitivity and accuracy of quantification and identification. Here, we present a DIA strategy in which spectra are collected with overlapping (rather than adjacent or random) windows and then computationally demultiplexed. This approach improves precursor selectivity by nearly a factor of 2, without incurring any loss in mass range, mass resolution, chromatographic resolution, scan speed, or other key acquisition parameters. We demonstrate a 64% improvement in sensitivity and a 17% improvement in peptides detected in a 6-protein bovine mix spiked into a yeast background. To confirm the method’s applicability to a realistic biological experiment, we also analyze the regulation of the proteasome in yeast grown in rapamycin and show that DIA experiments with overlapping windows can help elucidate its adaptation toward the degradation of oxidatively damaged proteins. Our integrated computational and experimental DIA strategy is compatible with any DIA-capable instrument. The computational demultiplexing algorithm required to analyze the data has been made available as part of the open-source proteomics software tools Skyline and msconvert (Proteowizard), making it easy to apply as part of standard proteomics workflows.
Benchmarking comes of age
Robinson, Mark D; Vitek, Olga
Genome Biology, 10/2019, Volume 20, Issue 1
Journal Article
Peer reviewed
Open access
The distances between the MS/MS peaks are used to infer the amino acid sequence of the parent MS peak. Since abundant MS1 peaks are more likely to be selected for fragmentation, relative peptide quantification can also be achieved by counting the number of identified MS/MS spectra. The moderate correlation of transcript and protein abundance indicates a major role of post-translational regulation in the activity of the cell. ...the best functional insight can be obtained by combining measurements across technologies, and searching for broader groups of genes, proteins, and metabolites forming regulatory relationships 86, 87. To date, only 65% of all predicted human proteins have been reliably observed by mass spectrometry 90. ...future experimental developments will focus on improving the sensitivity, reproducibility, and comprehensiveness of protein identifications, and the sensitivity and accuracy of quantification.
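The idea of quantifying by counting identified MS/MS spectra can be illustrated with a minimal Python sketch. The protein names, spectrum identifiers and the normalization by total count are hypothetical choices made for illustration only; they are not taken from the text above.

```python
from collections import Counter


def spectral_counts(identifications):
    """Count identified MS/MS spectra per protein.

    `identifications` is a list of (protein, spectrum_id) pairs.
    """
    return Counter(protein for protein, _spectrum_id in identifications)


def normalized_counts(counts):
    """Convert raw counts to fractions of all identified spectra in a run."""
    total = sum(counts.values())
    return {protein: n / total for protein, n in counts.items()}


# Hypothetical identifications from two runs (illustrative data only).
run_a = [("P1", 1), ("P1", 2), ("P1", 3), ("P2", 4)]
run_b = [("P1", 1), ("P2", 2), ("P2", 3), ("P2", 4)]

frac_a = normalized_counts(spectral_counts(run_a))
frac_b = normalized_counts(spectral_counts(run_b))

# Relative abundance of P1 in run A versus run B, based on counts alone.
ratio_p1 = frac_a["P1"] / frac_b["P1"]
print(ratio_p1)
```

Dividing by the total number of identified spectra is one simple way to make runs of different depth comparable; other normalizations are possible.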
Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) is widely used for quantitative proteomic investigations. The typical output of such studies is a list of identified and quantified peptides. The biological and clinical interest is, however, usually focused on quantitative conclusions at the protein level. Furthermore, many investigations ask complex biological questions by studying multiple interrelated experimental conditions. Therefore, there is a need in the field for generic statistical models to quantify protein levels even in complex study designs.
We propose a general statistical modeling approach for protein quantification in arbitrary complex experimental designs, such as time course studies, or those involving multiple experimental factors. The approach summarizes the quantitative experimental information from all the features and all the conditions that pertain to a protein. It enables both protein significance analysis between conditions, and protein quantification in individual samples or conditions. We implement the approach in an open-source R-based software package MSstats suitable for researchers with a limited statistics and programming background.
We demonstrate, using as examples two experimental investigations with complex designs, that a simultaneous statistical modeling of all the relevant features and conditions yields a higher sensitivity of protein significance analysis and a higher accuracy of protein quantification as compared to commonly employed alternatives. The software is available at http://www.stat.purdue.edu/~ovitek/Software.html.
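As a rough illustration of modeling all features and conditions of a protein simultaneously, one member of a linear mixed model family for the log-intensities of a single protein could be written as follows. The specific terms, indices and symbols are assumptions made for this sketch and are not stated in the abstract.

```latex
y_{ijkl} = \mu + F_i + C_j + (F \times C)_{ij} + S_{k(j)} + \varepsilon_{ijkl},
\qquad S_{k(j)} \sim \mathcal{N}(0, \sigma_S^2),
\quad \varepsilon_{ijkl} \sim \mathcal{N}(0, \sigma^2)
```

Here $y_{ijkl}$ is the log-intensity of feature $i$ in condition $j$, biological replicate $k$ nested within condition $j$, and run $l$. Under these assumptions, a comparison between two conditions amounts to testing a contrast of the condition terms $C_j$, averaged over features.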
Targeted mass spectrometry by selected reaction monitoring (S/MRM) has proven to be a suitable technique for the consistent and reproducible quantification of proteins across multiple biological samples and a wide dynamic range. This performance profile is an important prerequisite for systems biology and biomedical research. However, the method is limited to the measurements of a few hundred peptides per LC-MS analysis. Recently, we introduced SWATH-MS, a combination of data independent acquisition and targeted data analysis that vastly extends the number of peptides/proteins quantified per sample, while maintaining the favorable performance profile of S/MRM. Here we applied the SWATH-MS technique to quantify changes over time in a large fraction of the proteome expressed in Saccharomyces cerevisiae in response to osmotic stress.
We sampled cell cultures in biological triplicates at six time points following the application of osmotic stress and acquired single injection data independent acquisition data sets on a high-resolution 5600 tripleTOF instrument operated in SWATH mode. Proteins were quantified by the targeted extraction and integration of transition signal groups from the SWATH-MS datasets for peptides that are proteotypic for specific yeast proteins. We consistently identified and quantified more than 15,000 peptides and 2500 proteins across the 18 samples. We demonstrate high reproducibility between technical and biological replicates across all time points and protein abundances. In addition, we show that the abundance of hundreds of proteins was significantly regulated upon osmotic shock, and pathway enrichment analysis revealed that the proteins reacting to osmotic shock are mainly involved in the carbohydrate and amino acid metabolism. Overall, this study demonstrates the ability of SWATH-MS to efficiently generate reproducible, consistent, and quantitatively accurate measurements of a large fraction of a proteome across multiple samples.
PeptideProphet is a post-processing algorithm designed to evaluate the confidence in identifications of MS/MS spectra returned by a database search. In this manuscript we describe the "what and how" of PeptideProphet in a manner aimed at statisticians and life scientists who would like to gain a more in-depth understanding of the underlying statistical modeling. The theory and rationale behind the mixture-modeling approach taken by PeptideProphet are discussed from a statistical model-building perspective, followed by a description of how a model can be used to express confidence in the identification of individual peptides or sets of peptides. We also demonstrate how to evaluate the quality of model fit and select an appropriate model from several available alternatives. We illustrate the use of PeptideProphet in association with the Trans-Proteomic Pipeline, a free suite of software used for protein identification.
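The mixture-modeling idea can be sketched in a few lines of Python: search scores are treated as draws from two overlapping populations (incorrect and correct identifications), the mixture is fitted by expectation-maximization, and the posterior probability of the "correct" component is used as the confidence of an identification. This is a simplified sketch, not PeptideProphet itself: the use of two Gaussian components, the initialization, the function names and the example scores are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_mixture(scores, n_iter=100, min_sd=0.1):
    """Fit a two-Gaussian mixture (incorrect vs. correct) by EM."""
    lo, hi = min(scores), max(scores)
    spread = (hi - lo) or 1.0
    # Assumed initialization: components at the lower and upper quartile points.
    mean_bad, mean_good = lo + 0.25 * spread, lo + 0.75 * spread
    sd_bad = sd_good = max(spread / 4.0, min_sd)
    weight_good = 0.5
    n = len(scores)
    for _ in range(n_iter):
        # E-step: posterior probability that each score is a correct hit.
        post = []
        for x in scores:
            good = weight_good * normal_pdf(x, mean_good, sd_good)
            bad = (1.0 - weight_good) * normal_pdf(x, mean_bad, sd_bad)
            total = good + bad
            post.append(good / total if total > 0.0 else 0.5)
        # M-step: re-estimate the mixing weight, means and standard deviations.
        n_good = sum(post)
        n_bad = n - n_good
        if n_good < 1e-9 or n_bad < 1e-9:
            break
        weight_good = n_good / n
        mean_good = sum(p * x for p, x in zip(post, scores)) / n_good
        mean_bad = sum((1.0 - p) * x for p, x in zip(post, scores)) / n_bad
        var_good = sum(p * (x - mean_good) ** 2 for p, x in zip(post, scores)) / n_good
        var_bad = sum((1.0 - p) * (x - mean_bad) ** 2 for p, x in zip(post, scores)) / n_bad
        sd_good = max(math.sqrt(var_good), min_sd)
        sd_bad = max(math.sqrt(var_bad), min_sd)
    return {
        "weight_good": weight_good,
        "good": (mean_good, sd_good),
        "bad": (mean_bad, sd_bad),
    }


def posterior_correct(x, model):
    """Posterior probability that an identification with score x is correct."""
    good = model["weight_good"] * normal_pdf(x, *model["good"])
    bad = (1.0 - model["weight_good"]) * normal_pdf(x, *model["bad"])
    total = good + bad
    return good / total if total > 0.0 else 0.5


# Hypothetical discriminant scores: a low-scoring and a high-scoring group.
scores = [-1.2, -0.8, -0.5, -0.3, 0.0, 0.1, 0.4, 0.6, 0.9, 1.1,
          3.1, 3.6, 3.9, 4.0, 4.2, 4.5, 5.0]
model = fit_mixture(scores)
print(posterior_correct(4.5, model), posterior_correct(0.0, model))
```

Averaging one minus these posteriors over a set of accepted identifications gives an estimate of the expected fraction of errors in that set, which is one way a fitted model can express confidence in sets of peptides.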