Peer reviewed Open access
  • Assembling the Community-Sc...
    Wang, Mingxun; Wang, Jian; Carver, Jeremy; Pullman, Benjamin S.; Cha, Seong Won; Bandeira, Nuno

    Cell systems, 10/2018, Volume: 7, Issue: 4
    Journal Article

    The increasing throughput and sharing of proteomics mass spectrometry data have now yielded over one-third of a million public mass spectrometry runs. However, the discoveries made from these data are not continuously aggregated in an open and error-controlled manner, which limits their utility. To facilitate the reusability of these data, we built the MassIVE Knowledge Base (MassIVE-KB), a community-wide, continuously updating knowledge base that aggregates proteomics mass spectrometry discoveries into an open, reusable format with full provenance information for community scrutiny. Reusing >31 TB of public human data stored in a mass spectrometry interactive virtual environment (MassIVE), the MassIVE-KB contains >2.1 million precursors from 19,610 proteins (48% larger than before; 97% of the total) and doubles proteome coverage to 6 million amino acids (54% of the proteome) with strict library-scale false discovery controls, thereby providing evidence for 430 proteins for which sufficient protein-level evidence was previously missing. Furthermore, MassIVE-KB can inform experimental design, help identify and quantify new data, and provide tools for community construction of specialized spectral libraries.

    • Reprocessed 31 TB of human proteomics data
    • MassIVE-KB spectral library including 2.1 million precursors (>4-fold increase)
    • 55% of all human proteome amino acids are covered (2-fold increase)
    • 430 new proteins observed with previously missing proteomics evidence

    Wang et al. introduce MassIVE-KB, a program designed to distill the entire community’s mass spectrometry data into reusable spectral library resources. As a result, the statistically significant discovery of a peptide or protein in a single researcher’s data will be made available to the whole community to support its identification (in shotgun experiments) or quantitative detection (in targeted experiments) in all future analyses.