The ProteomeXchange (PX) Consortium of proteomics resources (http://www.proteomexchange.org) was formally started in 2011 to standardize data submission and dissemination of mass spectrometry proteomics data worldwide. We give an overview of the current consortium activities and describe the advances of the past few years. Augmenting the PX founding members (PRIDE and PeptideAtlas, including the PASSEL resource), two new members have joined the consortium: MassIVE and jPOST. ProteomeCentral remains as the common data access portal, providing the ability to search for data sets in all participating PX resources, now with enhanced data visualization components. We describe the updated submission guidelines, now expanded to include four members instead of two. As demonstrated by data submission statistics, PX is supporting a change in culture of the proteomics field: public data sharing is now an accepted standard, supported by requirements for journal submissions resulting in public data release becoming the norm. More than 4500 data sets have been submitted to the various PX resources since 2012. Human is the most represented species with approximately half of the data sets, followed by some of the main model organisms and a growing list of more than 900 diverse species. Data reprocessing activities are becoming more prominent, with both MassIVE and PeptideAtlas releasing the results of reprocessed data sets. Finally, we outline the upcoming advances for ProteomeXchange.
Computational approaches such as genome and metabolome mining are becoming essential to natural products (NPs) research. Consequently, a need exists for an automated structure-type classification system to handle the massive amounts of data appearing for NP structures. An ideal semantic ontology for the classification of NPs should go beyond the simple presence/absence of chemical substructures and also include the taxonomy of the producing organism, the nature of the biosynthetic pathway, and/or their biological properties. Thus, a holistic and automatic NP classification framework could have considerable value for comprehensively navigating the relatedness of NPs, especially when analyzing large numbers of NPs. Here, we introduce NPClassifier, a deep-learning tool for the automated structural classification of NPs from their counted Morgan fingerprints. NPClassifier is expected to accelerate and enhance NP discovery by linking NP structures to their underlying properties.
Interrelating small molecules according to their aligned fragmentation spectra is central to tandem mass spectrometry-based untargeted metabolomics. Current alignment algorithms do not provide statistical significance, and compounds that have multiple delocalized structural differences often fail to have their fragment ions aligned. Here we align fragmentation spectra with both statistical significance and allowance for multiple chemical differences using Significant Interrelation of MS/MS Ions via Laplacian Embedding (SIMILE). SIMILE yields structural connections in molecular networks, inferred from spectral alignment, that are not found with cosine-based scoring algorithms. In addition, it is now possible to rank spectral alignments based on p-values in the exploration of structural relationships between compounds and enhance the chemical connectivity that can be obtained with molecular networking.
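For context, the cosine-based scoring that the abstract contrasts with SIMILE can be sketched as follows. This is a minimal illustration, not SIMILE and not the scoring code of any particular platform; the function names, the (m/z, intensity) peak representation, and the 0.01 bin width are assumptions made for the example.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Map (m/z, intensity) peaks to {bin index: summed intensity}.

    The fixed-width binning and its default width are illustrative assumptions.
    """
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(peaks_a, peaks_b, bin_width=0.01):
    """Plain cosine similarity between two binned fragmentation spectra."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    # Only bins occupied in both spectra contribute to the dot product.
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With this plain form, a spectrum whose fragment ions are all shifted by the same mass difference shares no bins with the original and scores zero, which illustrates why structurally related compounds can be missed when their fragments do not line up.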
It is a common problem in natural product therapeutic lead discovery programs that despite good bioassay results in the initial extract, the active compound(s) may not be isolated during subsequent bioassay-guided purification. Herein, we present the concept of bioactive molecular networking to find candidate active molecules directly from fractionated bioactive extracts. By employing tandem mass spectrometry, it is possible to accelerate the dereplication of molecules using molecular networking prior to subsequent isolation of the compounds, and it is also possible to expose potentially bioactive molecules using bioactivity score prediction. Indeed, bioactivity score prediction can be calculated with the relative abundance of a molecule in fractions and the bioactivity level of each fraction. For that reason, we have developed a bioinformatic workflow able to map bioactivity score in molecular networks and applied it for discovery of antiviral compounds from a previously investigated extract of Euphorbia dendroides where the bioactive candidate molecules were not discovered following a classical bioassay-guided fractionation procedure. It can be expected that this approach will be implemented as a systematic strategy, not only in current and future bioactive lead discovery from natural extract collections but also for the reinvestigation of the untapped reservoir of bioactive analogues in previous bioassay-guided fractionation efforts.
The annotation of small molecules in untargeted mass spectrometry relies on the matching of fragment spectra to reference library spectra. While various spectrum-spectrum match scores exist, the field lacks statistical methods for estimating the false discovery rate (FDR) of these annotations. We present empirical Bayes and target-decoy based methods to estimate the FDR for 70 public metabolomics data sets. We show that the spectral matching settings need to be adjusted for each project. By adjusting the scoring parameters and thresholds, the number of annotations rose, on average, by +139% (ranging from -92% to +5705%) when compared with a default parameter set available at GNPS. The FDR estimation methods presented will enable a user to assess the scoring criteria for large scale analysis of mass spectrometry based metabolomics data, a form of error control that has been essential in the advancement of proteomics, transcriptomics, and genomics science.
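The target-decoy idea can be sketched in a few lines: at a given score threshold, the number of decoy matches that pass estimates how many of the passing target matches are false. This is a generic illustration under stated assumptions, not the authors' implementation; the function names and the simple decoys-over-targets estimator are chosen for the example.

```python
def fdr_at_threshold(target_scores, decoy_scores, threshold):
    """Estimate FDR as (#decoys >= threshold) / (#targets >= threshold)."""
    targets = sum(1 for score in target_scores if score >= threshold)
    decoys = sum(1 for score in decoy_scores if score >= threshold)
    if targets == 0:
        return 0.0
    return min(1.0, decoys / targets)


def lowest_threshold_for_fdr(target_scores, decoy_scores, max_fdr=0.01):
    """Return the lowest observed target score whose estimated FDR is
    at most max_fdr, or None if no threshold qualifies."""
    for threshold in sorted(set(target_scores)):
        if fdr_at_threshold(target_scores, decoy_scores, threshold) <= max_fdr:
            return threshold
    return None
```

Scanning thresholds per data set in this way reflects the abstract's point that matching settings need to be tuned for each project rather than fixed globally.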
This report describes the first application of the novel NMR-based machine learning tool “Small Molecule Accurate Recognition Technology” (SMART 2.0) for mixture analysis and subsequent accelerated discovery and characterization of new natural products. The concept was applied to the extract of a filamentous marine cyanobacterium known to be a prolific producer of cytotoxic natural products. This environmental Symploca extract was roughly fractionated, and then prioritized and guided by cancer cell cytotoxicity, NMR-based SMART 2.0, and MS2-based molecular networking. This led to the isolation and rapid identification of a new chimeric swinholide-like macrolide, symplocolide A, as well as the annotation of swinholide A, samholides A–I, and several new derivatives. The planar structure of symplocolide A was confirmed to be a structural hybrid between swinholide A and luminaolide B by 1D/2D NMR and LC-MS2 analysis. A second example applies SMART 2.0 to the characterization of structurally novel cyclic peptides, and compares this approach to the recently appearing “atomic sort” method. This study exemplifies the revolutionary potential of combined traditional and deep learning-assisted analytical approaches to overcome longstanding challenges in natural products drug discovery.
Integrating multiomics datasets is critical for microbiome research; however, inferring interactions across omics datasets has multiple statistical challenges. We solve this problem by using neural networks (https://github.com/biocore/mmvec) to estimate the conditional probability that each molecule is present given the presence of a specific microorganism. Using known environmental (desert soil biocrust wetting) and clinical (cystic fibrosis lung) examples, we show our ability to recover microbe-metabolite relationships, and we demonstrate how the method can discover relationships between microbially produced metabolites and inflammatory bowel disease.
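To make the quantity being estimated concrete, the sketch below computes a naive frequency estimate of P(molecule present | microbe present) from presence/absence data. It is only a baseline for intuition, not the neural-network method and not the mmvec API; the function name and the input layout are assumptions for illustration.

```python
from collections import defaultdict


def conditional_presence(samples):
    """Naive estimate of P(molecule present | microbe present).

    samples: list of (set of microbes, set of molecules), one pair per sample
    (hypothetical layout). Returns {microbe: {molecule: probability}}.
    """
    microbe_counts = defaultdict(int)
    joint_counts = defaultdict(lambda: defaultdict(int))
    for microbes, molecules in samples:
        for microbe in microbes:
            microbe_counts[microbe] += 1
            for molecule in molecules:
                joint_counts[microbe][molecule] += 1
    # Divide each co-occurrence count by the number of samples containing the microbe.
    return {
        microbe: {
            molecule: count / microbe_counts[microbe]
            for molecule, count in molecules.items()
        }
        for microbe, molecules in joint_counts.items()
    }
```

A raw count ratio like this ignores abundance and the compositional nature of omics data, which is part of the statistical difficulty the abstract refers to.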
Molecular cartography of the human skin surface in 3D
Bouslimani, Amina; Porto, Carla; Rath, Christopher M. ...
Proceedings of the National Academy of Sciences - PNAS, 04/2015, Volume: 112, Issue: 17
Journal Article
Peer reviewed
Open access
Significance: The paper describes the implementation of an approach to study the chemical makeup of the human skin surface and correlate it with the microbes that live on the skin. We provide the translation of molecular information in high-spatial resolution 3D to understand the body distribution of skin molecules and bacteria. In addition, we use integrative analysis to interpret, at a molecular level, the large-scale data obtained from human skin samples. Correlations between molecules and microbes can be obtained to gain further insights into the chemical milieu in which these different microbial communities live.
The human skin is an organ with a surface area of 1.5–2 m² that provides our interface with the environment. The molecular composition of this organ is derived from host cells, microbiota, and external molecules. The chemical makeup of the skin surface is largely undefined. Here we advance the technologies needed to explore the topographical distribution of skin molecules, using 3D mapping of mass spectrometry data and microbial 16S rRNA amplicon sequences. Our 3D maps reveal that the molecular composition of skin has diverse distributions and that the composition is defined not only by skin cells and microbes but also by our daily routines, including the application of hygiene products. The technological development of these maps lays a foundation for studying the spatial relationships of human skin with hygiene, the microbiota, and environment, with potential for developing predictive models of skin phenotypes tailored to individual health.
The annotation of small molecules is one of the most challenging and important steps in untargeted mass spectrometry analysis, as most of our biological interpretations rely on structural annotations. Molecular networking has emerged as a structured way to organize and mine data from untargeted tandem mass spectrometry (MS/MS) experiments and has been widely applied to propagate annotations. However, propagation is done through manual inspection of MS/MS spectra connected in the spectral networks and is only possible when a reference library spectrum is available. An alternative approach to annotating an unknown fragmentation mass spectrum is the use of in silico predictions. One of the challenges of in silico annotation is the uncertainty around the correct structure among the predicted candidate lists. Here we show how molecular networking can be used to improve the accuracy of in silico predictions through propagation of structural annotations, even when there is no match to an MS/MS spectrum in spectral libraries. This is accomplished by creating a network consensus of re-ranked structural candidates using the molecular network topology and structural similarity to improve in silico annotations. The Network Annotation Propagation (NAP) tool is accessible through the GNPS web-platform https://gnps.ucsd.edu/ProteoSAFe/static/gnps-theoretical.jsp.
The increasing throughput and sharing of proteomics mass spectrometry data have now yielded over one-third of a million public mass spectrometry runs. However, these discoveries are not continuously aggregated in an open and error-controlled manner, which limits their utility. To facilitate the reusability of these data, we built the MassIVE Knowledge Base (MassIVE-KB), a community-wide, continuously updating knowledge base that aggregates proteomics mass spectrometry discoveries into an open reusable format with full provenance information for community scrutiny. Reusing >31 TB of public human data stored in a mass spectrometry interactive virtual environment (MassIVE), the MassIVE-KB contains >2.1 million precursors from 19,610 proteins (48% larger than before; 97% of the total) and doubles proteome coverage to 6 million amino acids (54% of the proteome) with strict library-scale false discovery controls, thereby providing evidence for 430 proteins for which sufficient protein-level evidence was previously missing. Furthermore, MassIVE-KB can inform experimental design, helps identify and quantify new data, and provides tools for community construction of specialized spectral libraries.
• Reprocessed 31 TB of human proteomics data
• MassIVE-KB spectral library including 2.1 million precursors (>4-fold increase)
• 55% of all human proteome amino acids are covered (2-fold increase)
• 430 new proteins observed with previously missing proteomics evidence
Wang et al. introduce MassIVE-KB, a program designed to distill the entire community’s mass spectrometry data into reusable spectral library resources. As a result, the statistically significant discovery of a peptide or protein in a single researcher’s data will be made available to the whole community to support its identification (in shotgun experiments) or quantitative detection (in targeted experiments) in all future analyses.