Untargeted mass spectrometry is employed to detect small molecules in complex biospecimens, generating data that are difficult to interpret. We developed Qemistree, a data exploration strategy based on the hierarchical organization of molecular fingerprints predicted from fragmentation spectra. Qemistree allows mass spectrometry data to be represented in the context of sample metadata and chemical ontologies. By expressing molecular relationships as a tree, we can apply to metabolomics data the ecological tools that were designed to analyze and visualize the relatedness of DNA sequences. Here we demonstrate the use of tree-guided data exploration tools to compare metabolomics samples across different experimental conditions such as chromatographic shifts. Additionally, we leverage a tree representation to visualize chemical diversity in a heterogeneous collection of samples. The Qemistree software pipeline is freely available to the microbiome and metabolomics communities in the form of a QIIME2 plugin and a Global Natural Products Social Molecular Networking (GNPS) workflow.
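The idea of organizing predicted fingerprints into a tree can be sketched with generic hierarchical clustering. This is only an illustration under stated assumptions: the fingerprint values and feature names are invented toy data, and the Euclidean distance and average linkage are assumed choices, since the abstract does not state which distance or clustering method Qemistree uses.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import pdist

# Hypothetical predicted fingerprints (toy values): one row per molecule,
# one column per structural property, entries are predicted probabilities.
names = ["feature_A", "feature_B", "feature_C", "feature_D"]
fingerprints = np.array([
    [0.9, 0.8, 0.1, 0.0],
    [0.8, 0.9, 0.2, 0.1],
    [0.1, 0.0, 0.9, 0.8],
    [0.0, 0.2, 0.8, 0.9],
])

# Assumed choices: Euclidean distance and average linkage.
distances = pdist(fingerprints, metric="euclidean")
root = to_tree(linkage(distances, method="average"))


def to_newick(node, leaf_names):
    """Serialize the cluster tree topology (without branch lengths) as Newick."""
    if node.is_leaf():
        return leaf_names[node.id]
    left = to_newick(node.get_left(), leaf_names)
    right = to_newick(node.get_right(), leaf_names)
    return f"({left},{right})"


newick = to_newick(root, names) + ";"
print(newick)
```

A Newick string like this is the kind of tree format that phylogeny-aware tools typically read, which is what makes it possible to reuse tools built for DNA sequence relatedness.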
Metabolomics has started to embrace computational approaches for chemical interpretation of large data sets. Yet, metabolite annotation remains a key challenge. Recently, molecular networking and MS2LDA emerged as molecular mining tools that find molecular families and substructures in mass spectrometry fragmentation data. Moreover, in silico annotation tools obtain and rank candidate molecules for fragmentation spectra. Ideally, all structural information obtained and inferred from these computational tools could be combined to increase the resulting chemical insight one can obtain from a data set. However, integration is currently hampered as each tool has its own output format and efficient matching of data across these tools is lacking. Here, we introduce MolNetEnhancer, a workflow that combines the outputs from molecular networking, MS2LDA, in silico annotation tools (such as Network Annotation Propagation or DEREPLICATOR), and the automated chemical classification through ClassyFire to provide a more comprehensive chemical overview of metabolomics data whilst at the same time illuminating structural details for each fragmentation spectrum. We present examples from four plant and bacterial case studies and show how MolNetEnhancer enables the chemical annotation, visualization, and discovery of the subtle substructural diversity within molecular families. We conclude that MolNetEnhancer is a useful tool that greatly assists the metabolomics researcher in deciphering the metabolome through combination of multiple independent in silico pipelines.
Tandem mass spectrometry (MS/MS) continues to be the technology of choice for high-throughput analysis of complex proteomics samples. While MS/MS spectra are commonly identified by matching against a database of known protein sequences, the complementary approach of spectral library searching against collections of reference spectra consistently outperforms sequence-based searches by resulting in significantly more identified spectra. However, while spectral library searches benefit from the advance knowledge of the expected peptide fragmentation patterns recorded in library spectra, estimation of the statistical significance of spectrum-spectrum matches (SSMs) continues to be hindered by difficulties in finding an appropriate definition of “random” SSMs to use as a null model when estimating the significance of true SSMs. We propose to avoid this problem by changing the null hypothesis: instead of determining the probability of observing a high SSM score between randomly matched spectra, we estimate the probability of observing a low SSM score between replicate spectra of the same molecule. To this end, we explicitly model the variation in instrument measurements of MS/MS peak intensities and show how these models can be used to determine a theoretical distribution of SSM scores between reference and query spectra of the same molecule. While the proposed spectral library generating function (SLGF) approach can be used to calculate theoretical distributions for any additive SSM score (e.g., any dot product), we further show how it can be used to calculate the distribution of expected cosines between reference and query spectra. We developed a spectral library search tool, Tremolo, and demonstrate that this SLGF-based search tool significantly outperforms current state-of-the-art spectral library search tools and provide a detailed discussion of the multiple reasons behind the observed differences in the sets of identified MS/MS spectra.
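The changed null hypothesis can be illustrated numerically. The sketch below is not the SLGF method: it swaps the generating-function calculation for a simple Monte Carlo simulation, and it assumes a multiplicative log-normal noise model for peak intensities with an arbitrary `sigma`. The function names and the toy spectra are hypothetical. It estimates how often a true replicate of the reference would score a cosine as low as the one observed.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two aligned intensity vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def replicate_cosines(reference, sigma=0.3, n_draws=10000, seed=0):
    """Simulate replicate spectra and return their cosines to the reference.

    Assumption: each peak intensity varies by independent log-normal noise.
    """
    rng = np.random.default_rng(seed)
    noise = rng.lognormal(mean=0.0, sigma=sigma, size=(n_draws, reference.size))
    replicates = reference * noise
    dots = replicates @ reference
    norms = np.linalg.norm(replicates, axis=1) * np.linalg.norm(reference)
    return dots / norms


def replicate_p_value(observed_cosine, simulated):
    """Fraction of simulated replicate cosines at or below the observed score."""
    return float(np.mean(simulated <= observed_cosine))


# Toy reference and query intensities for the same four fragment peaks.
reference = np.array([100.0, 50.0, 25.0, 10.0])
query = np.array([90.0, 60.0, 20.0, 12.0])

simulated = replicate_cosines(reference)
p = replicate_p_value(cosine(reference, query), simulated)
print(f"observed cosine = {cosine(reference, query):.4f}, replicate p-value = {p:.4f}")
```

Under this framing a small value suggests the query scores lower than replicates of the reference usually do, so the match is less plausible as a spectrum of the same molecule.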
Access to web-based platforms has enabled scientists to perform research remotely. A critical aspect of mass spectrometry data analysis is the inspection, analysis, and visualization of the raw data to validate data quality and confirm statistical observations. We developed the GNPS Dashboard, a web-based data visualization tool, to facilitate synchronous collaborative inspection, visualization, and analysis of private and public mass spectrometry data remotely.
Urbanization along coastlines alters marine ecosystems, including by contributing molecules of anthropogenic origin to the coastal dissolved organic matter (DOM) pool. A broad assessment of the nature and extent of anthropogenic impacts on coastal ecosystems is urgently needed to inform regulatory guidelines and ecosystem management. Recently, non-targeted tandem mass spectrometry approaches have been gaining momentum for the analysis of global organic matter composition (chemotypes) including a wide array of natural and anthropogenic compounds. In line with these efforts, we developed a non-targeted liquid chromatography tandem mass spectrometry (LC-MS/MS) workflow that utilizes advanced data analysis approaches such as feature-based molecular networking and repository-scale spectrum searches. This workflow allows the scalable comparison and mapping of seawater chemotypes from large-scale spatial surveys as well as molecular family level annotation of unknown compounds. As a case study, we visualized organic matter chemotype shifts in coastal environments in northern San Diego, USA, after notable rainfall in winter 2017/2018 and highlight potential anthropogenic impacts. The observed seawater chemotype, consisting of 4384 LC-MS/MS features, shifted significantly after a major rain event. Molecular drivers of this shift could be attributed to multiple anthropogenic compounds, including pesticides (Imazapyr and Isoxaben), cleaning products (Benzyl-tetradecyl-dimethylammonium) and chemical additives (Hexa(methoxymethyl)melamine) and potential degradation products. By expanding the search of identified xenobiotics to other public tandem mass spectrometry datasets, we further contextualized their possible origin and show their importance in other ecosystems.
The mass spectrometry and data analysis pipelines applied here offer a scalable framework for future molecular mapping and monitoring of marine ecosystems, which will contribute to a deliberate assessment of how chemical pollution impacts our oceans.
• Feature-based Molecular Networking enables large-scale analysis of marine DOM.
• Organic matter chemotype in coastal San Diego shifted significantly after rain.
• Molecular drivers could be attributed to multiple anthropogenic compounds.
• Spatial mapping highlighted different point sources as potential origin.
• Repository-scale meta-analysis can further contextualize origin and importance.
There is a growing interest in unraveling the chemical complexity of our diets. To help the scientific community gain insight into the molecules present in foods and beverages that we ingest, we created foodMASST, a search tool for MS/MS spectra (of both known and unknown molecules) against a growing metabolomics food and beverage reference database. We envision foodMASST will become valuable for nutrition research and to assess the potential uniqueness of dietary biomarkers to represent specific foods or food classes.
Imagine a scenario where personal belongings such as pens, keys, phones, or handbags are found at an investigative site. For an investigative team trying to trace the belongings back to an individual, it is often valuable to understand that individual's personal habits, even when DNA evidence is also available. Here, we develop an approach to translate chemistries recovered from personal objects such as phones into a lifestyle sketch of the owner, using mass spectrometry and informatics approaches. Our results show that phones’ chemistries reflect a personalized lifestyle profile. The collective repertoire of molecules found on these objects provides a sketch of the lifestyle of an individual by highlighting the type of hygiene/beauty products the person uses, diet, medical status, and even the location where this person may have been. These findings introduce an additional form of trace evidence from skin-associated lifestyle chemicals found on personal belongings. Such information could help a criminal investigator narrow down the owner of an object found at a crime scene, such as a suspect or missing person.
Metabolomics has a long history of using cosine similarity to match experimental tandem mass spectra to databases for compound identification. Here we introduce the Blur-and-Link (BLINK) approach for scoring cosine similarity. By bypassing fragment alignment and simultaneously scoring all pairs of spectra using sparse matrix operations, BLINK is over 3000 times faster than MatchMS, a widely used loop-based alignment and scoring implementation. Using a similarity cutoff of 0.7, BLINK and MatchMS had practically equivalent identification agreement, and greater than 99% of their scores and matching ion counts were identical. This performance improvement can enable calculations to be performed that would typically be limited by time and available computational resources.
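Scoring all pairs of spectra with one sparse matrix product, rather than a loop over aligned fragment pairs, can be sketched as follows. This is not the BLINK implementation: the function name, bin width, m/z range, and toy spectra are assumptions for illustration, and the sketch only bins peaks by m/z and omits the blurring step that gives the method its name.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags

BIN_WIDTH = 0.01  # assumed m/z bin width
MAX_MZ = 1000.0   # assumed upper m/z limit


def to_unit_rows(spectra, bin_width=BIN_WIDTH, max_mz=MAX_MZ):
    """Bin spectra into a sparse (n_spectra x n_bins) matrix with unit-norm rows.

    Each spectrum is a non-empty list of (mz, intensity) pairs.
    Peaks that fall into the same bin are summed by the sparse constructor.
    """
    n_bins = int(round(max_mz / bin_width)) + 1
    rows, cols, vals = [], [], []
    for i, peaks in enumerate(spectra):
        for mz, intensity in peaks:
            rows.append(i)
            cols.append(int(round(mz / bin_width)))
            vals.append(float(intensity))
    matrix = csr_matrix((vals, (rows, cols)), shape=(len(spectra), n_bins))
    norms = np.sqrt(np.asarray(matrix.multiply(matrix).sum(axis=1)).ravel())
    return diags(1.0 / norms) @ matrix


# Toy query and reference spectra (hypothetical values).
queries = [
    [(100.00, 1.0), (150.05, 0.5)],
    [(100.00, 1.0), (250.10, 1.0)],
]
references = [
    [(100.00, 1.0), (150.05, 0.5)],
    [(200.00, 1.0), (250.10, 1.0)],
]

# One sparse product yields the cosine score of every query against every reference.
scores = (to_unit_rows(queries) @ to_unit_rows(references).T).toarray()
print(scores)
```

Because most bins are empty, the product touches only bins that both spectra occupy, which is the reason a sparse formulation can avoid per-pair loops.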
Current trends in popular music can be identified by mining, analyzing, and predicting audience preferences. Combining large music library resources with user behavior yields music big data, and aggregating audience preferences from these data determines how music trends develop. This paper therefore applies data mining (DM) technology and neural network (NNS) theory to establish a prediction model of music popularity trends, to predict and evaluate those trends according to selected evaluation indices, to detect changes in them in a timely manner, and to provide a basis for decision-making. In the prediction of the number of songs played by 10 artists, the NNS algorithm proposed in this paper reduces the prediction error from the original 0.074 and 0.045 to 0.044 and 0.032, respectively, and the error rates are reduced by 35.7% and 29.4%, respectively, compared with the learning algorithm and the decision tree algorithm. Among the three methods, the NNS algorithm in this paper has the highest accuracy. The results therefore indicate that the proposed model is better suited to predicting music popularity trends. It can track trends in pop music accurately and aggregate user preferences to determine them.
Natural product screening programs have uncovered molecules from diverse natural sources with various biological activities and unique structures. However, much remains underexplored and additional information is hidden in these exceptional collections. We applied untargeted mass spectrometry approaches to capture the chemical space and dispersal patterns of metabolites from an in-house library of marine cyanobacterial and algal collections. Remarkably, 86% of the metabolomics signals detected were not found in other available datasets of similar nature, supporting the hypothesis that marine cyanobacteria and algae possess distinctive metabolomes. The data were plotted onto a world map representing eight major sampling sites, and revealed potential geographic locations with high chemical diversity. We demonstrate the use of these inventories as a tool to explore the diversity and distribution of natural products. Finally, we utilized this tool to guide the isolation of a new cyclic lipopeptide, yuvalamide A, from a marine cyanobacterium.