Mass spectral libraries have proven to be essential for mass spectrum annotation, both for library matching and for training new machine learning algorithms. A key requirement for training machine learning models is the availability of high-quality training data. Public libraries of mass spectrometry data that are open to user submission often suffer from limited metadata curation and harmonization. The resulting variability in data quality makes training of machine learning models challenging. Here we present a pipeline for cleaning tandem mass spectrometry library data. The pipeline is designed with ease of use, flexibility, and reproducibility as leading principles.
Scientific contribution
This pipeline will result in cleaner public mass spectral libraries that will improve library searching and the quality of machine-learning training datasets in mass spectrometry. It builds on previous work by adding new functionality for curating and correcting annotated libraries through validation of structure annotations. Due to the high quality of our software, its reproducibility, and improved logging, we think our new pipeline has the potential to become the standard in the field for cleaning tandem mass spectrometry libraries.
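One kind of annotation check that such a cleaning step might include is comparing the recorded precursor m/z with the value expected from the annotated molecular formula. The sketch below is only an illustration under stated assumptions: it is not the pipeline's actual implementation, the function names are hypothetical, only the [M+H]+ adduct is handled, and the 10 ppm tolerance is an arbitrary example value.

```python
import re

# Monoisotopic masses (Da) for a few common elements; extend as needed.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}
PROTON_MASS = 1.007276466812  # mass of a proton in Da


def monoisotopic_mass(formula):
    """Sum monoisotopic masses for a simple formula such as 'C6H12O6'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def precursor_matches(formula, precursor_mz, tol_ppm=10.0):
    """Hypothetical check: does the precursor m/z fit [M+H]+ within tol_ppm?"""
    expected_mz = monoisotopic_mass(formula) + PROTON_MASS
    error_ppm = abs(precursor_mz - expected_mz) / expected_mz * 1e6
    return error_ppm <= tol_ppm


# Glucose (C6H12O6) as a toy example of a library entry.
consistent = precursor_matches("C6H12O6", 181.0707)
inconsistent = precursor_matches("C6H12O6", 181.0800)
```

Entries that fail a check like this would be candidates for correction or removal rather than automatic deletion, since the mismatch may come from a wrong adduct label rather than a wrong structure.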
Microbial natural products are important for the understanding of microbial interactions, chemical defense and communication, and have also served as an inspirational source for numerous pharmaceutical drugs. Tropical marine cyanobacteria have been highlighted as a great source of new natural products; however, few reports have appeared wherein a multi-omics approach has been used to study their natural products potential (i.e., reports are often focused on an individual natural product and its biosynthesis). This study focuses on describing the natural product genetic potential as well as the expressed natural product molecules in benthic tropical cyanobacteria. We collected 24 tropical filamentous marine cyanobacteria from several sites around the world and sequenced their genomes. The informatics program antiSMASH was used to annotate the major classes of gene clusters. BiG-SCAPE phylum-wide analysis revealed the most promising strains for natural product discovery among these cyanobacteria. LC-MS/MS-based metabolomics highlighted the most abundant molecules and molecular classes among 10 of these marine cyanobacterial samples. We observed that despite many genes encoding peptidic natural products, peptides were not as abundant as lipids and lipopeptides in the chemical extracts. Our results highlight a number of highly interesting biosynthetic gene clusters for genome mining among these cyanobacterial samples.
X-ray security inspection processes have a low degree of automation, long detection times, and are subject to misjudgment due to occlusion. To address these problems, this paper proposes a multi-objective intelligent recognition method for X-ray images based on the YOLO deep learning network and an optimized transformer structure (YOLO-T). We also construct the GDXray-Expanded X-ray detection dataset, which contains multiple types of dangerous goods. Using this dataset, we evaluated several versions of the YOLO deep learning network model and compared the results to those of the proposed YOLO-T model. The proposed YOLO-T method demonstrated higher accuracy for multitarget and hidden-target detection tasks. On the GDXray-Expanded dataset, the maximum mAP of the proposed YOLO-T model was 97.73%, which is 7.66%, 16.47%, and 7.11% higher than that obtained by the YOLO v2, YOLO v3, and YOLO v4 models, respectively. Thus, we believe that the proposed YOLO-T network has good application prospects in X-ray security inspection technologies. In all kinds of security detection scenarios using X-ray security detectors, the model proposed in this paper can quickly and accurately identify dangerous goods, which has broad application value.
The identification of molecular structure is essential for understanding chemical diversity and for developing drug leads from small molecules. Nevertheless, the structure elucidation of small molecules by Nuclear Magnetic Resonance (NMR) experiments is often a long and non-trivial process that relies on years of training. To make this process more efficient, several spectral databases have been established to retrieve reference NMR spectra. However, the number of reference NMR spectra available is limited and has mostly facilitated annotation of commercially available derivatives. Here, we introduce DeepSAT, a neural network-based structure annotation and scaffold prediction system that directly extracts the chemical features associated with molecular structures from their NMR spectra. Using only the ¹H-¹³C HSQC spectrum, DeepSAT identifies related known compounds and thus efficiently assists in the identification of molecular structures. DeepSAT is expected to accelerate chemical and biomedical research by speeding up the identification of molecular structures.
Background
To explore the prognostic value of the fibrinogen–albumin ratio (FAR) combined with sarcopenia in intrahepatic cholangiocarcinoma (ICC) patients after surgery and to develop a nomogram for predicting the survival of ICC patients.
Materials and Methods
In this prospective cohort study, 116 ICC patients who underwent radical surgery were enrolled as the discovery cohort, and another independent cohort of 68 ICC patients was used as the validation cohort. The Kaplan–Meier method was used to analyze prognosis. Independent predictors of overall survival (OS) and recurrence‐free survival (RFS) were identified by univariable and multivariable Cox regression analyses, and nomograms were then developed. The performance of the nomograms was evaluated by concordance index (C‐index), calibration curve, receiver operating characteristic (ROC) curve analysis, and decision curve analysis (DCA).
Results
Patients with high FAR had lower OS and RFS. FAR and sarcopenia were effective predictors of OS and RFS. Patients with high FAR and sarcopenia had a poorer prognosis than other patients. The OS nomogram was constructed based on age, FAR, and sarcopenia. The RFS nomogram was constructed based on FAR and sarcopenia. The C‐indexes for the nomograms of OS and RFS were 0.713 and 0.686, respectively. Calibration curves revealed good agreement between actual survival and nomogram prediction. The areas under the ROC curve (AUC) for the nomograms of OS and RFS were 0.796 and 0.791 in the discovery cohort, and 0.823 and 0.726 in the validation cohort. The clinical value of the nomograms was confirmed by the DCA.
Conclusions
ICC patients with high FAR and sarcopenia had a poor prognosis; the nomograms developed based on these two factors were accurate and clinically useful in ICC patients who underwent radical resection.
Sarcopenia and high levels of FAR are related to the poor prognosis of ICC patients undergoing radical surgery. FAR and sarcopenia are convenient, inexpensive, and reliable markers that can inform efforts to improve prognosis and to develop new treatment strategies for ICC patients after surgery.
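To make the C-index reported above more concrete, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is an illustration only, not the authors' analysis code; the function name and the toy data are hypothetical, and tied survival times are simply skipped.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    times  : observed follow-up times
    events : 1 if the event (e.g. death) was observed, 0 if censored
    risks  : predicted risk scores (higher means worse expected outcome)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's last follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: three patients, the last one censored.
toy_times = [5, 10, 15]
toy_events = [1, 1, 0]
perfect = concordance_index(toy_times, toy_events, [0.9, 0.5, 0.1])
reversed_ranking = concordance_index(toy_times, toy_events, [0.1, 0.5, 0.9])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.713 and 0.686 should be read.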
The antimalarial drug-resistance conundrum, which threatens to reverse the great strides taken to curb the malaria scourge, warrants an urgent search for novel chemical scaffolds to serve as templates for the development of new antimalarial drugs. Plants represent a viable alternative source for the discovery of unique potential antiplasmodial chemical scaffolds. To expedite the discovery of new antiplasmodial compounds from plants, the aim of this study was to use phylogenetic analysis to identify higher plant orders and families that can be rationally prioritised for antimalarial drug discovery. We queried the PubMed database for publications documenting antiplasmodial properties of natural compounds isolated from higher plants. Thereafter, we manually collated compounds reported along with plant species of origin and relevant pharmacological data. We systematically assigned antiplasmodial-associated plant species into recognised families and orders, and then computed the resistance index, selectivity index and physicochemical properties of the compounds from each taxonomic group. Correlating the generated phylogenetic trees and the biological data of each clade allowed for the identification of 3 ‘hot’ plant orders and families. The top 3 ranked plant orders were the (i) Caryophyllales, (ii) Buxales, and (iii) Chloranthales. The top 3 ranked plant families were the (i) Ancistrocladaceae, (ii) Simaroubaceae, and (iii) Buxaceae. The highly active natural compounds (IC₅₀ ≤ 1 µM) isolated from these plant orders and families are structurally distinct from the ‘legacy’ antimalarial drugs. Our study was able to identify the most prolific taxa at order and family rank that we propose be prioritised in the search for potent, safe and drug-like antimalarial molecules.
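The abstract above does not define the resistance index and selectivity index, so the sketch below assumes the commonly used definitions: selectivity index as the ratio of mammalian-cell cytotoxicity (CC₅₀) to antiplasmodial potency (IC₅₀), and resistance index as the ratio of IC₅₀ against a drug-resistant strain to IC₅₀ against a drug-sensitive strain. The study may have used a variant of these, and all function names and numbers here are hypothetical.

```python
def selectivity_index(cc50_um, ic50_um):
    """Assumed definition: CC50 (host cells) / IC50 (parasite), both in µM."""
    return cc50_um / ic50_um


def resistance_index(ic50_resistant_um, ic50_sensitive_um):
    """Assumed definition: IC50 (resistant strain) / IC50 (sensitive strain)."""
    return ic50_resistant_um / ic50_sensitive_um


def is_highly_active(ic50_um, threshold_um=1.0):
    """Potency cut-off quoted in the abstract: IC50 <= 1 µM."""
    return ic50_um <= threshold_um


# Hypothetical compound record, for illustration only.
compound = {"ic50_sensitive": 0.4, "ic50_resistant": 0.8, "cc50": 50.0}
si = selectivity_index(compound["cc50"], compound["ic50_sensitive"])
ri = resistance_index(compound["ic50_resistant"], compound["ic50_sensitive"])
active = is_highly_active(compound["ic50_sensitive"])
```

Under these definitions a high selectivity index and a resistance index close to 1 are both desirable, which is why the two values are useful alongside raw potency when ranking taxa.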
The emergence and spread of drug-recalcitrant Plasmodium falciparum parasites threaten to reverse the gains made in the fight against malaria. Urgent measures need to be taken to curb this impending challenge. The higher plant-derived sesquiterpene, quinoline alkaloid, and naphthoquinone natural product classes of compounds have previously served as phenomenal chemical scaffolds from which integral antimalarial drugs were developed. Historical successes serve as an inspiration for the continued investigation of plant-derived natural product compounds in search of novel molecular templates from which new antimalarial drugs could be developed. The aim of this study was to identify potential chemical scaffolds for malaria drug discovery following analysis of historical data on phytochemicals screened in vitro against P. falciparum. To identify these novel scaffolds, we queried an in-house manually curated database of plant-derived natural product compounds and their in vitro biological data. Natural products were assigned to different structural classes using NPClassifier. To identify the most promising chemical scaffolds, we then correlated natural compound class with bioactivity and other data, namely (i) potency, (ii) resistance index, (iii) selectivity index and (iv) physicochemical properties. We used an unbiased scoring system to rank the different natural product classes based on the assessment of their bioactivity data. From this analysis we identified the top-ranked natural product pathway as the alkaloids. The top three ranked super classes identified were (i) pseudoalkaloids, (ii) naphthalenes and (iii) tyrosine alkaloids, and the top five ranked classes were (i) quassinoids (of super class triterpenoids), (ii) steroidal alkaloids (of super class pseudoalkaloids), (iii) cycloeudesmane sesquiterpenoids (of super class triterpenoids), (iv) isoquinoline alkaloids (of super class tyrosine alkaloids) and (v) naphthoquinones (of super class naphthalenes). Launched chemical space of these identified classes of compounds was, by and large, distinct from that of ‘legacy’ antimalarial drugs. Our study was able to identify chemical scaffolds with acceptable biological properties that are structurally different from current and previously used antimalarial drugs. These molecules have the potential to be developed into new antimalarial drugs.
Soil microorganisms coexist and interact, showing antagonistic or mutualistic behaviors. Here, we show that an environmental strain of Bacillus subtilis undergoes heritable phenotypic variation upon interaction with the soil fungal pathogen Setophoma terrestris (ST). Metabolomics analysis revealed differential profiles in B. subtilis before (pre-ST) and after (post-ST) interacting with the fungus, which paradoxically involved the absence of the lipopeptides surfactin and plipastatin and yet acquisition of antifungal activity in post-ST variants. The profile of volatile compounds showed that 2-heptanone and 2-octanone were the most discriminating metabolites present at higher concentrations in post-ST during the interaction process. Both ketones showed strong antifungal activity, which was lost with the addition of exogenous surfactin. Whole-genome analyses indicate that mutations in the ComQPXA quorum-sensing system constituted the genetic basis of post-ST conversion, which rewired B. subtilis metabolism towards the depletion of surfactins and the production of antifungal compounds during its antagonistic interaction with S. terrestris.
Background
Spectral library searching is currently the most common approach for compound annotation in untargeted metabolomics. Spectral libraries applicable to liquid chromatography mass spectrometry have grown in size over the past decade to include hundreds of thousands to millions of mass spectra and tens of thousands of compounds, forming an essential knowledge base for the interpretation of metabolomics experiments.
Aim of review
We describe existing spectral library resources, highlight different strategies for compiling spectral libraries, and discuss quality considerations that should be taken into account when interpreting spectral library searching results. Finally, we describe how spectral libraries are empowering the next generation of machine learning tools in computational metabolomics, and discuss several opportunities for using increasingly accessible large spectral libraries.
Key scientific concepts of review
This review focuses on the current state of spectral libraries for untargeted LC–MS/MS based metabolomics. We show how the number of entries in publicly accessible spectral libraries has increased more than 60-fold in the past eight years to aid molecular interpretation and we discuss how the role of spectral libraries in untargeted metabolomics will evolve in the near future.
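Spectral library searching, as discussed above, typically scores a query spectrum against each library spectrum with a similarity measure. The sketch below shows one simple variant, a cosine score over fixed-width m/z bins. It is an illustration under assumptions: the function names and the 0.01 bin width are hypothetical, and production tools generally use tolerance-based peak matching and intensity weighting instead of plain binning.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Map (m/z, intensity) pairs onto fixed-width m/z bins, summing intensity."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(peaks_a, peaks_b, bin_width=0.01):
    """Cosine similarity between two binned spectra (0 = no overlap, 1 = identical)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy spectra as lists of (m/z, intensity) pairs.
query = [(85.03, 30.0), (127.04, 100.0), (163.06, 55.0)]
library_hit = [(85.03, 25.0), (127.04, 100.0), (163.06, 60.0)]
unrelated = [(91.05, 100.0), (119.09, 40.0)]

score_hit = cosine_score(query, library_hit)
score_unrelated = cosine_score(query, unrelated)
```

A score like this only measures fragment-pattern similarity, so the quality considerations raised in the review (for example incorrect or inconsistent library annotations) still apply to any match it returns.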