During the past years we have witnessed the rapid development of new metagenome assembly methods. Although there are many benchmark utilities designed for single-genome assemblies, there is no ...well-recognized evaluation and comparison tool for metagenomic-specific analogues. In this article, we present MetaQUAST, a modification of QUAST, the state-of-the-art tool for genome assembly evaluation based on alignment of contigs to a reference. MetaQUAST addresses such metagenome datasets features as (i) unknown species content by detecting and downloading reference sequences, (ii) huge diversity by giving comprehensive reports for multiple genomes and (iii) presence of highly relative species by detecting chimeric contigs. We demonstrate MetaQUAST performance by comparing several leading assemblers on one simulated and two real datasets.
http://bioinf.spbau.ru/metaquast
aleksey.gurevich@spbu.ru
Supplementary data are available at Bioinformatics online.
Abstract
Summary
Currently, most genome assembly projects focus on contigs and scaffolds rather than assembly graphs that provide a more comprehensive representation of an assembly. Since interactive ...visualization of large assembly graphs remains an open problem, we developed an Assembly Graph Browser (AGB) tool that visualizes large assembly graphs, extending the functionality of previously developed visualization approaches. Assembly Graph Browser includes a number of novel functions including repeat analysis, construction of the contracted assembly graphs (i.e. the graphs obtained by collapsing a selected set of edges) and a new approach to visualizing large assembly graphs.
Availability and implementation
http://www.github.com/almiheenko/AGB.
Supplementary information
Supplementary data are available at Bioinformatics online.
Natural products have traditionally been rich sources for drug discovery. In order to clear the road toward the discovery of unknown natural products, biologists need dereplication strategies that ...identify known ones. Here we report DEREPLICATOR+, an algorithm that improves on the previous approaches for identifying peptidic natural products, and extends them for identification of polyketides, terpenes, benzenoids, alkaloids, flavonoids, and other classes of natural products. We show that DEREPLICATOR+ can search all spectra in the recently launched Global Natural Products Social molecular network and identify an order of magnitude more natural products than previous dereplication efforts. We further demonstrate that DEREPLICATOR+ enables cross-validation of genome-mining and peptidogenomics/glycogenomics results.
The complete assembly of each human chromosome is essential for understanding human biology and evolution
. Here we use complementary long-read sequencing technologies to complete the linear assembly ...of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.
Peptidic natural products (PNPs) are widely used compounds that include many antibiotics and a variety of other bioactive peptides. Although recent breakthroughs in PNP discovery raised the challenge ...of developing new algorithms for their analysis, identification of PNPs via database search of tandem mass spectra remains an open problem. To address this problem, natural product researchers use dereplication strategies that identify known PNPs and lead to the discovery of new ones, even in cases when the reference spectra are not present in existing spectral libraries. DEREPLICATOR is a new dereplication algorithm that enables high-throughput PNP identification and that is compatible with large-scale mass-spectrometry-based screening platforms for natural product discovery. After searching nearly one hundred million tandem mass spectra in the Global Natural Products Social (GNPS) molecular networking infrastructure, DEREPLICATOR identified an order of magnitude more PNPs (and their new variants) than any previous dereplication efforts.
De novo RNA-Seq assembly is a powerful method for analysing transcriptomes when the reference genome is not available or poorly annotated. However, due to the short length of Illumina reads it is ...usually impossible to reconstruct complete sequences of complex genes and alternative isoforms. Recently emerged possibility to generate long RNA reads, such as PacBio and Oxford Nanopores, may dramatically improve the assembly quality, and thus the consecutive analysis. While reference-based tools for analysing long RNA reads were recently developed, there is no established pipeline for de novo assembly of such data.
In this work we present a novel method that allows to perform high-quality de novo transcriptome assemblies by combining accuracy and reliability of short reads with exon structure information carried out from long error-prone reads. The algorithm is designed by incorporating existing hybridSPAdes approach into rnaSPAdes pipeline and adapting it for transcriptomic data.
To evaluate the benefit of using long RNA reads we selected several datasets containing both Illumina and Iso-seq or Oxford Nanopore Technologies (ONT) reads. Using an existing quality assessment software, we show that hybrid assemblies performed with rnaSPAdes contain more full-length genes and alternative isoforms comparing to the case when only short-read data is used.
Peptidic natural products (PNPs) represent a medically important class of secondary metabolites that includes antibiotics, anti-inflammatory and antitumor agents. Advances in tandem mass spectra ...(MS/MS) acquisition and in silico database search methods have enabled high-throughput PNP discovery. However, the resulting spectra annotations are often error-prone and their validation remains a bottleneck. Here, we present NPvis, a visualizer suitable for the evaluation of PNP–MS/MS matches. The tool interactively maps annotated spectrum peaks to the corresponding PNP fragments and allows researchers to assess the match correctness. NPvis accounts for the wide chemical diversity of PNPs that prevents the use of the existing proteomics visualizers. Moreover, NPvis works even if the exact chemical structure of the matching PNP is unknown. The tool is available online and as a standalone application. We hope that it will benefit the community by streamlining PNP data analysis and validation.
Peptidic natural products (PNPs) include many antibiotics and other bioactive compounds. While the recent launch of the Global Natural Products Social (GNPS) molecular networking infrastructure is ...transforming PNP discovery into a high-throughput technology, PNP identification algorithms are needed to realize the potential of the GNPS project. GNPS relies on the assumption that each connected component of a molecular network (representing related metabolites) illuminates the 'dark matter of metabolomics' as long as it contains a known metabolite present in a database. We reveal a surprising diversity of PNPs produced by related bacteria and show that, contrary to the 'comparative metabolomics' assumption, two related bacteria are unlikely to produce identical PNPs (even though they are likely to produce similar PNPs). Since this observation undermines the utility of GNPS, we developed a PNP identification tool, VarQuest, that illuminates the connected components in a molecular network even if they do not contain known PNPs and only contain their variants. VarQuest reveals an order of magnitude more PNP variants than all previous PNP discovery efforts and demonstrates that GNPS already contains spectra from 41% of the currently known PNP families. The enormous diversity of PNPs suggests that biosynthetic gene clusters in various microorganisms constantly evolve to generate a unique spectrum of PNP variants that differ from PNPs in other species.
Abstract
Motivation
The emergence of high-throughput sequencing technologies revolutionized genomics in early 2000s. The next revolution came with the era of long-read sequencing. These technological ...advances along with novel computational approaches became the next step towards the automatic pipelines capable to assemble nearly complete mammalian-size genomes.
Results
In this manuscript, we demonstrate performance of the state-of-the-art genome assembly software on six eukaryotic datasets sequenced using different technologies. To evaluate the results, we developed QUAST-LG-a tool that compares large genomic de novo assemblies against reference sequences and computes relevant quality metrics. Since genomes generally cannot be reconstructed completely due to complex repeat patterns and low coverage regions, we introduce a concept of upper bound assembly for a given genome and set of reads, and compute theoretical limits on assembly correctness and completeness. Using QUAST-LG, we show how close the assemblies are to the theoretical optimum, and how far this optimum is from the finished reference.
Availability and implementation
http://cab.spbu.ru/software/quast-lg
Supplementary information
Supplementary data are available at Bioinformatics online.
Ovarian cancer (OC) is the most lethal of gynecologic tumors because women generally present with advanced stage disease. Platinum-based chemotherapy play a pivotal role in OC treatment. The aim of ...the study was to assess the effects of CAP-regimen chemotherapy on blood redox status in patients with ovarian cancer.
Patients were 132 women with primary OC with FIGO stage III-IV. Patients were examined before treatment, 3 and 14 days after the first and 3 and 14 days after the second course of CAP-regimen chemotherapy. The activity of antioxidant enzymes, the intensity of lipid peroxidation, the level of oxidative modification of proteins (OMP) were evaluated.
We have found elevated levels of OMP products in plasma and erythrocytes of patients with ovarian cancer in comparison with donors. Our results suggest activation of lipid peroxidation in plasma and erythrocytes of ovarian cancer patients in comparison with healthy women. Sensitive and specific indicators of oxidative stress are levels of glutathione-S-transferase (sensitivity 80%, specificity 89%), diene conjugates (sensitivity 85,71%, specificity 72,73%) and OMP 430 nm (sensitivity 80%, specificity 90%) in blood plasma and the activity of catalase in erythrocytes (sensitivity 100%, specificity 89%).
CAP chemotherapy in patients with ovarian cancer with FIGO stage III-IV induces radical formation and changes the homeostasis of the patient. The lipid peroxidation and antioxidant system in plasma move to a higher level of functioning, and the erythrocytes develop oxidative stress.