Limitations of genome sequencing techniques have led to dozens of assembly algorithms, none of which is perfect. A number of methods for comparing assemblers have been developed, but none is yet a ...recognized benchmark. Further, most existing methods for comparing assemblies are only applicable to new assemblies of finished genomes; the problem of evaluating assemblies of previously unsequenced species has not been adequately considered. Here, we present QUAST-a quality assessment tool for evaluating and comparing genome assemblies. This tool improves on leading assembly comparison software with new ideas and quality metrics. QUAST can evaluate assemblies both with a reference genome, as well as without a reference. QUAST produces many reports, summary tables and plots to help scientists in their research and in their publications. In this study, we used QUAST to compare several genome assemblers on three datasets. QUAST tables and plots for all of them are available in the Supplementary Material, and interactive versions of these reports are on the QUAST website.
http://bioinf.spbau.ru/quast .
Supplementary data are available at Bioinformatics online.
Recent methods have been developed to perform high-throughput sequencing of DNA by Single Molecule Sequencing (SMS). While Next-Generation sequencing methods may produce reads up to several hundred ...bases long, SMS sequencing produces reads up to tens of kilobases long. Existing alignment methods are either too inefficient for high-throughput datasets, or not sensitive enough to align SMS reads, which have a higher error rate than Next-Generation sequencing.
We describe the method BLASR (Basic Local Alignment with Successive Refinement) for mapping Single Molecule Sequencing (SMS) reads that are thousands of bases long, with divergence between the read and genome dominated by insertion and deletion error. The method is benchmarked using both simulated reads and reads from a bacterial sequencing project. We also present a combinatorial model of sequencing error that motivates why our approach is effective.
The results indicate that it is possible to map SMS reads with high accuracy and speed. Furthermore, the inferences made on the mapability of SMS reads using our combinatorial model of sequencing error are in agreement with the mapping accuracy demonstrated on simulated reads.
A mathematical concept known as a de Bruijn graph turns the formidable challenge of assembling a contiguous genome from billions of short sequencing reads into a tractable computational problem. The ...development of algorithmic ideas for next-generation sequencing is examined.
Genome Rearrangements In Man and Mouse (GRIMM) is a tool for analyzing rearrangements of gene orders in pairs of unichromosomal and multichromosomal genomes, with either signed or unsigned gene data. ...Although there are several programs for analyzing rearrangements in unichromosomal genomes, this is the first to analyze rearrangements in multichromosomal genomes. GRIMM also provides a new algorithm for analyzing comparative maps for which gene directions are unknown. Availability: A web server, with instructions and sample data, is available at http://www-cse.ucsd.edu/groups/bioinformatics/GRIMM. Contact: gptesler@ucsd.edu
The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement ...gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online ( http://bioinf.spbau.ru/spades ). It is distributed as open source software.
The human and mouse genomic sequences provide evidence for a larger number of rearrangements than previously thought and reveal extensive reuse of breakpoints from the same short fragile regions. ...Breakpoint clustering in regions implicated in cancer and infertility have been reported in previous studies; we report here on breakpoint clustering in chromosome evolution. This clustering reveals limitations of the widely accepted random breakage theory that has remained unchallenged since the mid-1980s. The genome rearrangement analysis of the human and mouse genomes implies the existence of a large number of very short "hidden" synteny blocks that were invisible in the comparative mapping data and ignored in the random breakage model. These blocks are defined by closely located breakpoints and are often hard to detect. Our results suggest a model of chromosome evolution that postulates that mammalian genomes are mosaics of fragile regions with high propensity for rearrangements and solid regions with low propensity for rearrangements.
Although analysis of genome rearrangements was pioneered by Dobzhansky and Sturtevant 65 years ago, we still know very little about the rearrangement events that produced the existing varieties of ...genomic architectures. The genomic sequences of human and mouse provide evidence for a larger number of rearrangements than previously thought and shed some light on previously unknown features of mammalian evolution. In particular, they reveal that a large number of microrearrangements is required to explain the differences in draft human and mouse sequences. Here we describe a new algorithm for constructing synteny blocks, study arrangements of synteny blocks in human and mouse, derive a most parsimonious human-mouse rearrangement scenario, and provide evidence that intrachromosomal rearrangements are more frequent than interchromosomal rearrangements. Our analysis is based on the human-mouse breakpoint graph, which reveals related breakpoints and allows one to find a most parsimonious scenario. Because these graphs provide important insights into rearrangement scenarios, we introduce a new visualization tool that allows one to view breakpoint graphs superimposed with genomic dot-plots.
The “dark matter of life” describes microbes and even entire divisions of bacterial phyla that have evaded cultivation and have yet to be sequenced. We present a genome from the globally distributed ...but elusive candidate phylum TM6 and uncover its metabolic potential. TM6 was detected in a biofilm from a sink drain within a hospital restroom by analyzing cells using a highly automated single-cell genomics platform. We developed an approach for increasing throughput and effectively improving the likelihood of sampling rare events based on forming small random pools of single-flow–sorted cells, amplifying their DNA by multiple displacement amplification and sequencing all cells in the pool, creating a “mini-metagenome.” A recently developed single-cell assembler, SPAdes, in combination with contig binning methods, allowed the reconstruction of genomes from these mini-metagenomes. A total of 1.07 Mb was recovered in seven contigs for this member of TM6 (JCVI TM6SC1), estimated to represent 90% of its genome. High nucleotide identity between a total of three TM6 genome drafts generated from pools that were independently captured, amplified, and assembled provided strong confirmation of a correct genomic sequence. TM6 is likely a Gram-negative organism and possibly a symbiont of an unknown host (nonfree living) in part based on its small genome, low-GC content, and lack of biosynthesis pathways for most amino acids and vitamins. Phylogenomic analysis of conserved single-copy genes confirms that TM6SC1 is a deeply branching phylum.
Whole genome amplification by the multiple displacement amplification (MDA) method allows sequencing of DNA from single cells of bacteria that cannot be cultured. Assembling a genome is challenging, ...however, because MDA generates highly nonuniform coverage of the genome. Here we describe an algorithm tailored for short-read data from single cells that improves assembly through the use of a progressively increasing coverage cutoff. Assembly of reads from single Escherichia coli and Staphylococcus aureus cells captures >91% of genes within contigs, approaching the 95% captured from an assembly based on many E. coli cells. We apply this method to assemble a genome from a single cell of an uncultivated SAR324 clade of Deltaproteobacteria, a cosmopolitan bacterial lineage in the global ocean. Metabolic reconstruction suggests that SAR324 is aerobic, motile and chemotaxic. Our approach enables acquisition of genome assemblies for individual uncultivated bacteria using only short reads, providing cell-specific genetic information absent from metagenomic studies.