The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error-correction and consensus-generation steps to obtain high-quality assemblies. We show that the error-correction step can be omitted and that high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order alignment-based, stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore data sets, we show that Racon coupled with miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster.
Realizing the democratic promise of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics. Here we present GraphMap, a mapping algorithm designed to analyse nanopore sequencing reads, which progressively refines candidate alignments to robustly handle potentially high error rates and uses a fast graph traversal to align long reads with speed and high precision (>95%). Evaluation on MinION sequencing data sets against short- and long-read mappers indicates that GraphMap increases mapping sensitivity by 10-80% and maps >95% of bases. GraphMap alignments enabled single-nucleotide variant calling on the human genome with increased sensitivity (15%) over the next best mapper, precise detection of structural variants from 100 bp to 4 kbp in length, and species- and strain-specific identification of pathogens using MinION reads. GraphMap is available open source under the MIT license at https://github.com/isovic/graphmap.
The complete sequence of a human genome
Nurk, Sergey; Koren, Sergey; Rhie, Arang ...
Science (American Association for the Advancement of Science), 04/2022, Volume: 376, Issue: 6588
Journal Article
Peer reviewed
Open access
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
Genome stability in the radioresistant bacterium Deinococcus radiodurans depends on RecA, the main bacterial recombinase. Without RecA, gross genome rearrangements occur during repair of DNA double-strand breaks. Long repeated (insertion) sequences have been identified as hot spots for ectopic recombination leading to genome rearrangements, and single-strand annealing (SSA) has been postulated to be the most likely mechanism involved in this process. Here, we have sequenced five isolates of a D. radiodurans recA mutant carrying gross genome rearrangements to precisely characterize the rearrangements and to elucidate the underlying repair mechanism. The detected rearrangements consisted of large deletions in chromosome II in all the sequenced recA isolates. The mechanism behind these deletions clearly differs from classical SSA; it utilized short (4-11 bp) repeats as opposed to insertion sequences or other long repeats. Moreover, it worked over larger linear DNA distances than those previously tested. Our data are most compatible with alternative end-joining, a recombination mechanism that operates in eukaryotes, but is also found in Escherichia coli. Additionally, despite the recA isolates being preselected for different rearrangement patterns, all identified deletions were found to overlap in a 35 kb genomic region. We weigh the evidence for mechanistic vs. adaptive reasons for this phenomenon.
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species. To address this issue, the international Genome 10K (G10K) consortium has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
The recent emergence of nanopore sequencing technology has set a challenge for established assembly methods. In this work, we assessed how existing hybrid and non-hybrid de novo assembly methods perform on long and error-prone nanopore reads.
We benchmarked five non-hybrid (in terms of both error correction and scaffolding) assembly pipelines as well as two hybrid assemblers which use third generation sequencing data to scaffold Illumina assemblies. Tests were performed on several publicly available MinION and Illumina datasets of Escherichia coli K-12, using several sequencing coverages of nanopore data (20×, 30×, 40× and 50×). We attempted to assess the assembly quality at each of these coverages, in order to estimate the requirements for closed bacterial genome assembly. For the purpose of the benchmark, an extensible genome assembly benchmarking framework was developed. Results show that hybrid methods are highly dependent on the quality of NGS data, but much less on the quality and coverage of nanopore data and perform relatively well on lower nanopore coverages. All non-hybrid methods correctly assemble the E. coli genome when coverage is above 40×, even the non-hybrid method tailored for Pacific Biosciences reads. While it requires higher coverage compared to a method designed particularly for nanopore reads, its running time is significantly lower.
https://github.com/kkrizanovic/NanoMark
mile.sikic@fer.hr
Supplementary data are available at Bioinformatics online.
Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first telomere-to-telomere human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Although derived from highly accurate sequences, evaluation revealed evidence of small errors and structural misassemblies in the initial draft assembly. To correct these errors, we designed a new repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly quality value from 70.2 to 73.9 measured from PacBio high-fidelity and Illumina k-mers. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both high-fidelity and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.
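For orientation, assembly quality values of this kind are conventionally Phred-scaled, so a value Q corresponds to a per-base error probability of 10^(-Q/10). The short Python sketch below only illustrates that standard conversion; the function names are hypothetical, and it assumes the quality value quoted in the abstract follows the usual Phred scaling rather than reproducing the authors' k-mer-based estimation procedure.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    # Illustrative input values only; assumes standard Phred scaling.
    for qv in (30.0, 70.2, 73.9):
        print(f"QV {qv:5.1f} -> about {qv_to_error_rate(qv):.2e} errors per base")
```

Under this scaling, every increase of 10 in the quality value corresponds to a tenfold lower error probability.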
This review discusses structure-property modeling applications of a novel variant of the Randic connectivity index that is called the sum-connectivity index. We compare published one-descriptor quantitative structure-property relationship (QSPR) models obtained with the new sum-connectivity index and with the Randic connectivity index, called here the product-connectivity index. Additionally, the efficiency of both variants of connectivity indices in QSPR modeling is tested on five datasets of alkanes and two datasets of polycyclic hydrocarbons. Several physicochemical properties of alkanes (i.e. boiling and melting points, retention index, molar volume, molar refraction, heat of vaporization, standard Gibbs energy of formation, critical temperature, critical pressure, surface tension, density) and π-electronic energies of two sets of polycyclic hydrocarbons were correlated with the product- and sum-connectivity indices. A comparison of these QSPR models shows that both variants of connectivity indices are equivalent, and only slightly (but not significantly) better results are obtained with the sum-connectivity index. Inter-correlations between the product- and sum-connectivity indices are mostly linear, with a slope very close to 1.0 for alkanes and a slope further from 1.0 (0.88) for polycyclic compounds. The comparative analysis presented here supports the use of the sum-connectivity index in QSPR/QSAR studies together with the product-connectivity index. Further studies on larger and more heterogeneous datasets should test the sum-connectivity index in QSPR/QSAR models.
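As a minimal illustration of the two descriptors compared above, the Python sketch below computes both indices for a hydrogen-suppressed molecular graph using their standard definitions: the product-connectivity (Randic) index sums (d_u · d_v)^(-1/2) over all bonds, and the sum-connectivity index sums (d_u + d_v)^(-1/2), where d is the vertex degree. The function names and the edge-list representation are assumptions made for this example and are not taken from the review.

```python
from math import sqrt


def vertex_degrees(edges):
    """Count the degree of every vertex in an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def product_connectivity(edges):
    """Randic (product-connectivity) index: sum of (d_u * d_v)^(-1/2) over edges."""
    deg = vertex_degrees(edges)
    return sum(1 / sqrt(deg[u] * deg[v]) for u, v in edges)


def sum_connectivity(edges):
    """Sum-connectivity index: sum of (d_u + d_v)^(-1/2) over edges."""
    deg = vertex_degrees(edges)
    return sum(1 / sqrt(deg[u] + deg[v]) for u, v in edges)


# Hydrogen-suppressed carbon skeletons (vertices are carbon atoms, edges are C-C bonds).
n_butane = [(1, 2), (2, 3), (3, 4)]   # linear chain C1-C2-C3-C4
isobutane = [(1, 2), (2, 3), (2, 4)]  # 2-methylpropane, C2 is the branch point

if __name__ == "__main__":
    for name, graph in (("n-butane", n_butane), ("isobutane", isobutane)):
        print(name,
              round(product_connectivity(graph), 4),
              round(sum_connectivity(graph), 4))
```

Both indices decrease on going from the linear to the branched isomer, which is in line with the abstract's observation that the two descriptors behave very similarly for alkanes.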
In this paper, a novel multi-resolution real-time 3D thermal imaging system is presented as a potential solution for the standardisation of human body 3D thermal models. The system consists of a high-resolution offline 3D scanner and a real-time low-resolution 3D scanner, each paired with a thermal imaging camera. The emphasis of this paper is the presentation of the novel concept of the standardisation of human body 3D thermal models captured by the multi-resolution real-time 3D thermal imaging system. The standardisation procedure utilises skeleton detection, skeleton transformation, mesh optimisation and texture mapping. The presented concept enables novel and practical methods for the comparison and analysis of human body 3D thermal models.
Pigeons and doves (family Columbidae) are one of the most diverse extant avian lineages, and many species have served as key models for evolutionary genomics, developmental biology, physiology, and behavioral studies. Building genomic resources for columbids is essential to further many of these studies. Here, we present high-quality genome assemblies and annotations for 2 columbid species, Columba livia and Columba guinea. We simultaneously assembled C. livia and C. guinea genomes from long-read sequencing of a single F1 hybrid individual. The new C. livia genome assembly (Cliv_3) shows improved completeness and contiguity relative to Cliv_2.1, with an annotation incorporating long-read IsoSeq data for more accurate gene models. Intensive selective breeding of C. livia has given rise to hundreds of breeds with diverse morphological and behavioral characteristics, and Cliv_3 offers improved tools for mapping the genomic architecture of interesting traits. The C. guinea genome assembly is the first for this species and is a new resource for avian comparative genomics. Together, these assemblies and annotations provide improved resources for functional studies of columbids and avian comparative genomics in general.