Abstract
Motivation
Although long-read sequencing technologies can produce genomes with long contiguity, they suffer from high error rates. Thus, we developed NextPolish, a tool that efficiently corrects sequence errors in genomes assembled with long reads. This new tool consists of two interlinked modules that are designed to score and count K-mers from high-quality short reads, and to polish genome assemblies containing large numbers of base errors.
Results
When evaluated for speed and efficiency on human and plant (Arabidopsis thaliana) genomes, NextPolish outperformed Pilon, correcting sequence errors faster and with higher accuracy.
Availability and implementation
NextPolish is implemented in C and Python. The source code is available from https://github.com/Nextomics/NextPolish.
Supplementary information
Supplementary data are available at Bioinformatics online.
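The idea of counting and scoring K-mers from short reads, mentioned in the Motivation above, can be illustrated with a minimal sketch. This is not NextPolish's actual implementation; the function names `count_kmers` and `score_candidates`, the toy reads, and the scoring rule (summing K-mer counts) are all assumptions made for illustration only.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across the reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def score_candidates(candidates, counts, k):
    """Score each candidate sequence by the summed counts of its k-mers.

    Hypothetical scoring rule: a candidate whose k-mers are well supported
    by the short reads gets a higher score.
    """
    scores = {}
    for cand in candidates:
        seq = cand.upper()
        scores[cand] = sum(
            counts[seq[i:i + k]] for i in range(len(seq) - k + 1)
        )
    return scores


# Toy short reads and two candidate versions of the same assembly window.
reads = ["ACGTAC", "CGTACG", "ACGTAC"]
counts = count_kmers(reads, 3)
scores = score_candidates(["ACGTA", "ACTTA"], counts, 3)
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy case the candidate whose K-mers all occur in the reads scores higher than the one carrying a base error, which is the intuition behind using short-read K-mer support to choose between alternative bases.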
The mitochondrial genome (mitogenome) plays an important role in evolutionary and ecological studies. With the advent of High Throughput Sequencing (HTS) technologies, it has become routine to use multiple mitochondrial genes or entire mitogenomes to investigate the phylogeny and biodiversity of focal groups. We developed MitoZ, a mitogenome toolkit consisting of independent modules for de novo assembly, findMitoScaf (find Mitochondrial Scaffolds), annotation and visualization, which can generate a mitogenome assembly together with annotation and visualization results from HTS raw reads. We evaluated its performance using a total of 50 samples whose mitogenomes are publicly available. The results showed that MitoZ can recover more full-length mitogenomes with higher accuracy than the other available mitogenome assemblers. Overall, MitoZ provides a one-click solution for constructing annotated mitogenomes from HTS raw data and will facilitate large-scale ecological and evolutionary studies. MitoZ is free open-source software distributed under the GPLv3 license and available at https://github.com/linzhi2013/MitoZ.
Butterflies and moths (Lepidoptera) are one of the major superradiations of insects, comprising nearly 160,000 described extant species. As herbivores, pollinators, and prey, Lepidoptera play a fundamental role in almost every terrestrial ecosystem. Lepidoptera are also indicators of environmental change and serve as models for research on mimicry and genetics. They have been central to the development of coevolutionary hypotheses, such as butterflies with flowering plants and moths’ evolutionary arms race with echolocating bats. However, these hypotheses have not been rigorously tested, because a robust lepidopteran phylogeny and timing of evolutionary novelties are lacking. To address these issues, we inferred a comprehensive phylogeny of Lepidoptera, using the largest dataset assembled for the order (2,098 orthologous protein-coding genes from transcriptomes of 186 species, representing nearly all superfamilies), and dated it with carefully evaluated synapomorphy-based fossils. The oldest members of the Lepidoptera crown group appeared in the Late Carboniferous (∼300 Ma) and fed on nonvascular land plants. Lepidoptera evolved the tube-like proboscis in the Middle Triassic (∼241 Ma), which allowed them to acquire nectar from flowering plants. This morphological innovation, along with other traits, likely promoted the extraordinary diversification of superfamily-level lepidopteran crown groups. The ancestor of butterflies was likely nocturnal, and our results indicate that butterflies became day-flying in the Late Cretaceous (∼98 Ma). Moth hearing organs arose multiple times before the evolutionary arms race between moths and bats, perhaps initially detecting a wide range of sound frequencies before being co-opted to specifically detect bat sonar. Our study provides an essential framework for future comparative studies on butterfly and moth evolution.
The evolution and genomic basis of beetle diversity McKenna, Duane D.; Shin, Seunggwan; Ahrens, Dirk ...
Proceedings of the National Academy of Sciences - PNAS, 12/2019, Volume 116, Issue 49
Journal Article, peer reviewed, open access
The order Coleoptera (beetles) is arguably the most speciose group of animals, but the evolutionary history of beetles, including the impacts of plant feeding (herbivory) on beetle diversification, remains poorly understood. We inferred the phylogeny of beetles using 4,818 genes for 146 species, estimated timing and rates of beetle diversification using 89 genes for 521 species representing all major lineages and traced the evolution of beetle genes enabling symbiont-independent digestion of lignocellulose using 154 genomes or transcriptomes. Phylogenomic analyses of these uniquely comprehensive datasets resolved previously controversial beetle relationships, dated the origin of Coleoptera to the Carboniferous, and supported the codiversification of beetles and angiosperms. Moreover, plant cell wall-degrading enzymes (PCWDEs) obtained from bacteria and fungi via horizontal gene transfers may have been key to the Mesozoic diversification of herbivorous beetles; remarkably, both major independent origins of specialized herbivory in beetles coincide with the first appearances of an arsenal of PCWDEs encoded in their genomes. Furthermore, corresponding (Jurassic) diversification rate increases suggest that these novel genes triggered adaptive radiations that resulted in nearly half of all living beetle species. We propose that PCWDEs enabled efficient digestion of plant tissues, including lignocellulose in cell walls, facilitating the evolution of uniquely specialized plant-feeding habits, such as leaf mining and stem and wood boring. Beetle diversity thus appears to have resulted from multiple factors, including low extinction rates over a long evolutionary history, codiversification with angiosperms, and adaptive radiations of specialized herbivorous beetles following convergent horizontal transfers of microbial genes encoding PCWDEs.
Microchromosomes are prevalent in nonmammalian vertebrates (P. D. Waters et al., 2021), but a few of them are missing in bird genome assemblies. Here, we present a new chicken reference genome containing all autosomes, a Z and a W chromosome, with all gaps closed except for the W. We identified ten small microchromosomes (termed dot chromosomes) with distinct sequence and epigenetic features, among which six were newly assembled. Those dot chromosomes exhibit extremely high GC content and a high level of DNA methylation and are enriched for housekeeping genes. The pericentromeric heterochromatin of dot chromosomes is disproportionately large and continues to expand with the proliferation of satellite DNA and testis-expressed genes. Our analyses revealed that the 41-bp CNM repeat frequently forms higher-order repeats (HORs) at the centromeres of acrocentric chromosomes. The centromere core regions where the kinetochore attaches often encompass telomeric sequence (TTAGGG)n, and in one of the dot chromosomes, the centromere core recruits an endogenous retrovirus (ERV). We further demonstrate that the W chromosome shares some common features with dot chromosomes, having large arrays of hypermethylated tandem repeats. Finally, using the complete chicken chromosome models, we reconstructed a fine picture of chordate karyotype evolution, revealing frequent chromosomal fusions before and after vertebrate whole-genome duplications. Our sequence and epigenetic characterization of chicken chromosomes provides insights into vertebrate genome evolution and chromosome biology.
Temporal genomic data hold great potential for studying evolutionary processes such as speciation. However, sampling across speciation events would, in many cases, require genomic time series that stretch well back into the Early Pleistocene subepoch. Although theoretical models suggest that DNA should survive on this timescale, the oldest genomic data recovered so far are from a horse specimen dated to 780-560 thousand years ago. Here we report the recovery of genome-wide data from three mammoth specimens dating to the Early and Middle Pleistocene subepochs, two of which are more than one million years old. We find that two distinct mammoth lineages were present in eastern Siberia during the Early Pleistocene. One of these lineages gave rise to the woolly mammoth and the other represents a previously unrecognized lineage that was ancestral to the first mammoths to colonize North America. Our analyses reveal that the Columbian mammoth of North America traces its ancestry to a Middle Pleistocene hybridization between these two lineages, with roughly equal admixture proportions. Finally, we show that the majority of protein-coding changes associated with cold adaptation in woolly mammoths were already present one million years ago. These findings highlight the potential of deep-time palaeogenomics to expand our understanding of speciation and long-term adaptive evolution.
Acoustic communication is enabled by the evolution of specialised hearing and sound producing organs. In this study, we performed a large-scale macroevolutionary study to understand how both hearing and sound production evolved and affected diversification in the insect order Orthoptera, which includes many familiar singing insects, such as crickets, katydids, and grasshoppers. Using phylogenomic data, we firmly establish phylogenetic relationships among the major lineages and divergence time estimates within Orthoptera, as well as the lineage-specific and dynamic patterns of evolution for hearing and sound producing organs. In the suborder Ensifera, we infer that forewing-based stridulation and tibial tympanal ears co-evolved, but in the suborder Caelifera, abdominal tympanal ears first evolved in a non-sexual context, and were later co-opted for sexual signalling when sound producing organs evolved. However, we find little evidence that the evolution of hearing and sound producing organs increased diversification rates in those lineages with known acoustic communication.
The spectrum of viruses in insects is important for subjects as diverse as public health, veterinary medicine, food production, and biodiversity conservation. The traditional interest in vector-borne diseases of humans and livestock has drawn the attention of virus studies to hematophagous insect species. However, these represent only a tiny fraction of the broad diversity of Hexapoda, the most speciose group of animals. Here, we systematically probed the diversity of negative strand RNA viruses in the largest and most representative collection of insect transcriptomes from samples representing all 34 extant orders of Hexapoda and 3 orders of Entognatha, as well as outgroups, altogether representing 1243 species. Based on profile hidden Markov models we detected 488 viral RNA-directed RNA polymerase (RdRp) sequences with similarity to negative strand RNA viruses. These were identified in members of 324 arthropod species. Selection for length, quality, and uniqueness left 234 sequences for analyses, showing similarity to genomes of viruses classified in Bunyavirales (n = 86), Articulavirales (n = 54), and several orders within Haploviricotina (n = 94). Coding-complete genomes or nearly-complete subgenomic assemblies were obtained in 61 cases. Based on phylogenetic topology and the availability of coding-complete genomes we estimate that at least 20 novel viral genera in seven families need to be defined, only two of them monospecific. Seven additional viral clades emerge when adding sequences from the present study to formerly monospecific lineages, potentially requiring up to seven additional genera. One long sequence may indicate a novel family. For segmented viruses, cophylogenies between genome segments were generally improved by the inclusion of viruses from the present study, suggesting that in silico misassembly of segmented genomes is rare or absent.
Contrary to previous assessments, significant virus-host codivergence was identified in major phylogenetic lineages based on two different approaches of codivergence analysis in a hypothesis-testing framework. Despite these additions to the known spectrum of viruses in insects, we caution that basing taxonomic decisions on genome information alone is challenging due to technical uncertainties, such as the inability to prove the integrity of complete genome assemblies of segmented viruses.
A super pan-genomic landscape of rice Shang, Lianguang; Li, Xiaoxia; He, Huiying ...
Cell research, 10/2022, Volume 32, Issue 10
Journal Article, peer reviewed, open access
Abstract
Pan-genomes from large natural populations can capture genetic diversity and reveal genomic complexity. Using de novo long-read assembly, we generated a graph-based super pan-genome of rice consisting of a 251-accession panel comprising both cultivated and wild species of Asian and African rice. Our pan-genome reveals extensive structural variations (SVs) and gene presence/absence variations. Additionally, our pan-genome enables the accurate identification of nucleotide-binding leucine-rich repeat genes and characterization of their inter- and intraspecific diversity. Moreover, we uncovered grain weight-associated SVs which specify traits by affecting the expression of their nearby genes. We characterized genetic variants associated with submergence tolerance, seed shattering and plant architecture and found independent selection for a common set of genes that drove adaptation and domestication in Asian and African rice. This super pan-genome facilitates pinpointing of lineage-specific haplotypes for trait-associated genes and provides insights into the evolutionary events that have shaped the genomic architecture of various rice species.
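The notion of gene presence/absence variation mentioned in this abstract can be made concrete with a small sketch. It is not the authors' pipeline; the function name `presence_absence_variants`, the accession labels, and the gene names are hypothetical, and the input is assumed to be a simple mapping from each accession to the set of genes annotated in it.

```python
def presence_absence_variants(gene_sets):
    """Return genes present in some, but not all, accessions.

    gene_sets maps an accession name to the set of genes found in it
    (hypothetical input format). The result maps each variable gene to
    the list of accessions that carry it.
    """
    accessions = list(gene_sets)
    all_genes = set().union(*gene_sets.values())
    variable = {}
    for gene in sorted(all_genes):
        present_in = [a for a in accessions if gene in gene_sets[a]]
        if len(present_in) < len(accessions):  # absent from at least one
            variable[gene] = present_in
    return variable


# Toy panel of three accessions (made-up names).
gene_sets = {
    "acc1": {"geneA", "geneB", "geneC"},
    "acc2": {"geneA", "geneC"},
    "acc3": {"geneA", "geneB"},
}
pav = presence_absence_variants(gene_sets)
print(pav)
```

Here `geneA` is shared by every accession and is therefore not reported, while `geneB` and `geneC` are each missing from one accession and count as presence/absence variants.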