Abstract
Domestic ducks are raised for meat, eggs and feather down, and almost all varieties are descended from the Mallard (Anas platyrhynchos). Here, we report chromosome-level high-quality genome assemblies for meat and laying duck breeds, and the Mallard. Our new genomic databases contain annotations for thousands of new protein-coding genes and recover a major percentage of the presumed “missing genes” in birds. We obtain the entire genomic sequences for the C-type lectin (CTL) family members that regulate eggshell biomineralization. Our population and comparative genomics analyses provide more than 36 million sequence variants between duck populations. Furthermore, a mutant cell line allows confirmation of the predicted anti-adipogenic function of NR2F2 in the duck, and we uncover mutations specific to Pekin duck that potentially affect adipose deposition. Our study provides insights into avian evolution and the genetics of oviparity, and will be a rich resource for the future genetic improvement of commercial traits in the duck.
Abstract
Ensembl (https://www.ensembl.org) is a freely available genomic resource that has produced high-quality annotations, tools, and services for vertebrates and model organisms for more than two decades. In recent years, there has been a dramatic shift in the genomic landscape, with a large increase in the number and phylogenetic breadth of high-quality reference genomes, alongside major advances in the pan-genome representations of higher species. In order to support these efforts and accelerate downstream research, Ensembl continues to focus on scaling for the rapid annotation of new genome assemblies, developing new methods for comparative analysis, and expanding the depth and quality of our genome annotations. This year we have continued our expansion to support global biodiversity research, doubling the number of annotated genomes we support on our Rapid Release site to over 1700, driven by our close collaboration with biodiversity projects such as Darwin Tree of Life. We have also strengthened support for key agricultural species, including the first regulatory builds for farmed animals, and have updated key tools and resources that support the global scientific community, notably the Ensembl Variant Effect Predictor. Ensembl data, software, and tools are freely available.
Graphical Abstract
The Australian black swan (Cygnus atratus) is an iconic species with contrasting plumage to that of the closely related northern hemisphere white swans. The relative geographic isolation of the black swan may have resulted in a limited immune repertoire and increased susceptibility to infectious diseases, notably infectious diseases from which Australia has been largely shielded. Unlike mallard ducks and the mute swan (Cygnus olor), the black swan is extremely sensitive to highly pathogenic avian influenza. Understanding this susceptibility has been impaired by the absence of any available swan genome and transcriptome information.
Here, we generate the first chromosome-length black and mute swan genomes annotated with transcriptome data, all using long-read based pipelines generated for vertebrate species. We use these genomes and transcriptomes to show that, unlike other wild waterfowl, black swans lack an expanded immune gene repertoire, lack a key viral pattern-recognition receptor in endothelial cells and mount a poorly controlled inflammatory response to highly pathogenic avian influenza. We also implicate genetic differences in the SLC45A2 gene in the iconic plumage of the black swan.
Together, these data suggest that the immune system of the black swan is such that, should any avian viral infection become established in its native habitat, the black swan would be in significant peril.
The advent of Next Generation Sequencing (NGS) has led to the generation of enormous volumes of short read sequence data, cheaply and in reasonable time scales. Nevertheless, the quality of genome assemblies generated using NGS technologies has been adversely affected, compared to those generated using Sanger DNA sequencing. This is largely due to the inability of short read sequence data to scaffold repetitive structures, creating gaps, inversions and rearrangements and resulting in assemblies that are, at best, draft forms. Third generation single-molecule sequencing (SMS) technologies (e.g. Pacific Biosciences Single Molecule Real Time (SMRT) system) address this challenge by generating sequences with increased read lengths, offering the prospect to better recover these complex repetitive structures, concomitantly improving assembly quality.
Here, we evaluate the ability of SMS data (specifically human genome Pacific Biosciences SMRT data) to recover poorly represented repetitive sequences (specifically, GC-rich human minisatellites). To do this we designed a pipeline for the collection, processing and local assembly of single-molecule sequence data to form accurate contiguous local reconstructions. Our results show the recovery of an allele of the non-coding minisatellite MS1 (located on chromosome 1 at 1p33-35) at greater than 97% identity to reference (GRCh38) from the unprocessed sequence data of a haploid complete hydatidiform mole (CHM1) cell line. Furthermore, our assembly revealed an allele of over 500 repeat units; much larger than the reference (GRCh38), but consistent in structure with naturally occurring alleles that are segregating in human populations. This local assembly's reconstruction was validated with the release of the whole genome assemblies GCA_001297185.1 and GCA_000772585.3, where this allele occurs. Additionally, application of this pipeline to coding minisatellites in the PRDM9 and ZNF93 genes enabled recovery of high identity allele structures for these sequence regions whose length was confirmed by PCR from cell line genomic DNA. The internal repeat structure of the PRDM9 allele recovered was consistent with common human-specific alleles.
Code available at https://github.com/ndliberial/smrt_pipeline.
Contact: dno2@le.ac.uk.
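The two measurements this abstract reports for a locally assembled minisatellite allele, identity to a reference and the number of repeat units, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy sequences and the repeat motif are hypothetical, percent identity is approximated here as one minus the edit distance divided by the longer sequence length, and repeat units are counted only when they match the motif exactly (real minisatellite alleles contain variant units that would need approximate matching).

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def percent_identity(assembled: str, reference: str) -> float:
    """Simplified identity: 100 * (1 - edit distance / longer length).

    Assumption: this is a stand-in for an alignment-based identity score.
    """
    longest = max(len(assembled), len(reference))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(assembled, reference) / longest)


def count_tandem_units(seq: str, unit: str, start: int = 0) -> int:
    """Count exact back-to-back copies of `unit` beginning at `start`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    count = 0
    pos = start
    while seq.startswith(unit, pos):
        count += 1
        pos += len(unit)
    return count


if __name__ == "__main__":
    # Toy sequences and motif, made up purely for illustration.
    reference = "ACGTACGTAC"
    assembled = "ACGTACGTAA"
    print(f"identity: {percent_identity(assembled, reference):.1f}%")
    print("units:", count_tandem_units("GGACGGACGGACTT", "GGAC"))
```

For alleles with hundreds of repeat units, the quadratic edit-distance step would normally be replaced by a dedicated aligner; the sketch only makes the two metrics concrete.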
Abstract
Ensembl (https://www.ensembl.org) is unique in its flexible infrastructure for access to genomic data and annotation. It has been designed to efficiently deliver annotation at scale for all eukaryotic life, and it also provides deep comprehensive annotation for key species. Genomes representing a greater diversity of species are increasingly being sequenced. In response, we have focussed our recent efforts on expediting the annotation of new assemblies. Here, we report the release of the greatest annual number of newly annotated genomes in the history of Ensembl via our dedicated Ensembl Rapid Release platform (http://rapid.ensembl.org). We have also developed a new method to generate comparative analyses at scale for these assemblies and, for the first time, we have annotated non-vertebrate eukaryotes. Meanwhile, we continually improve, extend and update the annotation for our high-value reference vertebrate genomes and report the details here. We provide a range of software tools for specific tasks, such as the Ensembl Variant Effect Predictor (VEP) and the newly developed interface for the Variant Recoder. All Ensembl data, software and tools are freely available for download and are accessible programmatically.
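As an illustration of the programmatic access mentioned above, the following Python sketch builds and sends a VEP query to the public Ensembl REST service. The abstract does not describe this interface, so everything here is an assumption: the server address (https://rest.ensembl.org) and the `vep/:species/hgvs/:hgvs_notation` path are taken from the public REST documentation as recalled, the helper names are hypothetical, and the HGVS string is only an illustrative input.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: public Ensembl REST server, not named in the abstract.
REST_SERVER = "https://rest.ensembl.org"


def vep_hgvs_url(species: str, hgvs: str, server: str = REST_SERVER) -> str:
    """Build the URL for a VEP lookup by HGVS notation.

    Characters such as '>' are percent-encoded; ':' is kept readable.
    """
    return f"{server}/vep/{quote(species)}/hgvs/{quote(hgvs, safe=':')}"


def fetch_vep(species: str, hgvs: str, timeout: float = 30.0):
    """Send the request and return the decoded JSON response.

    Requires network access; errors from the server are raised by urlopen.
    """
    request = Request(
        vep_hgvs_url(species, hgvs),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Illustrative variant only; prints the URL without contacting the server.
    print(vep_hgvs_url("human", "ENST00000366667:c.803C>T"))
```

Calling `fetch_vep("human", "ENST00000366667:c.803C>T")` would perform the actual lookup; separating URL construction from the network call keeps the former easy to check offline.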
Ensembl (https://www.ensembl.org) has produced high-quality genomic resources for vertebrates and model organisms for more than twenty years. During that time, our resources, services and tools have continually evolved in line with both the publicly available genome data and the downstream research and applications that utilise the Ensembl platform. In recent years we have witnessed a dramatic shift in the genomic landscape. There has been a large increase in the number of high-quality reference genomes through global biodiversity initiatives. In parallel, there have been major advances towards pangenome representations of higher species, where many alternative genome assemblies representing different breeds, cultivars, strains and haplotypes are now available. In order to support these efforts and accelerate downstream research, it is our goal at Ensembl to create high-quality annotations, tools and services for species across the tree of life. Here, we report our resources for popular reference genomes, the dramatic growth of our annotations (including haplotypes from the first human pangenome graphs), updates to the Ensembl Variant Effect Predictor (VEP), interactive protein structure predictions from AlphaFold DB, and the beta release of our new website.
Abstract
The ability of organisms to adapt to sudden extreme environmental changes produces some of the most drastic examples of rapid phenotypic evolution. The Mexican Tetra, Astyanax mexicanus, is abundant in the surface waters of northeastern Mexico, but repeated colonizations of cave environments have resulted in the independent evolution of troglomorphic phenotypes in several populations. Here, we present three chromosome-scale assemblies of this species, for one surface and two cave populations, enabling the first whole-genome comparisons between independently evolved cave populations to evaluate the genetic basis for the evolution of adaptation to the cave environment. Our assemblies represent the highest quality of sequence completeness with predicted protein-coding and noncoding gene metrics far surpassing prior resources and, to our knowledge, all long-read assembled teleost genomes, including zebrafish. Whole-genome synteny alignments show highly conserved gene order among cave forms in contrast to a higher number of chromosomal rearrangements when compared with other phylogenetically close or distant teleost species. By phylogenetically assessing gene orthology across distant branches of amniotes, we discover gene orthogroups unique to A. mexicanus. When compared with a representative surface fish genome, we find a rich amount of structural sequence diversity, defined here as the number and size of insertions and deletions as well as expanding and contracting repeats across cave forms. These new more complete genomic resources ensure higher trait resolution for comparative, functional, developmental, and genetic studies of drastic trait differences within a species.
Abstract
The COVID-19 pandemic has seen unprecedented use of SARS-CoV-2 genome sequencing for epidemiological tracking and identification of emerging variants. Understanding the potential impact of these variants on the infectivity of the virus and the efficacy of emerging therapeutics and vaccines has become a cornerstone of the fight against the disease. To support the maximal use of genomic information for SARS-CoV-2 research, we launched the Ensembl COVID-19 browser, making SARS-CoV-2 the first virus to be encompassed within the Ensembl platform. This resource incorporates a new Ensembl gene set, multiple variant sets, and annotation from several relevant resources aligned to the reference SARS-CoV-2 assembly. Since the first release in May 2020, the content has been regularly updated using our new rapid release workflow, and tools such as the Ensembl Variant Effect Predictor have been integrated. The Ensembl COVID-19 browser is freely available at https://covid-19.ensembl.org.
One common phenomenon in the administration of pedagogy in a typical Nigerian educational setting is the use of traditional and manual methods in teaching and assessment of students at all levels of education, leading to inaccuracy in students' assessments, delayed feedback on assessments, time wasting, inefficient use of paper resources, and lack of privacy and confidentiality of records. In this paper, the authors developed a robust and simple tool that leverages the power of technology in the administration and management of a course electronically. The developed solution not only proved robust and easy to use, but also successfully improved students' cognition and desire to study, improved timeliness in assessment submissions, reduced delay in feedback, enhanced access to course materials, and improved student and staff use of Information and Communication Technology in learning at Bowen University, Iwo, Nigeria.
With the advent of Next Generation Sequencing (NGS), we have witnessed the generation of enormous volumes of short read sequence data, cheaply and on short time scales. Nevertheless, the quality of genome assemblies generated using NGS technologies has been adversely affected by this innovation, compared to those generated using Sanger DNA sequencing. This is largely due to the inability of short read sequence data alone to scaffold repetitive structures, creating gaps, inversions and rearrangements and ultimately resulting in assemblies that are, at best, draft forms (by draft we mean an assembly that is only a preliminary result and requires further work to become a more complete and accurate representation of the genome). Single molecule long-read sequencing (SMS) technologies, on the other hand, address this challenge by generating sequences with greatly increased read lengths, offering the prospect to better recover these complex repetitive structures, concomitantly improving assembly quality. Following this development, we evaluate the ability of SMS data (specifically Pacific Biosciences SMRT data and Oxford Nanopore MinION data from human genomes) to recover poorly represented repetitive sequences (specifically, GC-rich human minisatellites), identify novel transposable element insertions and enable the closing of gapped regions. Our results show that by using single molecule sequencing and long read technology, poorly represented repetitive sequences (specifically, minisatellites and L1s) and other missing elements in published human genome assemblies can be characterized by developing custom software, scalable for the analysis of single molecule long-reads (particularly, Pacific Biosciences’ SMRT technology). The tool designed is cross-platform, thus giving computational and non-computational biologists a straightforward approach and a less technical platform for local analysis of specific poorly characterized DNA sequences.