Visualization is frequently used to aid our interpretation of complex datasets. Within microbial genomics, visualizing the relationships between multiple genomes as a tree provides a framework onto which associated data (geographical, temporal, phenotypic and epidemiological) are added to generate hypotheses and to explore the dynamics of the system under investigation. Selected static images are then used within publications to highlight the key findings to a wider audience. However, these images are an inadequate way of exploring and interpreting the richness of the data. There is, therefore, a need for flexible, interactive software that presents the population genomic outputs and associated data in a user-friendly manner for a wide range of end users, from trained bioinformaticians to front-line epidemiologists and health workers. Here, we present Microreact, a web application for the easy visualization of datasets consisting of any combination of trees, geographical, temporal and associated metadata. Data files can be uploaded to Microreact directly via the web browser or by linking to their location (e.g. from Google Drive/Dropbox or via API), and an integrated visualization via trees, maps, timelines and tables provides interactive querying of the data. The visualization can be shared as a permanent web link among collaborators, or embedded within publications to enable readers to explore and download the data. Microreact can act as an end point for any tool or bioinformatic pipeline that ultimately generates a tree, and provides a simple, yet powerful, visualization method that will aid research and discovery and the open sharing of datasets.
Invasive pneumococcal disease remains an important health priority owing to increasing disease incidence caused by pneumococci expressing non-vaccine serotypes. We previously defined 621 Global Pneumococcal Sequence Clusters (GPSCs) by analysing 20 027 pneumococcal isolates collected worldwide and from previously published genomic data. In this study, we aimed to investigate the pneumococcal lineages behind the predominant serotypes, the mechanism of serotype replacement in disease, as well as the major pneumococcal lineages contributing to invasive pneumococcal disease in the post-vaccine era and their antibiotic resistant traits.
We whole-genome sequenced 3233 invasive pneumococcal disease isolates from laboratory-based surveillance programmes in Hong Kong (n=78), Israel (n=701), Malawi (n=226), South Africa (n=1351), The Gambia (n=203), and the USA (n=674). The genomes represented pneumococci from before and after pneumococcal conjugate vaccine (PCV) introductions and were from children younger than 3 years. We identified predominant serotypes by prevalence and their major contributing lineages in each country, and assessed any serotype replacement by comparing the incidence rate between the pre-PCV and PCV periods for Israel, South Africa, and the USA. We defined the status of a lineage as vaccine-type GPSC (≥50% 13-valent PCV [PCV13] serotypes) or non-vaccine-type GPSC (>50% non-PCV13 serotypes) on the basis of its initial serotype composition detected in the earliest vaccine period to measure their individual contribution toward serotype replacement in each country. Major pneumococcal lineages in the PCV period were identified by pooled incidence rate using a random effects model.
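The vaccine-type versus non-vaccine-type rule described above can be illustrated with a short sketch. This is not the authors' code: the function name `classify_gpsc` and the input format (a list of serotype labels for one lineage in its earliest vaccine period) are assumptions made for illustration, and the serotype set is the standard PCV13 formulation. Serotype labels are matched as exact strings, so how closely related or combined labels (such as 15B/C) are handled is left open.

```python
# Serotypes covered by the 13-valent pneumococcal conjugate vaccine (PCV13).
PCV13_SEROTYPES = frozenset(
    {"1", "3", "4", "5", "6A", "6B", "7F", "9V", "14", "18C", "19A", "19F", "23F"}
)


def classify_gpsc(serotypes):
    """Classify one lineage from the serotypes of its isolates.

    `serotypes` is a hypothetical input: one serotype label per isolate,
    taken from the earliest vaccine period in which the lineage was seen.
    Returns "vaccine-type" when at least 50% of isolates express a PCV13
    serotype, otherwise "non-vaccine-type" (more than 50% non-PCV13).
    """
    if not serotypes:
        raise ValueError("at least one isolate is required")
    vaccine_count = sum(1 for s in serotypes if s in PCV13_SEROTYPES)
    # Integer comparison (2 * count >= total) avoids float rounding at exactly 50%.
    if 2 * vaccine_count >= len(serotypes):
        return "vaccine-type"
    return "non-vaccine-type"
```

Because the two thresholds are complementary (≥50% versus >50%), a lineage with exactly half PCV13 serotypes falls on the vaccine-type side and every lineage receives exactly one label.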
The five most prevalent serotypes in the PCV13 period varied between countries, with only serotypes 5, 12F, 15B/C, 19A, 33F, and 35B/D common to two or more countries. These serotypes were associated with more than one lineage, except for serotype 5 (GPSC8). Serotype replacement was mainly mediated by expansion of non-vaccine serotypes within vaccine-type GPSCs and, to a lesser extent, by increases in non-vaccine-type GPSCs. A globally spreading lineage, GPSC3, expressing invasive serotypes 8 in South Africa and 33F in the USA and Israel, was the most common lineage causing non-vaccine serotype invasive pneumococcal disease in the PCV13 period. We observed that the same prevalent non-vaccine serotypes could be associated with distinctive lineages in different countries, which exhibited dissimilar antibiotic resistance profiles. In non-vaccine serotype isolates, we detected significant increases in the prevalence of resistance to penicillin (52 [21%] of 249 vs 169 [29%] of 575, p=0·0016) and erythromycin (three [1%] of 249 vs 65 [11%] of 575, p=0·0031) in the PCV13 period compared with the pre-PCV period.
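As a quick arithmetic check, the resistance percentages can be recomputed from the counts reported above. The sketch below only covers the rounding of proportions; the abstract does not name the statistical test behind the p values, so no test is reproduced here. The helper name `percent` is an assumption for illustration.

```python
def percent(count, total):
    """Return count/total as a percentage rounded to the nearest whole number."""
    return round(100 * count / total)


# Counts as reported: resistant isolates out of all non-vaccine serotype
# isolates, for the smaller group (n=249) and the larger group (n=575).
penicillin = (percent(52, 249), percent(169, 575))
erythromycin = (percent(3, 249), percent(65, 575))
```

Since the text describes an increase in the PCV13 period, the group of 249 isolates presumably corresponds to the pre-PCV period and the group of 575 to the PCV13 period.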
Globally spreading lineages expressing invasive serotypes have an important role in serotype replacement, and emerging non-vaccine serotypes associated with different pneumococcal lineages in different countries might be explained by local antibiotic-selective pressures. Continued genomic surveillance of the dynamics of the pneumococcal population with increased geographical representation in the post-vaccine period will generate further knowledge for optimising future vaccine design.
Bill & Melinda Gates Foundation, Wellcome Sanger Institute, and the US Centers for Disease Control.
Abstract
Ensembl (https://www.ensembl.org) is unique in its flexible infrastructure for access to genomic data and annotation. It has been designed to efficiently deliver annotation at scale for all eukaryotic life, and it also provides deep comprehensive annotation for key species. Genomes representing a greater diversity of species are increasingly being sequenced. In response, we have focussed our recent efforts on expediting the annotation of new assemblies. Here, we report the release of the greatest annual number of newly annotated genomes in the history of Ensembl via our dedicated Ensembl Rapid Release platform (http://rapid.ensembl.org). We have also developed a new method to generate comparative analyses at scale for these assemblies and, for the first time, we have annotated non-vertebrate eukaryotes. Meanwhile, we continually improve, extend and update the annotation for our high-value reference vertebrate genomes and report the details here. We have a range of specific software tools for specific tasks, such as the Ensembl Variant Effect Predictor (VEP) and the newly developed interface for the Variant Recoder. All Ensembl data, software and tools are freely available for download and are accessible programmatically.
Ensembl 2021. Howe, Kevin L; Achuthan, Premanand; Allen, James ... Nucleic acids research, 01/2021, Volume 49, Issue D1. Journal Article. Peer reviewed. Open access.
Abstract
The Ensembl project (https://www.ensembl.org) annotates genomes and disseminates genomic data for vertebrate species. We create detailed and comprehensive annotation of gene structures, regulatory elements and variants, and enable comparative genomics by inferring the evolutionary history of genes and genomes. Our integrated genomic data are made available in a variety of ways, including genome browsers, search interfaces, specialist tools such as the Ensembl Variant Effect Predictor, download files and programmatic interfaces. Here, we present recent Ensembl developments including two new website portals. Ensembl Rapid Release (http://rapid.ensembl.org) is designed to provide core tools and services for genomes as soon as possible and has been deployed to support large biodiversity sequencing projects. Our SARS-CoV-2 genome browser (https://covid-19.ensembl.org) integrates our own annotation with publicly available genomic data from numerous sources to facilitate the use of genomics in the international scientific response to the COVID-19 pandemic. We also report on other updates to our annotation resources, tools and services. All Ensembl data and software are freely available without restriction.
Abstract
Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation such as genes, variation, regulation and comparative genomics across the vertebrate subphylum and key model organisms. The Ensembl annotation pipeline is capable of integrating experimental and reference data from multiple providers into a single integrated resource. Here, we present 94 newly annotated and re-annotated genomes, bringing the total number of genomes offered by Ensembl to 227. This represents the single largest expansion of the resource since its inception. We also detail our continued efforts to improve human annotation, developments in our epigenome analysis and display, a new tool for imputing causal genes from genome-wide association studies and visualisation of variation within a 3D protein model. Finally, we present information on our new website. Both software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license), and data updates are made available four times a year.
Abstract
The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, in particular gene annotation, comparative genomics, genetics and epigenomics. Second, we cover a growing number of genome assemblies; Ensembl Release 90 contains exactly 100. Third, our databases feed simultaneously into an array of services designed around different use cases, ranging from quick browsing to genome-wide bioinformatic analysis. We present here the latest developments of the Ensembl project, with a focus on managing an increasing number of assemblies, supporting efforts in genome interpretation and improving our browser.
Abstract
The Ensembl project (https://www.ensembl.org) makes key genomic data sets available to the entire scientific community without restrictions. Ensembl seeks to be a fundamental resource driving scientific progress by creating, maintaining and updating reference genome annotation and comparative genomics resources. This year we describe our new and expanded gene, variant and comparative annotation capabilities, which led to a 50% increase in the number of vertebrate genomes we support. We have also doubled the number of available human variants and added regulatory regions for many mouse cell types and developmental stages. Our data sets and tools are available via the Ensembl website as well as through a RESTful web service, a Perl application programming interface and as data files for download.
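As an illustration of the RESTful access mentioned above, the following sketch builds a request URL for a gene lookup by symbol. The abstract does not give the service address or any endpoint names: the host `https://rest.ensembl.org`, the `lookup/symbol` endpoint and the `content-type` query parameter are taken here from the public Ensembl REST service and should be treated as assumptions to verify against its current documentation. The helper name `lookup_symbol_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumed base address of the public Ensembl REST service (not stated in the abstract).
ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_symbol_url(species, symbol):
    """Build the URL for looking up a gene by symbol in a given species.

    Only the URL is constructed; no network request is made here.
    """
    path = f"/lookup/symbol/{quote(species)}/{quote(symbol)}"
    # Ask the service for a JSON response via a query parameter.
    query = urlencode({"content-type": "application/json"})
    return f"{ENSEMBL_REST}{path}?{query}"


url = lookup_symbol_url("homo_sapiens", "BRCA2")
```

The resulting URL can be passed to any HTTP client, for example `urllib.request.urlopen(url)`, and the JSON body parsed with the standard `json` module.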
Ensembl (https://www.ensembl.org) has produced high-quality genomic resources for vertebrates and model organisms for more than twenty years. During that time, our resources, services and tools have continually evolved in line with both the publicly available genome data and the downstream research and applications that utilise the Ensembl platform. In recent years we have witnessed a dramatic shift in the genomic landscape. There has been a large increase in the number of high-quality reference genomes through global biodiversity initiatives. In parallel, there have been major advances towards pangenome representations of higher species, where many alternative genome assemblies representing different breeds, cultivars, strains and haplotypes are now available. In order to support these efforts and accelerate downstream research, it is our goal at Ensembl to create high-quality annotations, tools and services for species across the tree of life. Here, we report our resources for popular reference genomes, the dramatic growth of our annotations (including haplotypes from the first human pangenome graphs), updates to the Ensembl Variant Effect Predictor (VEP), interactive protein structure predictions from AlphaFold DB, and the beta release of our new website.
Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensure uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license.
Abstract
Ensembl Genomes (https://www.ensemblgenomes.org) provides access to non-vertebrate genomes and analysis complementing vertebrate resources developed by the Ensembl project (https://www.ensembl.org). The two resources collectively present genome annotation through a consistent set of interfaces spanning the tree of life, presenting genome sequence, annotation, variation, transcriptomic data and comparative analysis. Here, we present our largest increase in plant, metazoan and fungal genomes since the project's inception, creating one of the world's most comprehensive genomic resources, and describe our efforts to reduce genome redundancy in our Bacteria portal. We detail our new efforts in gene annotation, our emerging support for pangenome analysis, our efforts to accelerate data dissemination through the Ensembl Rapid Release resource and our new AlphaFold visualization. Finally, we present details of our future plans including updates on our integration with Ensembl, and how we plan to improve our support for the microbial research community. Software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license). Data updates are synchronised with Ensembl's release cycle.