Ensembl 2021. Howe, Kevin L; Achuthan, Premanand; Allen, James ...
Nucleic Acids Research, 01/2021, Volume 49, Issue D1
Journal Article; peer reviewed; open access
Abstract
The Ensembl project (https://www.ensembl.org) annotates genomes and disseminates genomic data for vertebrate species. We create detailed and comprehensive annotation of gene structures, regulatory elements and variants, and enable comparative genomics by inferring the evolutionary history of genes and genomes. Our integrated genomic data are made available in a variety of ways, including genome browsers, search interfaces, specialist tools such as the Ensembl Variant Effect Predictor, download files and programmatic interfaces. Here, we present recent Ensembl developments including two new website portals. Ensembl Rapid Release (http://rapid.ensembl.org) is designed to provide core tools and services for genomes as soon as possible and has been deployed to support large biodiversity sequencing projects. Our SARS-CoV-2 genome browser (https://covid-19.ensembl.org) integrates our own annotation with publicly available genomic data from numerous sources to facilitate the use of genomics in the international scientific response to the COVID-19 pandemic. We also report on other updates to our annotation resources, tools and services. All Ensembl data and software are freely available without restriction.
Abstract
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high-quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference-quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use all publicly available data and analysis, along with the research literature, to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl BioMart, the Ensembl Perl and REST APIs, and https://www.gencodegenes.org.
Abstract
Ensembl (https://www.ensembl.org) is unique in its flexible infrastructure for access to genomic data and annotation. It has been designed to efficiently deliver annotation at scale for all eukaryotic life, and it also provides deep, comprehensive annotation for key species. Genomes representing a greater diversity of species are increasingly being sequenced. In response, we have focussed our recent efforts on expediting the annotation of new assemblies. Here, we report the release of the greatest annual number of newly annotated genomes in the history of Ensembl via our dedicated Ensembl Rapid Release platform (http://rapid.ensembl.org). We have also developed a new method to generate comparative analyses at scale for these assemblies and, for the first time, we have annotated non-vertebrate eukaryotes. Meanwhile, we continually improve, extend and update the annotation for our high-value reference vertebrate genomes and report the details here. Our software tools for specific tasks include the Ensembl Variant Effect Predictor (VEP) and the newly developed interface for the Variant Recoder. All Ensembl data, software and tools are freely available for download and are accessible programmatically.
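Programmatic access to VEP and the Variant Recoder typically goes through the Ensembl REST server. The following is a minimal sketch, assuming the public HTTPS base URL https://rest.ensembl.org and the documented `vep/:species/id/:id`, `vep/:species/hgvs/:hgvs_notation` and `variant_recoder/:species/:id` endpoints. The helper `build_rest_url` is hypothetical, the variant identifiers are illustrative inputs only, and the code builds URLs without making any network call.

```python
from urllib.parse import quote, urlencode

# Public Ensembl REST server (assumed base URL for this sketch).
REST_BASE = "https://rest.ensembl.org"


def build_rest_url(segments, params=None):
    """Join path segments onto REST_BASE and append query parameters.

    Hypothetical helper: it only builds the URL and performs no network call.
    """
    # Percent-encode each segment so characters such as ':' or '>' in HGVS
    # notation cannot be mistaken for URL structure.
    path = "/".join(quote(str(s), safe="") for s in segments)
    # Request JSON output unless the caller overrides it.
    query = {"content-type": "application/json"}
    query.update(params or {})
    # safe="/" keeps "application/json" readable in the query string.
    return f"{REST_BASE}/{path}?{urlencode(query, safe='/')}"


# VEP consequences for a variant ID: vep/:species/id/:id
vep_url = build_rest_url(["vep", "human", "id", "rs56116432"])
# Equivalent identifiers and notations: variant_recoder/:species/:id
recoder_url = build_rest_url(["variant_recoder", "human", "rs56116432"])
# HGVS input to VEP: vep/:species/hgvs/:hgvs_notation (illustrative notation)
hgvs_url = build_rest_url(["vep", "human", "hgvs", "ENST00000366667:c.803C>T"])
print(vep_url)
print(recoder_url)
print(hgvs_url)
```

Encoding each path segment separately is a defensive choice. It keeps HGVS strings intact as a single segment, at the cost of slightly less readable URLs.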
Abstract
Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation, such as genes, variation, regulation and comparative genomics, across the vertebrate subphylum and key model organisms. The Ensembl annotation pipeline is capable of combining experimental and reference data from multiple providers into a single integrated resource. Here, we present 94 newly annotated and re-annotated genomes, bringing the total number of genomes offered by Ensembl to 227. This represents the single largest expansion of the resource since its inception. We also detail our continued efforts to improve human annotation, developments in our epigenome analysis and display, a new tool for imputing causal genes from genome-wide association studies and visualisation of variation within a 3D protein model. Finally, we present information on our new website. Both software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license), with data updates released four times a year.
A draft human pangenome reference. Liao, Wen-Wei; Asri, Mobin; Ebler, Jana ...
Nature (London), 05/2023, Volume 617, Issue 7960
Journal Article; peer reviewed; open access
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
In an attempt to reduce the infection rate of coronavirus disease 2019 (Covid-19), countries around the world have echoed the urgent need for an economical, accessible, point-of-need diagnostic test to identify Covid-19 carriers, so that only individuals who test positive, rather than the entire community, need to be advised to self-isolate. A diagnostic test with a quick turnaround time would allow everyday life to largely return to normal. In this regard, studies contemporaneous with ours have investigated different respiratory sounds, including cough, to recognise potential Covid-19 carriers. However, these studies lack clinical control and rely on Internet users confirming their test results in a web questionnaire (crowdsourcing), which renders their analysis inadequate. We evaluate the detection performance of a primary screening tool for Covid-19 based solely on cough sounds, using 8,380 clinically validated samples with laboratory molecular tests (2,339 Covid-19 positive and 6,041 Covid-19 negative) by quantitative RT-PCR (qRT-PCR) from certified laboratories. All collected samples were clinically labelled as Covid-19 positive or negative according to the qRT-PCR results, together with disease severity based on the qRT-PCR threshold cycle (Ct) and the patients' lymphocyte counts. Our proposed generic method uses an algorithm based on Empirical Mode Decomposition (EMD) for cough sound detection, followed by classification of a tensor of audio sonographs with a deep artificial neural network classifier with convolutional layers, called 'DeepCough'. Two versions of DeepCough, differing in the number of tensor dimensions (DeepCough2D and DeepCough3D), have been investigated. These methods have been deployed in a multi-platform prototype web app, 'CoughDetect'.
Covid-19 recognition achieved a promising AUC (area under the curve) of 98.80% ± 0.83%, a sensitivity of 96.43% ± 1.85% and a specificity of 96.20% ± 1.74%, while the recognition of three severity levels achieved an average AUC of 81.08% ± 5.05%. Our proposed web tool, as a point-of-need primary diagnostic test for Covid-19, facilitates the rapid detection of the infection. We believe it has the potential to significantly curb the Covid-19 pandemic across the world.
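The reported metrics follow standard binary-classification definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (the Mann-Whitney formulation, with ties counted as half). The following is a minimal Python sketch of these definitions. The counts and scores are hypothetical, are not data from the study, and the sketch does not reproduce the authors' evaluation protocol.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # positives correctly flagged
    fn: int  # positives missed
    tn: int  # negatives correctly cleared
    fp: int  # negatives wrongly flagged

    def sensitivity(self) -> float:
        # True positive rate: TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # True negative rate: TN / (TN + FP)
        return self.tn / (self.tn + self.fp)


def auc(pos_scores, neg_scores) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U formulation). O(n*m), fine for a sketch.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts, not taken from the study.
cm = Confusion(tp=96, fn=4, tn=192, fp=8)
print(cm.sensitivity())  # 0.96
print(cm.specificity())  # 0.96
# Hypothetical classifier scores for three positives and three negatives.
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

The pairwise form of AUC is chosen for readability. For large datasets, a rank-based computation gives the same value in O(n log n).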
Abstract
The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
The Ensembl project (http://www.ensembl.org) is a system for genome annotation, analysis, storage and dissemination designed to facilitate access to genomic annotation from chordates and key model organisms. It provides access to data from 87 species across our main and early-access Pre! websites. This year we introduced three newly annotated species and released numerous updates across our supported species, with a concentration on data for the latest genome assemblies of human, mouse, zebrafish and rat. We also provided two data updates for the previous human assembly, GRCh37, through a dedicated website (http://grch37.ensembl.org). Our tools, in particular the VEP, have been improved significantly through integration of additional third-party data. Our REST API is now capable of larger-scale analysis, and our regulatory data BioMart can deliver faster results. The website is now capable of displaying long-range interactions such as those found in cis-regulated datasets. Finally, we have launched a website optimized for mobile devices providing views of genes, variants and phenotypes. Our data are made available without restriction, and all code is available from our GitHub organization site (http://github.com/Ensembl) under an Apache 2.0 license.
Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensures uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST endpoints. All Ensembl data are freely released to the scientific community, and our source code is available under the open source Apache 2.0 license.
Ensembl (http://www.ensembl.org) is a genomic interpretation system providing the most up-to-date annotations, querying tools and access methods for chordates and key model organisms. This year we released updated annotation (gene models, comparative genomics, regulatory regions and variation) on the new human assembly, GRCh38, although we continue to support researchers using the GRCh37.p13 assembly through a dedicated site (http://grch37.ensembl.org). Our Regulatory Build has been revamped to identify regulatory regions of interest and to efficiently highlight their activity across disparate epigenetic data sets. A number of new interfaces allow users to perform large-scale comparisons of their data against our annotations. The REST server (http://rest.ensembl.org), which allows programs written in any language to query our databases, has become a full service alongside our upgraded website tools. Our online Variant Effect Predictor tool has been updated to process more variants and calculate summary statistics. Lastly, the WiggleTools package enables users to summarize large collections of data sets and view them as single tracks in Ensembl. The Ensembl code base itself is more accessible: it is now hosted on our GitHub organization page (https://github.com/Ensembl) under an Apache 2.0 open source license.
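Because the REST server speaks plain HTTP and JSON, any language's HTTP client can query it. The following is a minimal standard-library Python sketch, assuming the HTTPS base URL https://rest.ensembl.org, the documented `GET lookup/symbol/:species/:symbol` endpoint, and the batch `POST lookup/id` endpoint, which is relevant to the larger-scale REST analysis mentioned in an earlier Ensembl abstract. The helper names are hypothetical. Requests are only built, not sent, until they are passed to `fetch_json`.

```python
import json
import urllib.request
from urllib.parse import quote

# Public Ensembl REST server (assumed base URL for this sketch).
REST_BASE = "https://rest.ensembl.org"


def lookup_symbol_request(species: str, symbol: str) -> urllib.request.Request:
    """Build a GET request for lookup/symbol/:species/:symbol, asking for JSON."""
    url = f"{REST_BASE}/lookup/symbol/{quote(species, safe='')}/{quote(symbol, safe='')}"
    return urllib.request.Request(url, headers={"Content-Type": "application/json"})


def lookup_ids_request(ids: list) -> urllib.request.Request:
    """Build a batch POST request for lookup/id: one call covers many stable IDs."""
    body = json.dumps({"ids": list(ids)}).encode("utf-8")
    return urllib.request.Request(
        f"{REST_BASE}/lookup/id",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def fetch_json(req: urllib.request.Request):
    """Send the request and decode the JSON reply (requires network access)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_json(lookup_symbol_request("homo_sapiens", "BRCA2"))` would return the gene's lookup record as a dictionary, assuming the server is reachable. Batching IDs into a single POST reduces the number of calls counted against the public server's rate limits.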