Abstract
The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes use primary data, bioinformatics tools and analyses generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools and analysis, and the advances they support in the annotation of the human and mouse genomes, including: the completion of first-pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
Abstract
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high-quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference-quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use all publicly available data and analyses, along with the research literature, to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl BioMart, the Ensembl Perl and REST APIs, and https://www.gencodegenes.org.
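GENCODE gene sets are commonly downloaded as GTF files, in which the ninth column holds `key "value";` attributes such as `gene_type`. As a minimal sketch (not GENCODE or Ensembl tooling), the Python below tallies `gene` records by `gene_type`, for example to compare protein-coding and lncRNA counts. The toy records and identifiers (`G1`, `G2`, `T1`) are hypothetical, and the attribute parser is simplified: it assumes values contain no semicolons and keeps only the last value of a repeated key such as `tag`.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GTF attribute column, e.g. 'gene_id "G1"; gene_type "lncRNA";'.

    Simplified: assumes values contain no ';' and keeps only the last value
    of a repeated key (such as 'tag').
    """
    attrs: Dict[str, str] = {}
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def count_gene_types(lines: Iterable[str]) -> Counter:
    """Count 'gene' records in GTF lines by their gene_type attribute."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":  # keep only gene-level rows
            continue
        counts[parse_attributes(cols[8]).get("gene_type", "unknown")] += 1
    return counts


# Hypothetical toy records in GTF layout (tab-separated, 9 columns).
example = [
    "##description: toy example",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_type "lncRNA";',
    'chr1\tHAVANA\tgene\t2000\t5000\t.\t-\t.\tgene_id "G2"; gene_type "protein_coding";',
    'chr1\tHAVANA\ttranscript\t2000\t5000\t.\t-\t.\tgene_id "G2"; transcript_id "T1"; gene_type "protein_coding";',
]
print(count_gene_types(example))
```

Counting only `gene` rows avoids double-counting a locus once per transcript or exon.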
The annotation of repetitive sequences within plant genomes can help in the interpretation of observed phenotypes. Moreover, repeat masking is required for tasks such as whole‐genome alignment, promoter analysis, or pangenome exploration. Although homology‐based annotation methods are computationally expensive, k‐mer strategies for masking are orders of magnitude faster. Here, we benchmarked a two‐step approach, where repeats were first called by k‐mer counting and then annotated by comparison to curated libraries. This hybrid protocol was tested on 20 plant genomes from Ensembl, with the k‐mer‐based Repeat Detector (Red) and two repeat libraries (REdat, last updated in 2013, and nrTEplants, curated for this work). Custom libraries produced by RepeatModeler were also tested. We obtained repeated genome fractions that matched those reported in the literature but with shorter repeated elements than those produced directly by sequence homology. Inspection of the masked regions that overlapped genes revealed no preference for specific protein domains. Most Red‐masked sequences could be successfully classified by sequence similarity, with the complete protocol taking less than 2 h on a desktop Linux box. A guide to curating your own repeat libraries and the scripts for masking and annotating plant genomes can be obtained at https://github.com/Ensembl/plant‐scripts.
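To illustrate the first, k‐mer counting step, here is a deliberately simplified Python sketch: it counts every k‐mer in a sequence and soft‐masks (lowercases) the positions covered by k‐mers seen at least `min_count` times. This plain frequency threshold is an assumption chosen for illustration; Red's actual model is more elaborate, and the function name `kmer_mask` and its parameters are hypothetical.

```python
from collections import Counter


def kmer_mask(seq: str, k: int = 4, min_count: int = 3) -> str:
    """Soft-mask (lowercase) positions covered by frequent k-mers.

    Toy frequency threshold for illustration only; it is not Red's model.
    """
    seq = seq.upper()
    starts = range(len(seq) - k + 1)
    counts = Counter(seq[i:i + k] for i in starts)
    masked = [False] * len(seq)
    for i in starts:
        if counts[seq[i:i + k]] >= min_count:
            masked[i:i + k] = [True] * k  # mark the whole k-mer window
    return "".join(ch.lower() if m else ch for ch, m in zip(seq, masked))


result = kmer_mask("ACGTACGTACGTTTTTGCA", k=4, min_count=3)
print(result)  # acgtacgtacgtTTTTGCA
print(f"masked fraction: {sum(c.islower() for c in result) / len(result):.2f}")
```

Only `ACGT` occurs three times here, so only its three copies are masked; on real genomes, k and the threshold would have to be tuned to genome size.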
Core Ideas
Control Pfam domains minimize unrelated coding sequences in repeat libraries.
Repeat calling by k‐mer counting with Red does not preferentially mask NLR genes.
Repeats called by Red can be efficiently classified by sequence similarity with minimap2.
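The second step, classifying Red‐called repeats by similarity to a curated library, could be approximated as below. This sketch assumes the repeats were aligned to the library with minimap2 and written in PAF, its default output format, where column 6 is the target name and column 10 the number of matching residues. It keeps the best hit per repeat by matching residues and assumes library headers follow the RepeatMasker-style `name#Class/Family` convention; both are illustrative assumptions, not the behaviour of the published plant‐scripts, and the record names are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def classify_repeats(paf_lines: Iterable[str]) -> Dict[str, str]:
    """Assign each query repeat the class of its best library hit.

    Best hit = most residue matches (PAF column 10). Library headers are
    assumed to look like 'name#Class/Family' (RepeatMasker-style convention).
    """
    best: Dict[str, Tuple[int, str]] = {}
    for line in paf_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:  # PAF has 12 mandatory columns
            continue
        query, target, matches = cols[0], cols[5], int(cols[9])
        if query not in best or matches > best[query][0]:
            best[query] = (matches, target)
    return {
        query: target.split("#", 1)[1] if "#" in target else "Unknown"
        for query, (_, target) in best.items()
    }


# Hypothetical PAF records: repeats (queries) aligned against a repeat library.
paf = [
    "chr1:100-600\t500\t0\t480\t+\tcopia-1#LTR/Copia\t5000\t10\t490\t430\t480\t60",
    "chr1:100-600\t500\t0\t200\t-\thelitron-3#DNA/Helitron\t3000\t5\t205\t150\t200\t10",
    "chr2:50-300\t250\t0\t250\t+\tgypsy-7#LTR/Gypsy\t6000\t0\t250\t240\t250\t60",
]
print(classify_repeats(paf))
```

Repeats with no hit are simply absent from the result and would remain unclassified.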
Studies suggest a relationship between hypertension and outcome in bevacizumab-treated patients with metastatic colorectal cancer (mCRC). We performed a retrospective analysis of two phase II studies (BECA and BECOX) to determine if hypertension and proteinuria predict outcome in elderly patients with mCRC treated with bevacizumab.
Patients ≥ 70 years of age received either capecitabine 1250 mg/m² bid days 1-14 + bevacizumab 7.5 mg/kg day 1 every 21 days (BECA study) or capecitabine 1000 mg/m² bid days 1-14 with bevacizumab 7.5 mg/kg and oxaliplatin 130 mg/m² day 1 (BECOX study). The primary objective was to correlate hypertension and proteinuria with overall response rate (ORR), time to progression (TTP) and overall survival (OS). Secondary objectives included identification of risk factors associated with the development of hypertension and proteinuria and determining whether development of hypertension or proteinuria in the first 2 cycles was related to ORR, disease-control rate (DCR), TTP or OS.
In total, 127 patients (median age 75.5 years) were included in the study. Hypertension correlated with DCR and OS; proteinuria correlated with ORR and DCR. Proteinuria or hypertension in the first 2 cycles did not correlate with efficacy. Risk factors for hypertension were female gender (odds ratio [OR] 0.241; P = 0.011) and more bevacizumab cycles (OR 1.112; P = 0.002); risk factors for proteinuria were diabetes (OR 3.869; P = 0.006) and more bevacizumab cycles (OR 1.181; P < 0.0001). In multivariate analysis, baseline lactate dehydrogenase, haemoglobin, number of metastatic lesions and DCR had prognostic value.
This analysis of two phase II studies suggests that hypertension is significantly correlated with OS but not with ORR or TTP, whereas proteinuria is correlated with ORR but not with OS or TTP. Both hypertension and proteinuria are associated with the duration of bevacizumab treatment, and neither represents an independent prognostic factor.
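The per-cycle odds ratios reported above can be read cumulatively if, as assumed here (the abstract does not state the model), the number of bevacizumab cycles entered the logistic regression as a linear continuous term. Under that assumption the odds ratio for a Δ-cycle increase is exp(βΔ) = OR^Δ. The sketch below applies this to a hypothetical increase of five cycles; it is an interpretive aid, not a result reported by the study, and `odds_ratio_for_increment` is a hypothetical helper.

```python
import math


def odds_ratio_for_increment(or_per_unit: float, units: float) -> float:
    """Odds ratio for a `units`-step increase in a continuous covariate.

    Assumes a logistic model linear in the covariate, so that
    OR(units) = exp(beta * units) = or_per_unit ** units.
    """
    beta = math.log(or_per_unit)  # log-odds change per one unit (one cycle)
    return math.exp(beta * units)


# Hypothetical increase of 5 bevacizumab cycles, using the reported per-cycle ORs.
for outcome, or_per_cycle in [("hypertension", 1.112), ("proteinuria", 1.181)]:
    print(outcome, round(odds_ratio_for_increment(or_per_cycle, 5), 2))
```

Because the effect compounds multiplicatively, a modest per-cycle OR can translate into a roughly 1.7-fold (hypertension) or 2.3-fold (proteinuria) change in odds over five extra cycles, if the linearity assumption holds.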
Sepsis due to phlegmonous gastritis in a cancer patient
Gutiérrez Pérez, César; Chivato Martín-Falquina, Irene; Rodríguez Ledesma, Inmaculada ...
Revista española de enfermedades digestivas, 03/2023, Volume 115, Issue 3
Journal Article; Peer reviewed; Open access
We present the case of a 58-year-old woman undergoing treatment for glioblastoma with temozolomide and radiotherapy who attended the Emergency Department with acute abdominal pain and chemotherapy-induced febrile neutropenia. She was diagnosed with sepsis due to phlegmonous gastritis and, after several weeks in the Intensive Care Unit with antimicrobial coverage, was discharged. Phlegmonous gastritis is a highly unusual bacterial infection of the gastric wall. It is closely related to immune system dysfunction and frequently affects cancer patients; because of its high morbidity and mortality and the small number of reported cases, it requires a high index of clinical suspicion and early treatment.
Ensembl 2021
Howe, Kevin L; Achuthan, Premanand; Allen, James ...
Nucleic acids research, 01/2021, Volume 49, Issue D1
Journal Article; Peer reviewed; Open access
Abstract
The Ensembl project (https://www.ensembl.org) annotates genomes and disseminates genomic data for vertebrate species. We create detailed and comprehensive annotation of gene structures, regulatory elements and variants, and enable comparative genomics by inferring the evolutionary history of genes and genomes. Our integrated genomic data are made available in a variety of ways, including genome browsers, search interfaces, specialist tools such as the Ensembl Variant Effect Predictor, download files and programmatic interfaces. Here, we present recent Ensembl developments including two new website portals. Ensembl Rapid Release (http://rapid.ensembl.org) is designed to provide core tools and services for genomes as soon as possible and has been deployed to support large biodiversity sequencing projects. Our SARS-CoV-2 genome browser (https://covid-19.ensembl.org) integrates our own annotation with publicly available genomic data from numerous sources to facilitate the use of genomics in the international scientific response to the COVID-19 pandemic. We also report on other updates to our annotation resources, tools and services. All Ensembl data and software are freely available without restriction.
Tumor molecular profiling upon disease progression enables investigation of tumor evolution. Next-generation sequencing (NGS) of liquid biopsies constitutes a noninvasive, readily available source of tumor molecular information. In this study, 124 plasma samples from patients with advanced EGFR-positive NSCLC treated with a first-line EGFR tyrosine kinase inhibitor (EGFR-TKI) were collected upon disease progression. The circulating cell-free DNA (cfDNA) was sequenced using the Oncomine Pan-Cancer Cell-Free Assay™. Excluding EGFR mutations, the most frequently mutated gene was TP53 (57.3%), followed by APC (11.3%), FGFR3 (7.3%), and KRAS (5.6%). Different molecular alterations were observed upon disease progression depending on the location of the original EGFR-sensitizing mutation. Specifically, the detection of the p.T790M mutation was significantly associated with the presence of exon 19 mutations in EGFR (Fisher p-value: 0.028). All KRAS activating mutations (n = 8) were detected in tumors with EGFR mutations in exons 18 and 21 (Fisher p-value < 0.001). Similarly, mutations in NRAS and HRAS were more frequently detected in samples from tumors harboring mutations in exons 18 or 21 (Fisher p-value: 0.050 and Fisher p-value: 0.099, respectively). In conclusion, our data suggest that the mechanisms underlying EGFR-TKI resistance could depend on the exon location of the original EGFR-sensitizing mutation.
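The Fisher p-values above come from 2×2 contingency tables (for example, T790M detected or not versus exon 19 or other EGFR-sensitizing mutation). As a self-contained sketch, the two-sided Fisher's exact test can be computed from hypergeometric probabilities using only the Python standard library. The example counts are invented for illustration because the abstract does not report the underlying tables; in practice a library routine such as `scipy.stats.fisher_exact` would normally be used.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With both margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of all tables that are
    no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        # P(top-left cell = x) given the fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - (n - row1)), min(row1, col1) + 1)
    # small relative tolerance guards against floating-point ties
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Invented counts (not from the study): rows = T790M detected / not detected,
# columns = exon 19 / other sensitizing mutation.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```

An exact test suits this setting because several of the mutation groups are small (for example, n = 8 KRAS mutations), where chi-square approximations become unreliable.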
The Ensembl project (http://www.ensembl.org) is a system for genome annotation, analysis, storage and dissemination designed to facilitate access to genomic annotation from chordates and key model organisms. It provides access to data from 87 species across our main and early access Pre! websites. This year we introduced three newly annotated species and released numerous updates across our supported species, with a concentration on data for the latest genome assemblies of human, mouse, zebrafish and rat. We also provided two data updates for the previous human assembly, GRCh37, through a dedicated website (http://grch37.ensembl.org). Our tools, in particular the VEP, have been improved significantly through integration of additional third-party data. REST is now capable of larger-scale analysis and our regulatory data BioMart can deliver faster results. The website is now capable of displaying long-range interactions such as those found in cis-regulated datasets. Finally, we have launched a website optimized for mobile devices providing views of genes, variants and phenotypes. Our data are made available without restriction and all code is available from our GitHub organization site (http://github.com/Ensembl) under an Apache 2.0 license.
Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensures uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license.
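The REST end points can be called from any language that speaks HTTP. The minimal Python sketch below builds a request for the `lookup/symbol` end point of https://rest.ensembl.org and decodes the JSON reply with the standard library only. The helper names `lookup_symbol_url` and `lookup_symbol` are hypothetical, and the actual fetch needs network access, so it is shown as a commented call.

```python
import json
import urllib.parse
import urllib.request

SERVER = "https://rest.ensembl.org"


def lookup_symbol_url(species: str, symbol: str) -> str:
    """Build a GET URL for the lookup/symbol end point, asking for JSON."""
    path = "/lookup/symbol/{}/{}".format(
        urllib.parse.quote(species), urllib.parse.quote(symbol)
    )
    return SERVER + path + "?content-type=application/json"


def lookup_symbol(species: str, symbol: str) -> dict:
    """Fetch and decode the JSON record for a gene symbol (needs network access)."""
    with urllib.request.urlopen(lookup_symbol_url(species, symbol), timeout=30) as resp:
        return json.load(resp)


print(lookup_symbol_url("homo_sapiens", "BRCA2"))
# gene = lookup_symbol("homo_sapiens", "BRCA2")
# print(gene["id"])
```

Keeping URL construction separate from the network call makes the request easy to inspect or reuse with another HTTP client.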
Ensembl (http://www.ensembl.org) is a genomic interpretation system providing the most up-to-date annotations, querying tools and access methods for chordates and key model organisms. This year we released updated annotation (gene models, comparative genomics, regulatory regions and variation) on the new human assembly, GRCh38, although we continue to support researchers using the GRCh37.p13 assembly through a dedicated site (http://grch37.ensembl.org). Our Regulatory Build has been revamped to identify regulatory regions of interest and to efficiently highlight their activity across disparate epigenetic data sets. A number of new interfaces allow users to perform large-scale comparisons of their data against our annotations. The REST server (http://rest.ensembl.org), which allows programs written in any language to query our databases, has become a full service alongside our upgraded website tools. Our online Variant Effect Predictor tool has been updated to process more variants and calculate summary statistics. Lastly, the WiggleTools package enables users to summarize large collections of data sets and view them as single tracks in Ensembl. The Ensembl code base itself is also more accessible: it is now hosted on our GitHub organization page (https://github.com/Ensembl) under an Apache 2.0 open source license.
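As a toy illustration of collapsing many data sets into a single track, the sketch below averages several signal tracks position by position. It does not use WiggleTools or its file formats: the sparse `{position: value}` representation, the `mean_track` helper and the rule that a missing position counts as 0.0 are simplifying assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical sparse track representation: position -> signal value.
Track = Dict[int, float]


def mean_track(tracks: List[Track]) -> Track:
    """Average several tracks per position into a single summary track.

    Positions absent from a track are treated as 0.0 (a simplifying assumption).
    """
    positions = sorted(set().union(*tracks))
    return {pos: sum(t.get(pos, 0.0) for t in tracks) / len(tracks) for pos in positions}


replicate_a = {1: 2.0, 2: 4.0}
replicate_b = {2: 2.0, 3: 6.0}
print(mean_track([replicate_a, replicate_b]))  # {1: 1.0, 2: 3.0, 3: 3.0}
```

Real genome-wide tracks are far too large for in-memory dictionaries, which is why tools built for this task stream coordinate-sorted files instead.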