Olfactory receptor (OR) genes are the largest multi-gene family in the mammalian genome, with 874 loci in human and 1483 loci in mouse (including pseudogenes). The expansion of the OR gene repertoire has occurred through numerous duplication events followed by diversification, resulting in a large number of highly similar paralogous genes. These characteristics have made the annotation of the complete OR gene repertoire a complex task. Most OR genes have been predicted in silico and are typically annotated as intronless coding sequences.
Here we have developed an expert curation pipeline to analyse and annotate every OR gene in the human and mouse reference genomes. By combining evidence from structural features, evolutionary conservation and experimental data, we have unified the annotation of these gene families, and have systematically determined the protein-coding potential of each locus. We have defined the non-coding regions of many OR genes, enabling us to generate full-length transcript models. We found that 13 human and 41 mouse OR loci have coding sequences that are split across two exons. These split OR genes are conserved across mammals, and are expressed at the same level as protein-coding OR genes with an intronless coding region. Our findings challenge the long-standing and widespread notion that the coding region of a vertebrate OR gene is contained within a single exon.
This work provides the most comprehensive curation effort of the human and mouse OR gene repertoires to date. The complete annotation has been integrated into the GENCODE reference gene set, for immediate availability to the research community.
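Whether an OR locus has an intronless or a split coding region depends on how many exons its coding sequence overlaps. The following is a minimal sketch of that check. It is not the curation pipeline described above. It assumes each transcript model is given as a list of 1-based, inclusive exon (start, end) coordinates plus a single genomic CDS span, from start codon to stop codon, on the same strand. The helper names and example coordinates are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: 1-based, inclusive (start, end) genomic
# coordinates, all on the same strand.
Interval = Tuple[int, int]


def coding_exon_count(exons: List[Interval], cds: Interval) -> int:
    """Count how many exons overlap the CDS span (start codon to stop codon)."""
    cds_start, cds_end = cds
    return sum(1 for start, end in exons if start <= cds_end and end >= cds_start)


def classify_cds(exons: List[Interval], cds: Interval) -> str:
    """Label a transcript model's coding region as intronless or split."""
    n = coding_exon_count(exons, cds)
    if n == 0:
        return "CDS outside annotated exons"
    if n == 1:
        return "intronless CDS"
    return f"CDS split across {n} exons"


# Hypothetical OR-like transcript models, each with a 5' non-coding exon.
typical = [(1000, 1100), (5000, 6200)]
split = [(1000, 1100), (5000, 5400), (7000, 7800)]
print(classify_cds(typical, (5050, 6000)))  # intronless CDS
print(classify_cds(split, (5050, 7500)))    # CDS split across 2 exons
```

Both models carry a non-coding 5' exon. Only the second has a coding region that spans an intron, so this check distinguishes "has introns" from "has a split CDS".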
Abstract
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.
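The sketch below illustrates programmatic access through the Ensembl REST API mentioned above. It builds a lookup request for a gene symbol and summarises the returned record. It assumes the `GET lookup/symbol/:species/:symbol` endpoint with the `expand=1` option. It also assumes the JSON response carries `id`, `biotype` and a nested `Transcript` list. These field names come from our reading of the Ensembl REST documentation and should be checked there. The helper functions and the example record are hypothetical, and only `fetch_gene` touches the network.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_symbol_url(species: str, symbol: str, expand: bool = True) -> str:
    """Build an Ensembl REST lookup URL for a gene symbol (hypothetical helper)."""
    params = {"content-type": "application/json"}
    if expand:
        params["expand"] = "1"  # assumed to request nested Transcript records
    return (f"{ENSEMBL_REST}/lookup/symbol/{quote(species)}/{quote(symbol)}"
            f"?{urlencode(params)}")


def fetch_gene(species: str, symbol: str) -> dict:
    """Fetch and decode the JSON lookup record (requires network access)."""
    with urlopen(lookup_symbol_url(species, symbol), timeout=30) as resp:
        return json.load(resp)


def summarise_gene(record: dict) -> dict:
    """Reduce a lookup record to a few fields; key names are assumptions."""
    transcripts = record.get("Transcript", [])
    return {
        "id": record.get("id"),
        "biotype": record.get("biotype"),
        "n_transcripts": len(transcripts),
        "n_protein_coding": sum(
            1 for t in transcripts if t.get("biotype") == "protein_coding"
        ),
    }


# Offline, hypothetical record shaped like an expanded lookup response.
example_record = {
    "id": "ENSG_HYPOTHETICAL",
    "biotype": "protein_coding",
    "Transcript": [{"biotype": "protein_coding"}, {"biotype": "retained_intron"}],
}
print(lookup_symbol_url("homo_sapiens", "OR1A1"))
print(summarise_gene(example_record))
```

With network access, `summarise_gene(fetch_gene("homo_sapiens", "OR1A1"))` would summarise a live record. The offline record above only demonstrates the parsing step.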
The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
Tunisia is a country of intermediate endemicity for hepatitis B virus (HBV). This study characterizes, for the first time, full-genome HBV strains from Tunisia. Viral load quantification and phylogenetic analyses of full-genome or pre-S/S sequences were performed on 196 hepatitis B surface antigen (HBsAg)-positive plasma samples from Tunisian blood donors. The median viral load was 64.65 IU/ml (range <5 to 7.7 × 10⁸ IU/ml), and 89% of samples had viral loads below 10,000 IU/ml. Fifty-nine strains formed a novel subgenotype, D7; 41 strains clustered in subgenotype D1, seven in subgenotype A2 and one in genotype C. The novel subgenotype D7 was defined by maximum Bayesian posterior probability, a genetic divergence of >4% from other HBV/D subgenotypes, and a stronger HBV/E signal in the X to core genes than subgenotype D1. In conclusion, HBV/D is dominant in asymptomatic Tunisian HBsAg carriers, and the novel subgenotype D7 was the most common subgenotype found in this population.
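The >4% divergence criterion can be illustrated with a small calculation. The sketch below computes the uncorrected p-distance between aligned nucleotide sequences and averages it over all cross-group pairs. It then compares that mean to a 0.04 cut-off. This is an illustration under stated assumptions, not the study's method. Published analyses may use model-corrected distances. The D7 definition also relied on Bayesian posterior support and the HBV/E signal, which this sketch does not model. The sequences are hypothetical 10-base toys.

```python
from itertools import product
from statistics import mean
from typing import Sequence

NUCLEOTIDES = set("ACGT")


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences.

    Sites with a gap or ambiguity code in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in NUCLEOTIDES and b in NUCLEOTIDES:
            compared += 1
            differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_between_group_distance(group_a: Sequence[str], group_b: Sequence[str]) -> float:
    """Mean p-distance over every cross-group sequence pair."""
    return mean(p_distance(a, b) for a, b in product(group_a, group_b))


def exceeds_subgenotype_cutoff(group_a: Sequence[str], group_b: Sequence[str],
                               cutoff: float = 0.04) -> bool:
    """True if the mean between-group divergence is above the cut-off (>4%)."""
    return mean_between_group_distance(group_a, group_b) > cutoff


# Hypothetical toy alignments; real analyses use ~3.2 kb HBV genomes.
novel = ["ACGTACGTAC", "ACGTACGTAA"]
reference = ["TCGTACGTAC", "TCGAACGTAC"]
print(round(mean_between_group_distance(novel, reference), 3))  # 0.2
print(exceeds_subgenotype_cutoff(novel, reference))             # True
```

Averaging over all cross-group pairs reduces the influence of any single outlier strain, unlike a minimum pairwise distance.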
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of annotated alternatively spliced transcripts has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available, with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to PeptideAtlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two-exon models, indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
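The sketch below shows one way to tally loci by biotype from a downloaded annotation file. It assumes a GENCODE-style GTF in which column 3 holds the feature type and column 9 holds `key "value";` attributes, including `gene_type`. It counts `gene` records per `gene_type`. The helper names and toy lines are hypothetical. The `gene_type` vocabulary has changed between releases, so grouping types into broader categories such as lncRNA may need an extra mapping step.

```python
import gzip
import re
from collections import Counter
from typing import Iterable

# GTF attribute pairs look like: gene_id "G1"; gene_type "protein_coding";
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field: str) -> dict:
    """Parse column 9 of a GTF line into a dict (last value wins for repeated keys)."""
    return dict(ATTRIBUTE_RE.findall(field))


def count_genes_by_type(lines: Iterable[str]) -> Counter:
    """Count 'gene' feature records per gene_type, skipping comment lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # only whole-gene records, one per locus
        counts[parse_attributes(cols[8]).get("gene_type", "unknown")] += 1
    return counts


def count_genes_in_file(path: str) -> Counter:
    """Stream a plain or gzip-compressed GTF file (path is hypothetical)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_genes_by_type(handle)


# Hypothetical toy annotation lines (tab-separated, 9 columns).
toy = [
    "##description: hypothetical toy annotation",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";',
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";',
    'chr2\tHAVANA\tgene\t50\t400\t.\t-\t.\tgene_id "G2"; gene_type "lncRNA";',
]
print(dict(count_genes_by_type(toy)))  # {'protein_coding': 1, 'lncRNA': 1}
```

Restricting the count to `gene` rows means transcript and exon rows for the same locus do not inflate the totals.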
The contribution of γ-glutamyl transpeptidase (GGT) to Campylobacter jejuni virulence and colonization of the avian gut has been investigated. The presence of the ggt gene in C. jejuni strains directly correlated with the expression of GGT activity, as measured by cleavage and transfer of the γ-glutamyl moiety. Inactivation of the monocistronic ggt gene in C. jejuni strain 81116 resulted in isogenic mutants with undetectable GGT activity; nevertheless, these mutants grew normally in vitro. However, the mutants showed increased motility, a 5.4-fold higher invasion efficiency into INT407 cells in vitro, and increased resistance to hydrogen peroxide stress. Moreover, the apoptosis-inducing activity of the ggt mutant was significantly lower than that of the parental strain. In vivo studies showed that, although GGT activity was not required for initial colonization of 1-day-old chicks, the enzyme was required for persistent colonization of the avian gut.
Research from several countries indicates that people with a variety of rare diseases (RDs) face challenges in healthcare access and health-related quality of life. However, there has been little systematic research on the experiences of children and adults with RDs in the American healthcare system that identifies commonalities across RDs. This research aimed to: (1) describe demographics, disease characteristics, diagnostic experiences, access to healthcare, knowledge about RDs, support from healthcare professionals, and patient satisfaction among people with RDs and their caregivers; (2) examine predictors of patient satisfaction among adults with RDs; (3) compare health-related quality of life and stigma to US population norms; and (4) examine predictors of anxiety and depression among adults and children with RDs.
This large-scale survey included 1128 adults with RDs or parents or caregivers of children with RDs, representing 344 different RDs. About one third of participants waited four or more years for a diagnosis, and misdiagnosis was common. A subset of participants reported experiencing insurance-related delays or denials for tests, treatments, specialists, or services. Approximately half of participants felt their medical and social support was sufficient, yet fewer than a third had sufficient dental and psychological support. Patients were generally neither satisfied nor dissatisfied with their healthcare providers. Major predictors of satisfaction were lower stigma, lower anxiety, a shorter diagnostic odyssey, greater physical function, and less pain interference. Adults and children with RDs had significantly poorer health-related quality of life and greater stigma in all domains compared to US norms. Predictors of both anxiety and depression were greater stigma/poor peer relationships, fatigue, sleep disturbance, limited ability to participate in social roles, and an unstable disease course.
People in the U.S. with RDs have poor health-related quality of life and high stigma. These factors are related to patient satisfaction and healthcare access, including diagnostic delays and misdiagnosis. Advocacy work is needed in order to improve healthcare access and ultimately health-related quality of life for children and adults with RDs.
Ensembl (https://www.ensembl.org) has produced high-quality genomic resources for vertebrates and model organisms for more than twenty years. During that time, our resources, services and tools have continually evolved in line with both the publicly available genome data and the downstream research and applications that utilise the Ensembl platform. In recent years we have witnessed a dramatic shift in the genomic landscape. There has been a large increase in the number of high-quality reference genomes through global biodiversity initiatives. In parallel, there have been major advances towards pangenome representations of higher species, where many alternative genome assemblies representing different breeds, cultivars, strains and haplotypes are now available. In order to support these efforts and accelerate downstream research, it is our goal at Ensembl to create high-quality annotations, tools and services for species across the tree of life. Here, we report our resources for popular reference genomes, the dramatic growth of our annotations (including haplotypes from the first human pangenome graphs), updates to the Ensembl Variant Effect Predictor (VEP), interactive protein structure predictions from AlphaFold DB, and the beta release of our new website.
GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long-read transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.