Abstract
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.
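As an illustration of the REST access route mentioned above, the following minimal sketch looks up a gene by symbol. It assumes the public Ensembl REST server at https://rest.ensembl.org and its `lookup/symbol` endpoint; the helper names `lookup_url` and `lookup_gene` are hypothetical and are not part of any GENCODE or Ensembl tooling.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Ensembl REST server; adjust if a mirror is used.
ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_url(symbol, species="homo_sapiens"):
    """Build the URL for a symbol lookup (hypothetical helper)."""
    safe_symbol = urllib.parse.quote(symbol, safe="")
    return (
        f"{ENSEMBL_REST}/lookup/symbol/{species}/{safe_symbol}"
        "?content-type=application/json"
    )


def lookup_gene(symbol, species="homo_sapiens", timeout=30):
    """Fetch the gene record as a dict. Requires network access."""
    with urllib.request.urlopen(lookup_url(symbol, species), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only the URL is printed here, so running the file needs no network.
    print(lookup_url("KCTD13"))
```

Calling `lookup_gene("KCTD13")` would return the JSON record for that symbol, provided the server is reachable.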
Comparing transcript levels between healthy and diseased individuals allows the identification of differentially expressed genes, which may be causes, consequences or mere correlates of the disease under scrutiny. We propose a method to decompose the observational correlation between gene expression and phenotypes into components driven by confounders and by forward and reverse causal effects. The bi-directional causal effects between gene expression and complex traits are obtained by Mendelian Randomization integrating summary-level data from GWAS and whole-blood eQTLs. Applying this approach to complex traits reveals that forward effects have negligible contribution. For example, BMI- and triglycerides-gene expression correlation coefficients robustly correlate with trait-to-expression causal effects (r = 0.11, P = 2.0 × 10^[…] and r = 0.13, P = 1.1 × 10^[…]), but not detectably with expression-to-trait effects. Our results demonstrate that studies comparing the transcriptome of diseased and healthy subjects are more likely to reveal disease-induced gene expression changes than disease-causing ones.
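To make the Mendelian Randomization step concrete, here is a minimal sketch of the standard inverse-variance weighted (IVW) estimator computed from summary statistics. It is a textbook estimator and not necessarily the one used in the study; the function name `ivw_estimate` and all effect sizes below are invented for illustration. For the reverse direction, the roles of exposure and outcome are simply swapped, using instruments for the trait.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted causal estimate from summary statistics.

    beta_exposure: per-variant effects on the exposure (e.g. eQTL effects)
    beta_outcome:  per-variant effects on the outcome (e.g. GWAS effects)
    se_outcome:    standard errors of the outcome effects
    Returns (estimate, standard_error) under a fixed-effect model.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / (se * se)
        numerator += weight * bx * by
        denominator += weight * bx * bx
    if denominator == 0.0:
        raise ValueError("no instrument has a non-zero exposure effect")
    return numerator / denominator, math.sqrt(1.0 / denominator)


if __name__ == "__main__":
    # Invented numbers: two instruments, outcome effect = 0.5 * exposure effect.
    estimate, se = ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.01])
    print(f"causal estimate = {estimate:.3f} (SE {se:.3f})")
```

With a single instrument the estimator reduces to the Wald ratio, the outcome effect divided by the exposure effect.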
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternatively spliced transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9,640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3,689 new loci not currently in GENCODE, of which 3,127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
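Locus counts such as those quoted above can be tallied from an annotation file. The sketch below assumes the nine-column, tab-separated GTF layout and a `gene_type` attribute key, as found in GENCODE GTF releases. The function names and the two sample records are invented for illustration and are not taken from GENCODE 7.

```python
import re
from collections import Counter

# Matches quoted key/value pairs such as: gene_type "protein_coding";
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the GTF attribute column into a dict (last value wins on repeats)."""
    return dict(ATTRIBUTE_RE.findall(field))


def count_gene_types(lines):
    """Count 'gene' records per gene_type, skipping comments and other features."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] != "gene":
            continue
        attributes = parse_attributes(columns[8])
        counts[attributes.get("gene_type", "unknown")] += 1
    return counts


if __name__ == "__main__":
    # Invented records that only mimic the GTF layout.
    sample = [
        "##description: illustrative records only",
        'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";',
        'chr1\tHAVANA\texon\t100\t200\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";',
        'chr2\tHAVANA\tgene\t50\t400\t.\t-\t.\tgene_id "G2"; gene_type "lincRNA";',
    ]
    print(count_gene_types(sample))
```

In practice the same function can be fed an open file handle, since it only iterates over lines.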
Copy number variants (CNVs) are major contributors to genetic disorders. We have dissected a region of the 16p11.2 chromosome (which encompasses 29 genes) that confers susceptibility to neurocognitive defects when deleted or duplicated. Overexpression of each human transcript in zebrafish embryos identified KCTD13 as the sole message capable of inducing the microcephaly phenotype associated with the 16p11.2 duplication, whereas suppression of the same locus yielded the macrocephalic phenotype associated with the 16p11.2 deletion, capturing the mirror phenotypes of humans. Analyses of zebrafish and mouse embryos suggest that microcephaly is caused by decreased proliferation of neuronal progenitors with a concomitant increase in apoptosis in the developing brain, whereas macrocephaly arises by increased proliferation and no changes in apoptosis. A role for KCTD13 dosage changes is consistent with autism in both a recently reported family with a reduced 16p11.2 deletion and a subject reported here with a complex 16p11.2 rearrangement involving de novo structural alteration of KCTD13. Our data suggest that KCTD13 is a major driver for the neurodevelopmental phenotypes associated with the 16p11.2 CNV, reinforce the idea that one or a small number of transcripts within a CNV can underpin clinical phenotypes, and offer an efficient route to identifying dosage-sensitive loci.
DNA sequence variation has been associated with quantitative changes in molecular phenotypes such as gene expression, but its impact on chromatin states is poorly characterized. To understand the interplay between chromatin and genetic control of gene regulation, we quantified allelic variability in transcription factor binding, histone modifications, and gene expression within humans. We found abundant allelic specificity in chromatin and extensive local, short-range, and long-range allelic coordination among the studied molecular phenotypes. We observed genetic influence on most of these phenotypes, with histone modifications exhibiting strong context-dependent behavior. Our results implicate transcription factors as primary mediators of sequence-specific regulation of gene expression programs, with histone modifications frequently reflecting the primary regulatory event.
We aimed to assess the contribution of rare variants in the genetic background toward variability of neurodevelopmental phenotypes in individuals with rare copy-number variants (CNVs) and gene-disruptive variants.
We analyzed quantitative clinical information, exome sequencing, and microarray data from 757 probands and 233 parents and siblings who carry disease-associated variants.
The number of rare likely deleterious variants in functionally intolerant genes ("other hits") correlated with expression of neurodevelopmental phenotypes in probands with 16p12.1 deletion (n=23, p=0.004) and in autism probands carrying gene-disruptive variants (n=184, p=0.03) compared with their carrier family members. Probands with 16p12.1 deletion and a strong family history presented more severe clinical features (p=0.04) and higher burden of other hits compared with those with mild/no family history (p=0.001). The number of other hits also correlated with severity of cognitive impairment in probands carrying pathogenic CNVs (n=53) or de novo pathogenic variants in disease genes (n=290), and negatively correlated with head size among 80 probands with 16p11.2 deletion. These co-occurring hits involved known disease-associated genes such as SETD5, AUTS2, and NRXN1, and were enriched for cellular and developmental processes.
Accurate genetic diagnosis of complex disorders will require complete evaluation of the genetic background even after a candidate disease-associated variant is identified.
It is currently unclear whether tissue changes surrounding multifocal epithelial tumors are a cause or consequence of cancer. Here, we provide evidence that loss of mesenchymal Notch/CSL signaling causes tissue alterations, including stromal atrophy and inflammation, which precede and are potent triggers for epithelial tumors. Mice carrying a mesenchymal-specific deletion of CSL/RBP-Jκ, a key Notch effector, exhibit spontaneous multifocal keratinocyte tumors that develop after dermal atrophy and inflammation. CSL-deficient dermal fibroblasts promote increased tumor cell proliferation through upregulation of c-Jun and c-Fos expression and consequently higher levels of diffusible growth factors, inflammatory cytokines, and matrix-remodeling enzymes. In human skin samples, stromal fields adjacent to multifocal premalignant actinic keratosis lesions exhibit decreased Notch/CSL signaling and associated molecular changes. Importantly, these changes in gene expression are also induced by UVA, a known environmental cause of cutaneous field cancerization and skin cancer.
► Mesenchymal loss of CSL/Notch results in field cancerization of the skin epithelium
► Protumorigenic consequences of CSL loss are linked to c-Jun/c-Fos upregulation
► Anti-inflammatory treatment counteracts the field cancerization phenotype
► UVA exposure alters DNA methylation to downregulate stromal Notch signaling
Mesenchymal loss of a Notch effector or downregulation of Notch signaling by UVA triggers oncogenesis in the overlying epidermis. Inflammation of the stroma precedes the spread of epithelial lesions across a patch of skin, and importantly, inhibiting this inflammatory response counteracts the spread of multifocal skin tumors.
Chromatin state variation at gene regulatory elements is abundant across individuals, yet we understand little about the genetic basis of this variability. Here, we profiled several histone modifications, the transcription factor (TF) PU.1, RNA polymerase II, and gene expression in lymphoblastoid cell lines from 47 whole-genome sequenced individuals. We observed that distinct cis-regulatory elements exhibit coordinated chromatin variation across individuals in the form of variable chromatin modules (VCMs) at sub-Mb scale. VCMs were associated with thousands of genes and preferentially cluster within chromosomal contact domains. We mapped strong proximal and weak, yet more ubiquitous, distal-acting chromatin quantitative trait loci (cQTL) that frequently explain this variation. cQTLs were associated with molecular activity at clusters of cis-regulatory elements and mapped preferentially within TF-bound regions. We propose that local, sequence-independent chromatin variation emerges as a result of genetic perturbations in cooperative interactions between cis-regulatory elements that are located within the same genomic domain.
• Modules of correlated molecular phenotypes represent inter-individual chromatin variation
• Variable chromatin modules (VCMs) are embedded within chromosomal contact domains
• VCMs are orchestrated by cis-acting genetic variation
• VCMs rationalize chromatin state changes that are independent of local DNA sequence
Spatially defined chromosome regions, termed variable chromatin modules, exhibit coordinated chromatin state changes across cis-regulatory elements. Within these modules, genetic changes distal from the regulatory element itself can induce variation in chromatin patterns between individuals.
The 16p11.2 600 kb BP4-BP5 deletion and duplication syndromes have been associated with developmental delay; autism spectrum disorders; and reciprocal effects on the body mass index, head circumference and brain volumes. Here, we explored these relationships using novel engineered mouse models carrying a deletion (Del/+) or a duplication (Dup/+) of the Sult1a1-Spn region homologous to the human 16p11.2 BP4-BP5 locus. On a C57BL/6N inbred genetic background, Del/+ mice exhibited reduced weight and impaired adipogenesis, hyperactivity, repetitive behaviors, and recognition memory deficits. In contrast, Dup/+ mice showed largely opposite phenotypes. On an F1 C57BL/6N × C3B hybrid genetic background, we also observed alterations in social interaction in the Del/+ and the Dup/+ animals, with other robust phenotypes affecting recognition memory and weight. To explore the dosage effect of the 16p11.2 genes on metabolism, Del/+ and Dup/+ models were challenged with a high-fat, high-sugar diet, which revealed opposite energy imbalances. Transcriptomic analysis revealed that the majority of the genes located in the Sult1a1-Spn region were sensitive to dosage with a major effect on several pathways associated with neurocognitive and metabolic phenotypes. Whereas the behavioral consequence of the 16p11 region genetic dosage was similar in mice and humans with activity and memory alterations, the metabolic defects were opposite: adult Del/+ mice are lean in comparison to the human obese phenotype and the Dup/+ mice are overweight in comparison to the human underweight phenotype. Together, these data indicate that the dosage imbalance at the 16p11.2 locus perturbs the expression of modifiers outside the CNV that can modulate the penetrance, expressivity and direction of effects in both humans and mice.
Ascertaining when and where genes are expressed is of crucial importance to understanding or predicting the physiological role of genes and proteins and how they interact to form the complex networks that underlie organ development and function. It is therefore crucial to determine, on a genome-wide level, the spatio-temporal gene expression profiles at cellular resolution. This information is provided by colorimetric RNA in situ hybridization that can elucidate expression of genes in their native context and does so at cellular resolution. We generated what is to our knowledge the first genome-wide transcriptome atlas by RNA in situ hybridization of an entire mammalian organism, the developing mouse at embryonic day 14.5. This digital transcriptome atlas, the Eurexpress atlas (http://www.eurexpress.org), consists of a searchable database of annotated images that can be interactively viewed. We generated anatomy-based expression profiles for over 18,000 coding genes and over 400 microRNAs. We identified 1,002 tissue-specific genes that are a source of novel tissue-specific markers for 37 different anatomical structures. The quality and the resolution of the data revealed novel molecular domains for several developing structures, such as the telencephalon, a novel organization for the hypothalamus, and insight on the Wnt network involved in renal epithelial differentiation during kidney development. The digital transcriptome atlas is a powerful resource to determine co-expression of genes, to identify cell populations and lineages, and to identify functional associations between genes relevant to development and disease.