Mitochondrial DNA copy number (mtDNA-CN) is a proxy for mitochondrial function and is associated with aging-related diseases. However, it is unclear how mtDNA-CN measured in blood can reflect diseases that primarily manifest in other tissues. Using the Genotype-Tissue Expression Project, we interrogated relationships between mtDNA-CN measured in whole blood and gene expression from whole blood and 47 additional tissues in 419 individuals. mtDNA-CN was significantly associated with expression of 700 genes in whole blood, including nuclear genes required for mtDNA replication. Significant enrichment was observed for splicing and ubiquitin-mediated proteolysis pathways, as well as for target genes of the mitochondrial transcription factor NRF1. In 30 of the nonblood tissues, there were more significantly associated genes than expected, suggesting that global gene expression in those tissues is correlated with blood-derived mtDNA-CN. Neurodegenerative disease pathways were significantly associated in multiple tissues, and in an independent data set, the UK Biobank, we observed that higher mtDNA-CN was significantly associated with lower rates of both prevalent (OR = 0.89, CI = 0.83; 0.96) and incident neurodegenerative disease (HR = 0.95, 95% CI = 0.91; 0.98). The observation that mtDNA-CN measured in blood is associated with gene expression in other tissues suggests that blood-derived mtDNA-CN can reflect metabolic health across multiple tissues. Identification of key pathways including splicing, RNA binding, and catalysis reinforces the importance of mitochondria in maintaining cellular homeostasis. Finally, validation of the role of mtDNA-CN in neurodegenerative disease in a large independent cohort study solidifies the link between blood-derived mtDNA-CN, altered gene expression in multiple tissues, and aging-related disease.
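The odds ratio and hazard ratio above are reported with confidence intervals rather than test statistics. A minimal sketch, assuming both intervals are 95% Wald intervals computed symmetrically on the log scale (the OR interval is not explicitly labeled as 95% in the text), shows how the log-scale standard error and an approximate two-sided p-value can be recovered from the reported numbers. The helper name `wald_from_ci` is hypothetical.

```python
import math


def wald_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Recover SE(log estimate), the Wald z-statistic, and a two-sided p-value
    from a ratio estimate (OR or HR) and its confidence interval.

    Assumes the interval was computed symmetrically on the log scale with
    critical value z_crit (about 1.96 for a 95% interval).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Values reported in the abstract; rounding of the bounds makes results approximate.
for label, est, lo, hi in [("prevalent (OR)", 0.89, 0.83, 0.96),
                           ("incident (HR)", 0.95, 0.91, 0.98)]:
    se, z, p = wald_from_ci(est, lo, hi)
    print(f"{label}: SE = {se:.4f}, z = {z:.2f}, p = {p:.1e}")
```

Under these assumptions, an interval that lies entirely below 1 corresponds to a negative z-statistic with p < 0.05, which is why both estimates are described as significant protective associations.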
A challenge of next-generation sequencing is read contamination. We use Genotype-Tissue Expression (GTEx) datasets and technical metadata, along with RNA-seq datasets from other studies, to understand factors that contribute to contamination. Here we report that, of 48 analyzed tissues in GTEx, 26 have variant co-expression clusters of four highly expressed and pancreas-enriched genes (PRSS1, PNLIP, CLPS, and/or CELA3A). Fourteen additional highly expressed genes from other tissues also indicate contamination. Sample contamination is strongly associated with a sample being sequenced on the same day as a tissue that natively expresses those genes. Discrepant SNPs across four contaminating genes validate the contamination. Low-level contamination affects ~40% of samples and leads to numerous eQTL assignments in inappropriate tissues among these 18 genes. This type of contamination occurs widely, impacting bulk and single-cell (scRNA-seq) dataset analyses. In conclusion, highly expressed, tissue-enriched genes contaminate GTEx and other datasets at a baseline level, impacting analyses.
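The same-day association described above can be illustrated with a simple 2×2 tabulation. In this sketch the sample records, the `Pancreas` label, the use of PRSS1 TPM as the marker, and the 10 TPM flagging threshold are all hypothetical choices for illustration, not values or procedures taken from GTEx.

```python
from collections import defaultdict

# Hypothetical flagging threshold for marker-gene expression (TPM).
CONTAMINATION_TPM = 10.0


def same_day_table(samples, source_tissue="Pancreas", threshold=CONTAMINATION_TPM):
    """Cross-tabulate non-source samples by (sequenced on the same day as a
    source-tissue sample, marker expression above threshold).

    samples: iterable of (sample_id, tissue, sequencing_date, marker_tpm).
    """
    samples = list(samples)
    source_days = {date for _, tissue, date, _ in samples if tissue == source_tissue}
    table = defaultdict(int)
    for _, tissue, date, tpm in samples:
        if tissue == source_tissue:
            continue  # the source tissue expresses the marker natively
        table[(date in source_days, tpm > threshold)] += 1
    return dict(table)


def flagged_rates(table):
    """Fraction of flagged samples among same-day and other-day samples."""
    rates = {}
    for same_day in (True, False):
        flagged = table.get((same_day, True), 0)
        total = flagged + table.get((same_day, False), 0)
        rates[same_day] = flagged / total if total else float("nan")
    return rates


# Hypothetical records: (sample_id, tissue, sequencing_date, PRSS1 TPM).
samples = [
    ("s1", "Pancreas", "2014-01-02", 5000.0),
    ("s2", "Lung", "2014-01-02", 45.0),
    ("s3", "Heart", "2014-01-02", 3.0),
    ("s4", "Lung", "2014-01-09", 0.5),
    ("s5", "Heart", "2014-01-09", 0.2),
]
table = same_day_table(samples)
print(flagged_rates(table))  # {True: 0.5, False: 0.0}
```

A real analysis would test the resulting table formally (for example, with a Fisher exact test) over many sequencing batches; the toy data only show the bookkeeping.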
Conventional cytogenetic testing offers low-resolution detection of balanced karyotypic abnormalities but cannot provide the precise, gene-level knowledge required to predict outcomes. The use of high-resolution whole-genome deep sequencing is currently impractical for routine clinical care. We show here that whole-genome "jumping libraries" can offer an immediately applicable, nucleotide-level complement to conventional genetic diagnostics within a time frame that allows for clinical action. We performed large-insert sequencing of DNA extracted from amniotic-fluid cells with a balanced de novo translocation. The amniotic-fluid sample was from a patient in the third trimester of pregnancy who underwent amniocentesis because of severe polyhydramnios after multiple fetal anomalies had been detected on ultrasonography. Using a 13-day sequencing and analysis pipeline, we discovered direct disruption of CHD7, a causal locus in the CHARGE syndrome (coloboma of the eye, heart anomaly, atresia of the choanae, retardation, and genital and ear anomalies). Clinical findings at birth were consistent with the CHARGE syndrome, a diagnosis that could not have been reliably inferred from the cytogenetic breakpoint. This case study illustrates the potential power of customized whole-genome jumping libraries when used to augment prenatal karyotyping.
Mitochondria carry their own circular genome, and disruption of the mitochondrial genome is associated with various aging-related diseases. Unlike the nuclear genome, mitochondrial DNA (mtDNA) can be present at thousands to tens of thousands of copies in somatic cells, and variants may exist in a state of heteroplasmy, where only a fraction of the DNA molecules harbors a particular variant. We quantify mtDNA heteroplasmy in 194,871 participants in the UK Biobank and find that heteroplasmy is associated with a 1.5-fold increased risk of all-cause mortality. Additionally, we functionally characterize mtDNA single nucleotide variants (SNVs) using a constraint-based score, the mitochondrial local constraint score sum (MSS), and find that it is associated with all-cause mortality and with the prevalence and incidence of cancer and cancer-related mortality, particularly leukemia. These results indicate that mitochondria may have a functional role in certain cancers and that mitochondrial heteroplasmic SNVs may serve as a prognostic marker for cancer, especially leukemia.
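Heteroplasmy as defined above is commonly estimated as the fraction of sequencing reads at an mtDNA position that carry the variant allele. The sketch below assumes per-site read counts are already available; the 5% and 95% cutoffs separating heteroplasmic from homoplasmic calls are illustrative assumptions, and real pipelines would also apply read-depth and sequencing-error filters not shown here.

```python
def heteroplasmy_fraction(alt_reads, total_reads):
    """Estimated fraction of mtDNA molecules carrying the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def classify_site(alt_reads, total_reads, low=0.05, high=0.95):
    """Label a site using illustrative (assumed) allele-fraction cutoffs."""
    frac = heteroplasmy_fraction(alt_reads, total_reads)
    if frac <= low:
        return "not_called"      # too few variant reads to call
    if frac >= high:
        return "homoplasmic"     # essentially all copies carry the variant
    return "heteroplasmic"       # a mixture of variant and reference copies


print(classify_site(300, 2000))   # heteroplasmic (fraction 0.15)
print(classify_site(1990, 2000))  # homoplasmic (fraction 0.995)
print(classify_site(20, 2000))    # not_called (fraction 0.01)
```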
Inter-individual variation in the number of copies of the mitochondrial genome, called mitochondrial DNA copy number (mtDNA-CN), reflects mitochondrial function and has been associated with various aging-related diseases. We examined 415,422 exomes of self-reported White ancestry individuals from the UK Biobank and tested the impact of rare variants, at the level of single variants and through aggregate variant-set tests, on mtDNA-CN. A survey across nine variant sets tested enrichment of putatively causal variants and identified 14 genes at experiment-wide significance and three genes at marginal significance. These included associations at known mtDNA depletion syndrome genes (mtDNA helicase TWNK, p = 1.1 × 10⁻³⁰; mitochondrial transcription factor TFAM, p = 4.3 × 10⁻¹⁵; mtDNA maintenance exonuclease MGME1, p = 2.0 × 10⁻⁶) and the V617F dominant gain-of-function mutation in the tyrosine kinase JAK2 (p = 2.7 × 10⁻¹⁷), associated with myeloproliferative disease. Novel genes included the ATP-dependent protease CLPX (p = 8.4 × 10⁻⁹), involved in mitochondrial proteome quality, and the mitochondrial adenylate kinase AK2 (p = 4.7 × 10⁻⁸), involved in hematopoiesis. The most significant association was a missense variant in SAMHD1 (p = 4.2 × 10⁻²⁸), found on a rare, 1.2-Mb shared ancestral haplotype on chromosome 20. SAMHD1 encodes a cytoplasmic host restriction factor involved in viral defense response and the mitochondrial nucleotide salvage pathway, and is associated with Aicardi-Goutières syndrome 5, a childhood encephalopathy and chronic inflammatory response disorder. Rare variants were enriched in Mendelian mtDNA depletion syndrome loci, and these variants implicated core processes in mtDNA replication, nucleoid structure formation, and maintenance. These data indicate that strong-effect mutations from the nuclear genome contribute to the genetic architecture of mtDNA-CN.
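The abstract does not specify which aggregate tests were used. As a minimal sketch of one common variant-set approach, a burden (collapsing) test, the code below marks an individual as a carrier if they have any qualifying rare variant in a set and regresses a quantitative phenotype such as standardized mtDNA-CN on carrier status. It omits covariates and uses a normal approximation for the p-value; the function names, input layout, and toy data are hypothetical.

```python
import math
from statistics import mean


def collapse_carriers(genotypes, qualifying):
    """Burden collapsing: 1 if an individual carries at least one alternate
    allele of any qualifying variant in the set, else 0.

    genotypes: dict individual -> dict variant_id -> alternate allele count.
    """
    return {ind: int(any(g.get(v, 0) > 0 for v in qualifying))
            for ind, g in genotypes.items()}


def burden_test(phenotype, carriers):
    """OLS regression of a quantitative phenotype on carrier status.

    Returns (beta, t, p): beta is the mean phenotype difference between
    carriers and non-carriers; p uses a normal approximation to the
    t distribution, reasonable only for large samples. No covariates.
    """
    ids = sorted(phenotype)
    x = [carriers[i] for i in ids]
    y = [phenotype[i] for i in ids]
    n = len(ids)
    if n < 3:
        raise ValueError("need at least three individuals")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("carrier status does not vary")
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    alpha = my - beta * mx
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = beta / se
    p = math.erfc(abs(t) / math.sqrt(2))
    return beta, t, p


genotypes = {"a": {"v1": 1}, "b": {"v2": 1, "v3": 0}, "c": {"v1": 0},
             "d": {}, "e": {"v9": 1}, "f": {}}
carriers = collapse_carriers(genotypes, qualifying={"v1", "v2"})
mtdna_cn = {"a": -1.0, "b": -1.2, "c": 0.1, "d": 0.3, "e": -0.2, "f": 0.0}
beta, t, p = burden_test(mtdna_cn, carriers)
print(round(beta, 2))  # -1.15
```

Collapsing assumes that qualifying variants in a set act in the same direction; variance-component tests such as SKAT relax that assumption and are another common choice for variant-set analyses.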
Mitochondrial DNA copy number (mtDNA-CN) is an important biomarker of aging. We tested rare nuclear genetic variants in 415,422 individuals from the UK Biobank for association with mtDNA-CN. Rare variants reveal fundamental processes related to mitochondrial biology and disease.
Copy-number variants (CNVs) have been the predominant focus of genetic studies of structural variation, and chromosomal microarray (CMA) for genome-wide CNV detection is the recommended first-tier genetic diagnostic screen in neurodevelopmental disorders. We compared CNVs observed by CMA to the structural variation detected by whole-genome large-insert sequencing in 259 individuals diagnosed with autism spectrum disorder (ASD) from the Simons Simplex Collection. These analyses revealed a diverse landscape of complex duplications in the human genome. One remarkably common class of complex rearrangement, which we term dupINVdup, involves two closely located duplications (“paired duplications”) that flank the breakpoints of an inversion. This complex variant class is cryptic to CMA, but we observed it in 8.1% of all subjects. We also detected other paired-duplication signatures and duplication-mediated complex rearrangements in 15.8% of all ASD subjects. Breakpoint analysis showed that the predominant mechanism of formation of these complex duplication-associated variants was microhomology-mediated repair. On the basis of the striking prevalence of dupINVdups in this cohort, we explored the landscape of all inversion variation among the 235 highest-quality libraries and found abundant complexity among these variants: only 39.3% of inversions were canonical, or simple, inversions without additional rearrangement. Collectively, these findings indicate that dupINVdups, as well as other complex duplication-associated rearrangements, represent relatively common sources of genomic variation that is cryptic to population-based microarray and low-depth whole-genome sequencing. They also suggest that paired-duplication signatures detected by CMA warrant further scrutiny in genetic diagnostic testing given that they might mark complex rearrangements of potential clinical relevance.
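The dupINVdup signature described above (two distinct duplications, one flanking each breakpoint of an inversion) can be expressed as a simple interval check. This sketch assumes all intervals lie on the same chromosome in 0-based, half-open coordinates and uses a hypothetical 1 kb proximity window; it is not the breakpoint-resolution method used in the study.

```python
def distance_to_interval(pos, interval):
    """Distance in bp from a position to a half-open interval (0 if inside)."""
    start, end = interval
    return max(start - pos, pos - (end - 1), 0)


def is_dupinvdup(inv_start, inv_end, duplications, max_gap=1_000):
    """True if two distinct duplications lie within max_gap bp of the two
    inversion breakpoints, one near each breakpoint (hypothetical window)."""
    near_left = {i for i, d in enumerate(duplications)
                 if distance_to_interval(inv_start, d) <= max_gap}
    near_right = {i for i, d in enumerate(duplications)
                  if distance_to_interval(inv_end, d) <= max_gap}
    # Require a *pair* of duplications, not one duplication spanning both ends.
    return any(i != j for i in near_left for j in near_right)


print(is_dupinvdup(10_000, 20_000, [(9_000, 9_800), (20_100, 21_000)]))  # True
print(is_dupinvdup(10_000, 20_000, [(9_500, 20_500)]))                  # False
```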
NRXN1 microdeletions occur at a relatively high frequency and confer increased risk for neurodevelopmental and neurobehavioral abnormalities. The mechanism that makes NRXN1 a deletion hotspot is unknown. Here, we identified deletions of the NRXN1 region in affected cohorts, confirming a strong association with the autism spectrum and other neurodevelopmental disorders. Interestingly, deletions in both affected and control individuals were clustered in the 5′ portion of NRXN1 and its immediate upstream region. To explore the mechanism of deletion, we mapped and analyzed the breakpoints of 32 deletions. At the deletion breakpoints, frequent microhomology (68.8%, 2–19 bp) suggested predominant mechanisms of DNA replication error and/or microhomology-mediated end-joining. Long terminal repeat (LTR) elements, unique non-B-DNA structures, and MEME-defined sequence motifs were significantly enriched, but Alu and LINE sequences were not. Importantly, small inverted repeats (minus self chains, minus sequence motifs, and partial complementary sequences) were significantly overrepresented in the vicinity of NRXN1 region deletion breakpoints, suggesting that, although they are not interrupted by the deletion process, such inverted repeats can predispose a region to genomic instability by mediating single-strand DNA looping via the annealing of partially reverse-complementary strands and by promoting DNA replication fork stalling and DNA replication error. Our observations highlight the potential importance of inverted repeats of variable sizes in generating a rearrangement hotspot in which individual breakpoints are not recurrent. Mechanisms that involve short inverted repeats in initiating deletion may also apply to other deletion hotspots in the human genome.
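Microhomology at a deletion junction is the short stretch of sequence that is identical at both breakpoints, so it cannot be assigned unambiguously to either flank. A minimal sketch, assuming a reference string and 0-based, half-open deletion coordinates (both hypothetical inputs), extends the match leftward and rightward from the junction. For scale, 68.8% of the 32 mapped deletions corresponds to 22 deletions.

```python
def breakpoint_microhomology(ref, start, end):
    """Sequence identical at both junctions of a deletion removing ref[start:end]
    (0-based, half-open). Returns an empty string for a blunt junction."""
    del_len = end - start
    left = 0
    # Compare bases just before the deletion with the last bases of the deleted segment.
    while (left < del_len and start - 1 - left >= 0
           and ref[start - 1 - left] == ref[end - 1 - left]):
        left += 1
    right = 0
    # Compare the first bases of the deleted segment with bases just after it.
    while (right < del_len - left and end + right < len(ref)
           and ref[start + right] == ref[end + right]):
        right += 1
    return ref[start - left:start + right]


# Hypothetical reference: flank "TTTTAG", deleted "CCCCCCAG", flank "CGGGG".
print(breakpoint_microhomology("TTTTAGCCCCCCAGCGGGG", 6, 14))  # AGC
print(repr(breakpoint_microhomology("AAAATTTTGGGG", 4, 8)))     # ''
```

The total microhomology length is capped at the deletion length, so a deletion inside a perfect tandem repeat is not reported as arbitrarily long microhomology.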
Despite the clinical significance of balanced chromosomal abnormalities (BCAs), their characterization has largely been restricted to cytogenetic resolution. We explored the landscape of BCAs at nucleotide resolution in 273 subjects with a spectrum of congenital anomalies. Whole-genome sequencing revised 93% of karyotypes and demonstrated complexity that was cryptic to karyotyping in 21% of BCAs, highlighting the limitations of conventional cytogenetic approaches. At least 33.9% of BCAs resulted in gene disruption that likely contributed to the developmental phenotype, 5.2% were associated with pathogenic genomic imbalances, and 7.3% disrupted topologically associated domains (TADs) encompassing known syndromic loci. Remarkably, BCA breakpoints in eight subjects altered a single TAD encompassing MEF2C, a known driver of 5q14.3 microdeletion syndrome, resulting in decreased MEF2C expression. We propose that sequence-level resolution dramatically improves prediction of clinical outcomes for balanced rearrangements and provides insight into new pathogenic mechanisms, such as altered regulation due to changes in chromosome topology.
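Whether a breakpoint disrupts a TAD encompassing a gene of interest reduces to interval lookups; the breakpoint need not fall within the gene itself. This sketch assumes TADs are supplied as sorted, non-overlapping, half-open intervals per chromosome (for example, called from Hi-C data) and flags genes that overlap any TAD containing a breakpoint. The coordinates and gene names in the usage example are hypothetical.

```python
from bisect import bisect_right


def find_tad(tads_on_chrom, pos):
    """Return the (start, end) TAD containing pos, or None.
    tads_on_chrom must be sorted, non-overlapping, half-open intervals."""
    starts = [s for s, _ in tads_on_chrom]
    i = bisect_right(starts, pos) - 1
    if i >= 0 and pos < tads_on_chrom[i][1]:
        return tads_on_chrom[i]
    return None


def genes_in_disrupted_tads(breakpoints, tads, genes):
    """Names of genes overlapping any TAD that contains a breakpoint.

    breakpoints: list of (chrom, pos); tads: dict chrom -> list of (start, end);
    genes: list of (name, chrom, start, end); all 0-based, half-open.
    """
    hits = set()
    for chrom, pos in breakpoints:
        tad = find_tad(tads.get(chrom, []), pos)
        if tad is None:
            continue  # breakpoint falls outside any annotated TAD
        t_start, t_end = tad
        for name, g_chrom, g_start, g_end in genes:
            if g_chrom == chrom and g_start < t_end and g_end > t_start:
                hits.add(name)
    return hits


tads = {"chrX": [(0, 1_000), (1_000, 3_000), (3_000, 5_000)]}
genes = [("GENE_A", "chrX", 1_500, 1_800)]
print(genes_in_disrupted_tads([("chrX", 2_500)], tads, genes))  # {'GENE_A'}
print(genes_in_disrupted_tads([("chrX", 4_000)], tads, genes))  # set()
```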
Balanced chromosomal abnormalities (BCAs) represent a relatively untapped reservoir of single-gene disruptions in neurodevelopmental disorders (NDDs). We sequenced BCAs in patients with autism or related NDDs, revealing disruption of 33 loci in four general categories: (1) genes previously associated with abnormal neurodevelopment (e.g., AUTS2, FOXP1, and CDKL5), (2) single-gene contributors to microdeletion syndromes (MBD5, SATB2, EHMT1, and SNURF-SNRPN), (3) novel risk loci (e.g., CHD8, KIRREL3, and ZNF507), and (4) genes associated with later-onset psychiatric disorders (e.g., TCF4, ZNF804A, PDE10A, GRIN2B, and ANK3). We also discovered among neurodevelopmental cases a profoundly increased burden of copy-number variants from these 33 loci and a significant enrichment of polygenic risk alleles from genome-wide association studies of autism and schizophrenia. Our findings suggest a polygenic risk model of autism and reveal that some neurodevelopmental genes are sensitive to perturbation by multiple mutational mechanisms, leading to variable phenotypic outcomes that manifest at different life stages.
▸ Mechanisms of epigenetic and transcriptional regulation implicated in autism ▸ Balanced chromosomal abnormality breakpoints harbor individual strong-effect genes ▸ Dosage-sensitive loci confer risk to autism from a spectrum of mutational mechanisms ▸ Different alterations in a gene are associated with diverse clinical outcomes
Sequencing of balanced chromosomal abnormalities, combined with convergent genomic studies of gene expression, copy-number variation, and genome-wide association, identifies 22 new loci that contribute to autism and related neurodevelopmental disorders. These data support a polygenic risk model for autism and provide new insight into how different types of mutations of the same genes can lead to variable disease phenotypes that manifest at different stages of life.
Diabetes mellitus is a highly heterogeneous disorder encompassing several distinct forms with different clinical manifestations, including a wide spectrum of age at onset. Despite many advances, the causal genetic defect remains unknown for many subtypes of the disease, including some of those forms with an apparent Mendelian mode of inheritance. Here we report two loss-of-function mutations (c.1655T>A p.Leu552∗ and c.280G>A p.Asp94Asn) in the gene for the Adaptor Protein, Phosphotyrosine Interaction, PH domain, and leucine zipper containing 1 (APPL1) that were identified by means of whole-exome sequencing in two large families with a high prevalence of diabetes not due to mutations in known genes involved in maturity onset diabetes of the young (MODY). APPL1 binds to AKT2, a key molecule in the insulin signaling pathway, thereby enhancing insulin-induced AKT2 activation and downstream signaling leading to insulin action and secretion. Both mutations cause APPL1 loss of function. The p.Leu552∗ alteration totally abolishes APPL1 protein expression in transfected HepG2 cells, and the p.Asp94Asn alteration causes a significant reduction in the enhancement of insulin-stimulated AKT2 and GSK3β phosphorylation that is observed after wild-type APPL1 transfection. These findings, linking APPL1 mutations to familial forms of diabetes, reaffirm the critical role of APPL1 in glucose homeostasis.