To understand the mutational burden of human induced pluripotent stem cells (iPSCs), we sequenced genomes of 18 fibroblast-derived iPSC lines and identified different classes of somatic mutations based on structure, origin, and frequency. Copy-number alterations affected 295 kb in each sample and strongly impacted gene expression. UV-damage mutations were present in ∼45% of the iPSCs and accounted for most of the observed heterogeneity in mutation rates across lines. Subclonal mutations (not present in all iPSCs within a line) composed 10% of point mutations and, compared with clonal variants, showed an enrichment in active promoters and increased association with altered gene expression. Our study shows that, by combining WGS, transcriptome, and epigenome data, we can understand the mutational burden of each iPSC line on an individual basis and suggests that this information could be used to prioritize iPSC lines for models of specific human diseases and/or transplantation therapy.
•Mutations due to UV-damage are present in ∼50% of iPSCs derived from skin fibroblasts
•Clonal and subclonal UV-damage mutations are associated with different chromatin states
•Subclonal mutations are enriched in active promoters and tend to alter gene expression
•Subclonal mutations tend not to evolve during passaging and differentiation
To understand the mutational burden of iPSCs, D’Antonio et al. sequenced genomes from 18 lines and identified four somatic mutation classes: clonal, subclonal, UV-damage mutations, and CNAs. Annotating mutations based on their class and the chromatin state in which they occur enables prediction of their influence on gene expression.
Over the last decade, a substantial amount of work in genetics has aimed to understand how genetic variation affects human traits and diseases, primarily via genome-wide association studies (GWAS). Until recently, these studies have focused on associations with single nucleotide variants (SNVs), largely because they have traditionally been easier to genotype. However, the genome contains diverse classes of non-SNV variation, such as short tandem repeats (STRs) and structural variants (SVs), that have been shown in some cases to affect human traits. The increasing availability of deep whole genome sequencing (WGS) data has now enabled algorithms to robustly detect SVs and STRs at high resolution, and opened the potential for a deeper understanding of these variants. Here I present two studies that focus on characterizing the extent and functional impact of SVs and STRs in the human genome. First, I present a study in which I built a comprehensive, high-quality map of SVs and STRs using over 700 deeply sequenced genomes. I also describe a novel method of filtering variants using the reproducibility of genotypes within genetically duplicate sample pairs, and use this information to gain insights into the quality of diverse classes of variants called using different methods. I then use this high-quality map of genetic variation to assess the impact of different classes of variation on gene expression, and show that the functional properties of unique classes of genetic variation are associated with their likelihood of affecting genes and their linkage to complex traits in humans.
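The duplicate-pair filtering idea described above can be illustrated with a minimal Python sketch. It is not the method used in the study: the genotype encoding (strings such as "0/1", with "./." for missing), the function names `site_concordance` and `filter_sites`, the 0.9 threshold, and the toy calls are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

MISSING = "./."  # assumed encoding for a missing genotype call


def site_concordance(genotypes: Dict[str, str],
                     pairs: List[Tuple[str, str]]) -> Optional[float]:
    """Fraction of duplicate pairs with identical genotypes at one site.

    Pairs in which either sample is missing are skipped. Returns None
    when no pair is informative.
    """
    informative = [
        (a, b) for a, b in pairs
        if genotypes.get(a, MISSING) != MISSING
        and genotypes.get(b, MISSING) != MISSING
    ]
    if not informative:
        return None
    matches = sum(genotypes[a] == genotypes[b] for a, b in informative)
    return matches / len(informative)


def filter_sites(calls: Dict[str, Dict[str, str]],
                 pairs: List[Tuple[str, str]],
                 min_concordance: float = 0.9) -> List[str]:
    """Keep sites whose duplicate-pair concordance meets the threshold.

    The 0.9 default is a hypothetical cutoff, not a value from the source.
    """
    kept = []
    for site, genotypes in calls.items():
        score = site_concordance(genotypes, pairs)
        if score is not None and score >= min_concordance:
            kept.append(site)
    return kept


# Toy data: A1/A2 and B1/B2 are genetically duplicate sample pairs.
toy_pairs = [("A1", "A2"), ("B1", "B2")]
toy_calls = {
    "sv1": {"A1": "0/1", "A2": "0/1", "B1": "1/1", "B2": "1/1"},  # reproducible
    "sv2": {"A1": "0/1", "A2": "0/0", "B1": "1/1", "B2": "1/1"},  # one mismatch
    "sv3": {"A1": "./.", "A2": "0/1", "B1": "./.", "B2": "0/0"},  # uninformative
}
reproducible_sites = filter_sites(toy_calls, toy_pairs)
```

In this sketch a site with no informative duplicate pair is dropped rather than kept, which is one possible design choice; a real pipeline might instead flag such sites for separate review.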
The impact of genetic regulatory variation active in early pancreatic development on adult pancreatic disease and traits is not well understood. Here, we generate a panel of 107 fetal-like iPSC-derived pancreatic progenitor cells (iPSC-PPCs) from whole genome-sequenced individuals and identify 4065 genes and 4016 isoforms whose expression and/or alternative splicing are affected by regulatory variation. We integrate eQTLs identified in adult islets and whole pancreas samples, which reveal 1805 eQTL associations that are unique to the fetal-like iPSC-PPCs and 1043 eQTLs that exhibit regulatory plasticity across the fetal-like and adult pancreas tissues. Colocalization with GWAS risk loci for pancreatic diseases and traits shows that some putative causal regulatory variants are active only in the fetal-like iPSC-PPCs and likely influence disease by modulating expression of disease-associated genes in early development, while others with regulatory plasticity likely exert their effects in both the fetal and adult pancreas by modulating expression of different disease genes in the two developmental stages.
Stem cells exist in vitro in a spectrum of interconvertible pluripotent states. Analyzing hundreds of hiPSCs derived from different individuals, we show the proportions of these pluripotent states vary considerably across lines. We discover 13 gene network modules (GNMs) and 13 regulatory network modules (RNMs), which are highly correlated with each other, suggesting that the coordinated co-accessibility of regulatory elements in the RNMs likely underlies the coordinated expression of genes in the GNMs. Epigenetic analyses reveal that regulatory networks underlying self-renewal and pluripotency are more complex than previously realized. Genetic analyses identify thousands of regulatory variants that overlap predicted transcription factor binding sites and are associated with chromatin accessibility in the hiPSCs. We show that the master regulator of pluripotency, the NANOG-OCT4 complex, and its associated network are significantly enriched for regulatory variants with large effects, suggesting that they play a role in the varying cellular proportions of pluripotency states between hiPSCs. Our work bins tens of thousands of regulatory elements in hiPSCs into discrete regulatory networks, shows that pluripotency and self-renewal processes have a surprising level of regulatory complexity, and suggests that genetic factors may contribute to cell state transitions in human iPSC lines.
Large-scale collections of induced pluripotent stem cells (iPSCs) could serve as powerful model systems for examining how genetic variation affects biology and disease. Here we describe the iPSCORE resource: a collection of systematically derived and characterized iPSC lines from 222 ethnically diverse individuals that allows for both familial and association-based genetic studies. iPSCORE lines are pluripotent with high genomic integrity (no or low numbers of somatic copy-number variants) as determined using high-throughput RNA-sequencing and genotyping arrays, respectively. Using iPSCs from a family of individuals, we show that iPSC-derived cardiomyocytes demonstrate gene expression patterns that cluster by genetic background, and can be used to examine variants associated with physiological and disease phenotypes. The iPSCORE collection contains representative individuals for risk and non-risk alleles for 95% of SNPs associated with human phenotypes through genome-wide association studies. Our study demonstrates the utility of iPSCORE for examining how genetic variants influence molecular and physiological traits in iPSCs and derived cell lines.
The causal variants and genes underlying thousands of cardiac GWAS signals have yet to be identified. Here, we leverage spatiotemporal information on 966 RNA-seq cardiac samples and perform an expression quantitative trait locus (eQTL) analysis detecting eQTLs considering both eGenes and eIsoforms. We identify 2,578 eQTLs associated with a specific developmental stage-, tissue- and/or cell type. Colocalization between eQTL and GWAS signals of five cardiac traits identified variants with high posterior probabilities for being causal in 210 GWAS loci. Pulse pressure GWAS loci are enriched for colocalization with fetal- and smooth muscle- eQTLs; pulse rate with adult- and cardiac muscle- eQTLs; and atrial fibrillation with cardiac muscle- eQTLs. Fine mapping identifies 79 credible sets with five or fewer SNPs, of which 15 were associated with spatiotemporal eQTLs. Our study shows that many cardiac GWAS variants impact traits and disease in a developmental stage-, tissue- and/or cell type-specific fashion.
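To make the notion of a credible set concrete, the following Python sketch shows one common way such a set is built from per-variant posterior probabilities: rank the variants and accumulate until a target coverage is reached. The 95% coverage level, the function name `credible_set`, and the variant labels and probabilities are assumptions for illustration; the abstract does not state which coverage or fine-mapping tool was used.

```python
from typing import Dict, List


def credible_set(posteriors: Dict[str, float],
                 coverage: float = 0.95) -> List[str]:
    """Smallest set of variants whose cumulative posterior reaches `coverage`.

    Posteriors are renormalized to sum to 1 before accumulation, so the
    input may be unnormalized weights. The 0.95 default is an assumption.
    """
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")
    # Rank variants from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen: List[str] = []
    cumulative = 0.0
    for variant, probability in ranked:
        chosen.append(variant)
        cumulative += probability / total
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical posterior probabilities for four variants at one locus.
toy_posteriors = {"snp1": 0.70, "snp2": 0.20, "snp3": 0.06, "snp4": 0.04}
toy_set = credible_set(toy_posteriors)
```

Under this construction, a locus where one or two variants carry most of the posterior mass yields a small credible set, which is what makes sets of five or fewer SNPs useful for prioritizing candidates.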
While more than 250 genes are known to cause inherited retinal degenerations (IRD), the genetic basis of disease remains unknown in nearly 40-50% of families. In this study, we sought to identify the underlying cause of IRD in a family by whole genome sequencing (WGS) analysis. Clinical characterization including standard ophthalmic examination, fundus photography, visual field testing, electroretinography, and review of medical and family history was performed. WGS was performed on affected and unaffected family members using Illumina HiSeq X10. Sequence reads were aligned to hg19 using BWA-MEM, and variant calling was performed with the Genome Analysis Toolkit. The called variants were annotated with SnpEff v4.11, PolyPhen v2.2.2, and CADD v1.3. Copy number variations were called using Genome STRiP (svtoolkit 2.00.1611) and SpeedSeq software. Variants were filtered to detect rare, potentially deleterious variants segregating with disease. Candidate variants were validated by dideoxy sequencing. Clinical evaluation revealed typical adolescent-onset recessive retinitis pigmentosa (arRP) in affected members. WGS identified about 4 million variants in each individual. Two rare and potentially deleterious compound heterozygous variants, p.Arg281Cys and p.Arg487*, were identified in the gene ATP/GTP binding protein like 5 (AGBL5) as likely causal variants. No additional variants in IRD genes that segregated with disease were identified. Mutation analysis confirmed the segregation of these variants with the IRD in the pedigree. Homology models indicated destabilization of AGBL5 due to the p.Arg281Cys change. Our findings establish the involvement of mutations in AGBL5 in RP and validate the WGS variant filtering pipeline we designed.
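The filtering step for rare variants that segregate with recessive disease can be sketched in Python as follows. This is an illustrative simplification, not the pipeline from the study: the record layout, the function name `compound_het_candidates`, the 1% allele-frequency cutoff, and the gene and sample names are hypothetical, and the sketch ignores deleteriousness scores and phasing.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def compound_het_candidates(variants: List[Dict],
                            affected: Set[str],
                            unaffected: Set[str],
                            max_af: float = 0.01) -> List[Tuple[str, str, str]]:
    """Find pairs of rare heterozygous variants in one gene that segregate.

    Each variant is a dict with keys 'id', 'gene', 'af' (population allele
    frequency) and 'carriers' (set of samples heterozygous for it). A pair
    is reported when every affected sample carries both variants and no
    unaffected sample carries both. The 0.01 cutoff is an assumption.
    """
    rare_by_gene: Dict[str, List[Dict]] = {}
    for variant in variants:
        if variant["af"] <= max_af:
            rare_by_gene.setdefault(variant["gene"], []).append(variant)

    hits = []
    for gene, gene_variants in rare_by_gene.items():
        for first, second in combinations(gene_variants, 2):
            carry_both = first["carriers"] & second["carriers"]
            if affected <= carry_both and not (unaffected & carry_both):
                hits.append((gene, first["id"], second["id"]))
    return hits


# Hypothetical pedigree: two affected siblings, each parent carries one variant.
toy_variants = [
    {"id": "varA", "gene": "GENE_X", "af": 0.0001,
     "carriers": {"sib1", "sib2", "father"}},
    {"id": "varB", "gene": "GENE_X", "af": 0.0002,
     "carriers": {"sib1", "sib2", "mother"}},
    {"id": "varC", "gene": "GENE_X", "af": 0.20,
     "carriers": {"sib1", "sib2", "father", "mother"}},  # too common, dropped
]
toy_hits = compound_het_candidates(
    toy_variants, affected={"sib1", "sib2"}, unaffected={"father", "mother"}
)
```

Requiring that no unaffected relative carries both variants stands in for the expectation that the two alleles lie on different parental chromosomes; without read-backed or pedigree-based phasing this remains an inference, which is one reason candidate variants are validated by independent sequencing.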
Gastric cancer is the second leading cause of cancer deaths in the world. The genomics of gastric cancers is unique in that they harbor significantly more copy number alterations compared to point mutations, yet the functional importance of these genetic alterations in tumor maintenance is not known. To better understand oncogenic drivers of gastric cancer and identify potential therapeutic targets, we performed negative selection RNAi screens in ten well-annotated gastric cancer cell lines. Screens were performed using two different but overlapping shRNA libraries. The first library was the Decipher Human Module I pool from Cellecta, composed of 27500 shRNAs targeting 5043 genes. The second library was a custom-designed focused pool with 6500 shRNAs targeting 608 genes. In addition to screening the two shRNA libraries in vitro, the focused pool was also screened in subcutaneous xenograft tumor models in eight of the gastric cancer cell lines. The screens revealed distinct genetic vulnerabilities that correlated with the corresponding genomic alteration in the specific cell lines. In particular, we found that KRAS amplifications confer dependency to the same degree as activating KRAS mutations. This KRAS dependency was further validated with additional shRNAs in KRAS-amplified and KRAS-mutated cell lines. Furthermore, we identified AMPK, which is focally amplified in 9% of gastric cancer, as a critical oncogenic driver. Multiple subunits of the AMPK holoenzyme scored in the screen, and dependency on AMPK alpha and beta subunits was demonstrated with independent shRNAs in two cell lines from the primary screen. Consistent with the screen results, we find that LMSU, a gastric cancer cell line not part of the primary screen but annotated as amplified for the AMPK alpha subunit, shows elevated expression levels and is sensitive to knockdown of AMPK. These observations have identified AMPK as a novel oncogenic driver in gastric cancer with therapeutic potential.
Citation Format: Meghana M. Kulkarni, Sushma Gurumurthy, Oleg Schmidt-Kittler, Jason Berglund, Christopher H. Hulton, David J. Wilson, David Jakubosky, Daniel Michaud, Robert E. Jones, Nicole M. Sjoblom, Russell McSweeney, Hongwei Zhou, Annapurna Venkatakrishnan, Karin J. Jensen, Jingxin Zhang, Parminder K. Mankoo, Jack Pollard, Christopher Winter, Pasi A. Jänne, Kwok-Kin Wong, Victoria M. Richon, Jessie M. English, Mark A. Bittinger. Functional genomics reveals genetic dependencies in gastric cancer. [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr 1110. doi:10.1158/1538-7445.AM2015-1110
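The readout of a negative selection shRNA screen like the one in the gastric cancer abstract above can be illustrated with a short Python sketch: shRNAs targeting a gene the cells depend on become depleted over time, giving a negative log2 fold change. The scoring scheme shown (counts per million, a pseudocount, and the per-gene median) is a generic assumption and not the analysis used in the study; `gene_depletion_scores` and all counts and gene labels are hypothetical.

```python
import math
from statistics import median
from typing import Dict


def gene_depletion_scores(initial: Dict[str, int],
                          final: Dict[str, int],
                          shrna_to_gene: Dict[str, str],
                          pseudocount: float = 1.0) -> Dict[str, float]:
    """Median log2 fold change (final vs. initial) of shRNAs per gene.

    Read counts are scaled to counts per million in each sample, and a
    pseudocount avoids division by zero for shRNAs that drop out.
    """
    initial_total = sum(initial.values())
    final_total = sum(final.values())
    per_gene: Dict[str, list] = {}
    for shrna, gene in shrna_to_gene.items():
        start = initial.get(shrna, 0) / initial_total * 1e6 + pseudocount
        end = final.get(shrna, 0) / final_total * 1e6 + pseudocount
        per_gene.setdefault(gene, []).append(math.log2(end / start))
    return {gene: median(values) for gene, values in per_gene.items()}


# Hypothetical read counts before and after cell passaging.
toy_initial = {"sh1": 1000, "sh2": 1000, "sh3": 1000, "sh4": 1000}
toy_final = {"sh1": 100, "sh2": 200, "sh3": 1800, "sh4": 1900}
toy_map = {"sh1": "GENE_A", "sh2": "GENE_A", "sh3": "CTRL", "sh4": "CTRL"}
toy_scores = gene_depletion_scores(toy_initial, toy_final, toy_map)
```

Taking the median across several shRNAs per gene is one simple way to reduce the influence of a single off-target or inefficient hairpin.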
To understand the mutational burden of human induced pluripotent stem cells (iPSCs), we whole genome sequenced 18 fibroblast-derived iPSC lines and identified different classes of somatic mutations based on structure, origin and frequency. Copy number alterations affected 295 kb in each sample and strongly impacted gene expression. UV-damage mutations were present in ~45% of the iPSCs and accounted for most of the observed heterogeneity in mutation rates across lines. Subclonal mutations (not present in all iPSCs within a line) composed 10% of point mutations, and compared with clonal variants, showed an enrichment in active promoters and increased association with altered gene expression. Our study shows that, by combining WGS, transcriptome and epigenome data, we can understand the mutational burden of each iPSC line on an individual basis and suggests that this information could be used to prioritize iPSC lines for models of specific human diseases and/or transplantation therapy.