Genetic variants affecting pancreatic islet enhancers are central to T2D risk, but the gene targets of islet enhancer activity are largely unknown. We generate a high-resolution map of islet chromatin loops using Hi-C assays in three islet samples and use loops to annotate target genes of islet enhancers defined using ATAC-seq and published ChIP-seq data. We identify candidate target genes for thousands of islet enhancers, and find that enhancer looping is correlated with islet-specific gene expression. We fine-map T2D risk variants affecting islet enhancers, and find that candidate target genes of these variants defined using chromatin looping and eQTL mapping are enriched in protein transport and secretion pathways. At IGF2BP2, a fine-mapped T2D variant reduces islet enhancer activity and IGF2BP2 expression, and conditional inactivation of IGF2BP2 in mouse islets impairs glucose-stimulated insulin secretion. Our findings provide a resource for studying islet enhancer function and identifying genes involved in T2D risk.
The Tohoku Medical Megabank Organization constructed the reference panel (referred to as the 1KJPN panel), which contains >20 million single nucleotide polymorphisms (SNPs), from whole-genome sequence data from 1070 Japanese individuals. The 1KJPN panel contains the largest number of haplotypes of Japanese ancestry to date. Here, from the 1KJPN panel, we designed a novel custom-made SNP array, named the Japonica array, which is suitable for whole-genome imputation of Japanese individuals. The array contains 659,253 SNPs, including tag SNPs for imputation, SNPs of the Y chromosome and mitochondria, and SNPs related to previously reported genome-wide association studies and pharmacogenomics. The Japonica array provides better imputation performance for Japanese individuals than the existing commercially available SNP arrays with both the 1KJPN panel and the International 1000 genomes project panel. For common SNPs (minor allele frequency (MAF)>5%), the genomic coverage of the Japonica array (r²>0.8) was 96.9%, that is, almost all common SNPs were covered by this array. Nonetheless, the coverage of low-frequency SNPs (0.5%<MAF⩽5%) of the Japonica array reached 67.2%, which is higher than those of the existing arrays. In addition, we confirmed the high-quality genotyping performance of the Japonica array using the 288 samples in 1KJPN; the average call rate was 99.7% and the average concordance rate with the genotypes obtained from a high-throughput sequencer was 99.7%. As demonstrated in this study, the creation of custom-made SNP arrays based on a population-specific reference panel is a practical way to facilitate further association studies through genome-wide genotype imputations.
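The coverage figures above can be illustrated with a minimal sketch. It assumes that "genomic coverage" means the fraction of reference-panel SNPs in a MAF bin whose r² exceeds 0.8; the abstract does not spell out the computation, so the function name `coverage_by_maf_bin`, the input layout, and the toy values are all hypothetical.

```python
def coverage_by_maf_bin(snps, r2_threshold=0.8):
    """Return the fraction of SNPs with r2 > r2_threshold in each MAF bin.

    snps: iterable of (maf, r2) pairs, one per reference-panel SNP
    (hypothetical input layout, not taken from the paper).
    """
    # Each bin stores [number covered, number of SNPs in the bin].
    bins = {"common": [0, 0], "low_frequency": [0, 0]}
    for maf, r2 in snps:
        if maf > 0.05:                # common: MAF > 5%
            key = "common"
        elif 0.005 < maf <= 0.05:     # low frequency: 0.5% < MAF <= 5%
            key = "low_frequency"
        else:
            continue                  # rarer SNPs are ignored in this sketch
        bins[key][1] += 1
        if r2 > r2_threshold:
            bins[key][0] += 1
    return {k: (c / n if n else None) for k, (c, n) in bins.items()}


# Toy values for illustration only; these are not data from the study.
toy_snps = [(0.30, 0.95), (0.10, 0.85), (0.20, 0.60),
            (0.01, 0.90), (0.02, 0.40), (0.001, 0.99)]
coverage = coverage_by_maf_bin(toy_snps)
print(coverage)
```

With the toy input, two of three common SNPs and one of two low-frequency SNPs pass the threshold; the last SNP falls below both bins and is skipped.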
In this study, we used whole-genome sequencing and gene expression profiling of 215 human induced pluripotent stem cell (iPSC) lines from different donors to identify genetic variants associated with RNA expression for 5,746 genes. We were able to predict causal variants for these expression quantitative trait loci (eQTLs) that disrupt transcription factor binding and validated a subset of them experimentally. We also identified copy-number variant (CNV) eQTLs, including some that appear to affect gene expression by altering the copy number of intergenic regulatory regions. In addition, we were able to identify effects on gene expression of rare genic CNVs and regulatory single-nucleotide variants and found that reactivation of gene expression on the X chromosome depends on gene chromosomal position. Our work highlights the value of iPSCs for genetic association analyses and provides a unique resource for investigating the genetic regulation of gene expression in pluripotent cells.
•Profiling of 215 hiPSC lines enables eQTL mapping of gene expression variation
•iPSC eQTLs are enriched in stem cell gene regulatory regions and affect TF binding
•Copy-number eQTLs in intergenic regulatory regions also affect expression
•Whole-genome sequencing highlights the influence of rare and copy-number variants
Working as part of the NextGen consortium, DeBoever et al. use whole-genome and RNA sequencing to map expression quantitative trait loci in a set of 215 human induced pluripotent stem cell lines. These genotype-expression associations provide a foundation for understanding the genetic regulation of gene expression in pluripotent cells.
The MHC region is highly associated with autoimmune and infectious diseases. Here we conduct an in-depth interrogation of associations between genetic variation, gene expression and disease. We create a comprehensive map of regulatory variation in the MHC region using WGS from 419 individuals to call eight-digit HLA types and RNA-seq data from matched iPSCs. Building on this regulatory map, we explored GWAS signals for 4083 traits, detecting colocalization for 180 disease loci with eQTLs. We show that eQTL analyses taking HLA type haplotypes into account have substantially greater power compared with only using single variants. We examined the association between the 8.1 ancestral haplotype and delayed colonization in Cystic Fibrosis, postulating that downregulation of
expression is the likely causal mechanism. Our study provides insights into the genetic architecture of the MHC region and pinpoints disease associations that are due to differential expression of HLA genes and non-HLA genes.
Human leucocyte antigen (HLA) genes play an important role in determining the outcome of organ transplantation and are linked to many human diseases. Because of the diversity and polymorphisms of HLA loci, HLA typing at high resolution is challenging even with whole-genome sequencing data.
We have developed a computational tool, HLA-VBSeq, to estimate the most probable HLA alleles at full (8-digit) resolution from whole-genome sequence data. HLA-VBSeq simultaneously optimizes read alignments to HLA allele sequences and abundance of reads on HLA alleles by variational Bayesian inference. We show the effectiveness of the proposed method over other methods by predicting HLA types for HLA class I (HLA-A, -B and -C) and class II (HLA-DQA1, -DQB1 and -DRB1) loci from simulated data at various depths of coverage and from real sequencing data of human trio samples.
HLA-VBSeq is an efficient and accurate HLA typing method using high-throughput sequencing data without the need for primer design for HLA loci. Moreover, it does not assume any prior knowledge about HLA allele frequencies, and hence HLA-VBSeq is broadly applicable to human samples obtained from a genetically diverse population.
Schizophrenia presents clinical and biological differences between males and females. This study investigated transcriptional profiles in the dorsolateral prefrontal cortex (DLPFC) using postmortem data from the largest RNA-sequencing (RNA-seq) database on schizophrenic cases and controls. Data for 154 male and 113 female controls and 160 male and 93 female schizophrenic cases were obtained from the CommonMind Consortium. In the RNA-seq database, the principal component analysis showed that sex effects were small in schizophrenia. After we analyzed the impact of sex-specific differences on gene expression, the female group showed more significantly changed genes compared with the male group. Based on the gene ontology analysis, the female sex-specific genes that changed were overrepresented in the mitochondrion, ATP (phosphocreatine and adenosine triphosphate)-, and metal ion-binding relevant biological processes. An ingenuity pathway analysis revealed that the differentially expressed genes related to schizophrenia in the female group were involved in midbrain dopaminergic and γ-aminobutyric acid (GABA)-ergic neurons and microglia. We used methylated DNA-binding domain-sequencing analyses and microarray to investigate the DNA methylation that potentially impacts the sex differences in gene transcription using a maternal immune activation (MIA) murine model. Among the sex-specific positional genes related to schizophrenia in the PFC of female offspring from MIA, the changes in the methylation and transcriptional expression of loci ACSBG1 were validated in the females with schizophrenia in independent postmortem samples by real-time PCR and pyrosequencing. Our results reveal potential genetic risks in the DLPFC for the sex-dependent prevalence and symptomology of schizophrenia.
Genomic interaction studies use next-generation sequencing (NGS) to examine the interactions between two loci on the genome, with subsequent bioinformatics analyses typically including annotation, intersection, and merging of data from multiple experiments. While many file types and analysis tools exist for storing and manipulating single locus NGS data, there is currently no file standard or analysis tool suite for manipulating and storing paired-genomic-loci: the data type resulting from "genomic interaction" studies. As genomic interaction sequencing data are becoming prevalent, a standard file format and tools for working with these data conveniently and efficiently are needed.
This article details a file standard and novel software tool suite for working with paired-genomic-loci data. We present the paired-genomic-loci (PGL) file standard for genomic-interactions data, and the accompanying analysis tool suite "pgltools": a cross-platform, PyPy-compatible Python package available both as an easy-to-use UNIX package and as a Python module, for integration into pipelines of paired-genomic-loci analyses.
Pgltools is a freely available, open source tool suite for manipulating paired-genomic-loci data. Source code, an in-depth manual, and a tutorial are available publicly at www.github.com/billgreenwald/pgltools, and a Python module of the operations can be installed from PyPI via the PyGLtools module.
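To make the idea of intersecting paired-genomic-loci concrete, here is a minimal sketch. It does not use the pgltools API; it assumes each record holds two anchors as (chromosome, start, end) with half-open coordinates, and the names `PairedLoci` and `pairs_intersect` are hypothetical.

```python
from typing import NamedTuple


class PairedLoci(NamedTuple):
    """One interaction: anchor A and anchor B (assumed half-open intervals)."""
    chrom_a: str
    start_a: int
    end_a: int
    chrom_b: str
    start_b: int
    end_b: int


def _anchors_overlap(c1, s1, e1, c2, s2, e2):
    # Two half-open intervals overlap if they share a chromosome
    # and each one starts before the other ends.
    return c1 == c2 and s1 < e2 and s2 < e1


def pairs_intersect(p, q):
    """True if anchor A of p overlaps anchor A of q AND likewise for anchor B.

    Assumes both records list their anchors in the same (sorted) order.
    """
    return (
        _anchors_overlap(p.chrom_a, p.start_a, p.end_a,
                         q.chrom_a, q.start_a, q.end_a)
        and _anchors_overlap(p.chrom_b, p.start_b, p.end_b,
                             q.chrom_b, q.start_b, q.end_b)
    )


# Toy records for illustration only.
loop = PairedLoci("chr1", 1000, 2000, "chr1", 50000, 51000)
same_contact = PairedLoci("chr1", 1500, 2500, "chr1", 50500, 52000)
other_contact = PairedLoci("chr1", 1500, 2500, "chr1", 60000, 61000)

print(pairs_intersect(loop, same_contact))   # both anchors overlap
print(pairs_intersect(loop, other_contact))  # only anchor A overlaps
```

Requiring both anchors to overlap is what distinguishes paired-loci intersection from two independent single-locus intersections: a record that shares only one anchor describes a different contact.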
Large-scale collections of induced pluripotent stem cells (iPSCs) could serve as powerful model systems for examining how genetic variation affects biology and disease. Here we describe the iPSCORE resource: a collection of systematically derived and characterized iPSC lines from 222 ethnically diverse individuals that allows for both familial and association-based genetic studies. iPSCORE lines are pluripotent with high genomic integrity (no or low numbers of somatic copy-number variants) as determined using high-throughput RNA-sequencing and genotyping arrays, respectively. Using iPSCs from a family of individuals, we show that iPSC-derived cardiomyocytes demonstrate gene expression patterns that cluster by genetic background, and can be used to examine variants associated with physiological and disease phenotypes. The iPSCORE collection contains representative individuals for risk and non-risk alleles for 95% of SNPs associated with human phenotypes through genome-wide association studies. Our study demonstrates the utility of iPSCORE for examining how genetic variants influence molecular and physiological traits in iPSCs and derived cell lines.
•iPSCORE: A collection of publicly available iPSCs from 222 individuals
•Several multigenerational families and individuals of various ethnicities and ages
•Individuals carrying risk and non-risk genotypes for 95% of GWAS SNPs
•Genetic variants associated with mRNA expression in differentiated cardiomyocytes
Working as part of the NHLBI NextGen consortium, Panopoulos and colleagues report the derivation and characterization of 222 publicly available iPSCs from ethnically diverse individuals with corresponding genomic data including SNP arrays, RNA-seq, and whole-genome sequencing. This collection provides a powerful resource to investigate the function of genetic variants.
Dramatic improvements in high throughput sequencing technologies have led to a staggering growth in the number of predicted genes. However, a large fraction of these newly discovered genes do not have a functional assignment. Fortunately, a variety of novel high-throughput genome-wide functional screening technologies provide important clues that shed light on gene function. The integration of heterogeneous data to predict protein function has been shown to improve the accuracy of automated gene annotation systems. In this paper, we propose and evaluate a probabilistic approach for protein function prediction that integrates protein-protein interaction (PPI) data, gene expression data, protein motif information, mutant phenotype data, and protein localization data. First, functional linkage graphs are constructed from PPI data and gene expression data, in which an edge between nodes (proteins) represents evidence for functional similarity. The assumption here is that graph neighbors are more likely to share protein function, compared to proteins that are not neighbors. The functional linkage graph model is then used in concert with protein domain, mutant phenotype and protein localization data to produce a functional prediction. Our method is applied to the functional prediction of Saccharomyces cerevisiae genes, using Gene Ontology (GO) terms as the basis of our annotation. In a cross validation study we show that the integrated model increases recall by 18%, compared to using PPI data alone at 50% precision. We also show that the integrated predictor is significantly better than each individual predictor. However, the observed improvement vs. PPI depends on both the new source of data and the functional category to be predicted. Surprisingly, in some contexts integration hurts overall prediction accuracy. Lastly, we provide a comprehensive assignment of putative GO terms to 463 proteins that currently have no assigned function.
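The neighbor assumption behind a functional linkage graph can be sketched with simple neighbor counting. This is a simplified stand-in, not the probabilistic model the abstract describes: each candidate term is scored by the fraction of a protein's annotated neighbors that carry it. The function name `neighbor_scores`, the protein identifiers, and the term labels are hypothetical.

```python
def neighbor_scores(graph, annotations, protein):
    """Score candidate terms for `protein` from its annotated graph neighbors.

    graph: dict mapping a protein to the set of its neighbors.
    annotations: dict mapping annotated proteins to their sets of terms.
    Returns {term: fraction of annotated neighbors carrying that term}.
    """
    neighbors = graph.get(protein, set())
    annotated = [n for n in neighbors if n in annotations]
    if not annotated:
        return {}  # no evidence available from the graph
    counts = {}
    for n in annotated:
        for term in annotations[n]:
            counts[term] = counts.get(term, 0) + 1
    return {term: c / len(annotated) for term, c in counts.items()}


# Toy graph for illustration only; P4 is a neighbor with no annotation.
graph = {"P1": {"P2", "P3", "P4"}}
annotations = {"P2": {"term_x"}, "P3": {"term_x", "term_y"}}

scores = neighbor_scores(graph, annotations, "P1")
print(scores)
```

In this toy case both annotated neighbors carry `term_x` and one carries `term_y`, so `term_x` receives the higher score. Thresholding such scores at different cutoffs is one way to trade precision against recall in an evaluation like the cross validation mentioned above.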
Neural stem cells (NSCs) are considered to be the cell of origin of glioblastoma multiforme (GBM). However, the genetic alterations that transform NSCs into glioma-initiating cells remain elusive. Using a unique transposon mutagenesis strategy that mutagenizes NSCs in culture, followed by additional rounds of mutagenesis to generate tumors in vivo, we have identified genes and signaling pathways that can transform NSCs into glioma-initiating cells. Mobilization of Sleeping Beauty transposons in NSCs induced the immortalization of astroglial-like cells, which were then able to generate tumors with characteristics of the mesenchymal subtype of GBM on transplantation, consistent with a potential astroglial origin for mesenchymal GBM. Sequence analysis of transposon insertion sites from tumors and immortalized cells identified more than 200 frequently mutated genes, including human GBM-associated genes, such as Met and Nf1, and made it possible to discriminate between genes that function during astroglial immortalization vs. later stages of tumor development. We also functionally validated five GBM candidate genes using a previously undescribed high-throughput method. Finally, we show that even clonally related tumors derived from the same immortalized line have acquired distinct combinations of genetic alterations during tumor development, suggesting that tumor formation in this model system involves competition among genetically variant cells, which is similar to the Darwinian evolutionary processes now thought to generate many human cancers. This mutagenesis strategy is faster and simpler than conventional transposon screens and can potentially be applied to any tissue stem/progenitor cells that can be grown and differentiated in vitro.