Genome-wide association studies (GWASs) have identified thousands of loci associated with hundreds of complex diseases and traits, and progress is being made toward elucidating the causal variants and genes underlying these associations. Functional characterization of mechanisms at GWAS loci is a multi-faceted challenge. Challenges include linkage disequilibrium and allelic heterogeneity at each locus, the noncoding nature of most loci, and the time and cost needed for experimentally evaluating the potential mechanistic contributions of genes and variants. As GWAS sample sizes increase, more loci are identified, and the complexities of individual loci emerge. Loci can consist of multiple association signals, each of which can reflect the influence of multiple variants, inseparable by association analyses. Each signal within a locus can influence the same or different target genes. Experimental studies of genes and variants can differ on the basis of cell type, cellular environment, or other context-specific variables. In this review, we describe the complexity of mechanisms at GWAS loci (including multiple signals, multiple variants, and/or multiple genes) and the implications these complexities hold for experimental study design and interpretation of GWAS mechanisms.
Genome-wide association (GWAS) and sequencing studies are providing new insights into the genetic basis of type 2 diabetes (T2D) and the inter-individual variation in glycemic traits, including levels of glucose, insulin, proinsulin and hemoglobin A1c (HbA1c). At the end of 2011, established loci (P < 5 × 10⁻⁸) totaled 55 for T2D and 32 for glycemic traits. Since then, most new loci have been detected by analyzing common variants (minor allele frequency (MAF) > 0.05) in increasingly large sample sizes from populations around the world, and in trans-ancestry studies that successfully combine data from diverse populations. Most recently, advances in sequencing have led to the discovery of four loci for T2D or glycemic traits based on low-frequency (0.005 < MAF ≤ 0.05) variants, and additional low-frequency, potentially functional variants have been identified at GWAS loci. Established published loci now total ∼88 for T2D and 83 for one or more glycemic traits, and many additional loci likely remain to be discovered. Future studies will build on these successes by identifying additional loci and by determining the pathogenic effects of the underlying variants and genes.
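The frequency bins quoted in the abstract above can be written down as a small classifier. This is a minimal sketch: the function names are hypothetical, and the "rare" label for MAF ≤ 0.005 is an assumption, since the abstract defines only the common and low-frequency bins.

```python
def minor_allele_frequency(allele_freq: float) -> float:
    """Fold an allele frequency so that it refers to the less common allele."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return min(allele_freq, 1.0 - allele_freq)


def frequency_class(maf: float) -> str:
    """Bin a MAF using the thresholds quoted in the abstract.

    common:        MAF > 0.05
    low-frequency: 0.005 < MAF <= 0.05
    rare:          MAF <= 0.005 (label assumed here, not given in the abstract)
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie in [0, 0.5]")
    if maf > 0.05:
        return "common"
    if maf > 0.005:
        return "low-frequency"
    return "rare"


if __name__ == "__main__":
    # Illustrative frequencies only; these are not values from any study.
    for freq in (0.30, 0.96, 0.002):
        maf = minor_allele_frequency(freq)
        print(f"allele frequency {freq:.3f} -> MAF {maf:.3f} ({frequency_class(maf)})")
```

Note that the boundary value MAF = 0.05 falls in the low-frequency bin, matching the "≤ 0.05" in the abstract.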
We expanded GWAS discovery for type 2 diabetes (T2D) by combining data from 898,130 European-descent individuals (9% cases), after imputation to high-density reference panels. With these data, we (i) extend the inventory of T2D-risk variants (243 loci, 135 newly implicated in T2D predisposition, comprising 403 distinct association signals); (ii) enrich discovery of lower-frequency risk alleles (80 index variants with minor allele frequency <5%, 14 with estimated allelic odds ratio >2); (iii) substantially improve fine-mapping of causal variants (at 51 signals, one variant accounted for >80% posterior probability of association (PPA)); (iv) extend fine-mapping through integration of tissue-specific epigenomic information (islet regulatory annotations extend the number of variants with PPA >80% to 73); (v) highlight validated therapeutic targets (18 genes with associations attributable to coding variants); and (vi) demonstrate enhanced potential for clinical translation (genome-wide chip heritability explains 18% of T2D risk; individuals in the extremes of a T2D polygenic risk score differ more than ninefold in prevalence).
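The abstract above reports posterior probabilities of association (PPA) without stating how they were computed. One common approach, sketched here as an assumption rather than as the authors' method, converts each variant's effect estimate and standard error into a Wakefield approximate Bayes factor and normalizes across the variants in a signal, assuming a single causal variant and equal priors. The function names, the prior standard deviation of 0.2, and the example numbers are all illustrative.

```python
import math
from typing import List, Sequence


def log_abf(beta: float, se: float, prior_sd: float = 0.2) -> float:
    """Log of Wakefield's approximate Bayes factor in favor of association.

    prior_sd is the assumed standard deviation of the true effect size.
    """
    z = beta / se
    v = se * se
    w = prior_sd * prior_sd
    r = w / (v + w)
    return 0.5 * math.log(1.0 - r) + 0.5 * z * z * r


def posterior_probabilities(log_abfs: Sequence[float]) -> List[float]:
    """Normalize Bayes factors to PPAs under one causal variant, equal priors."""
    top = max(log_abfs)  # subtract the maximum for numerical stability
    weights = [math.exp(x - top) for x in log_abfs]
    total = sum(weights)
    return [w / total for w in weights]


def credible_set(ppas: Sequence[float], coverage: float = 0.99) -> List[int]:
    """Indices of the smallest set of variants whose PPAs sum to >= coverage."""
    order = sorted(range(len(ppas)), key=lambda i: ppas[i], reverse=True)
    chosen: List[int] = []
    cumulative = 0.0
    for i in order:
        chosen.append(i)
        cumulative += ppas[i]
        if cumulative >= coverage:
            break
    return chosen


if __name__ == "__main__":
    # Made-up summary statistics for three variants in one signal.
    betas = [0.10, 0.02, 0.01]
    ses = [0.02, 0.02, 0.02]
    ppas = posterior_probabilities([log_abf(b, s) for b, s in zip(betas, ses)])
    print([round(p, 6) for p in ppas], credible_set(ppas))
```

A variant "accounting for >80% PPA" in the sense of the abstract would correspond to a single entry of `ppas` exceeding 0.8.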
The Metabolic Syndrome in Men (METSIM) study is a population-based study including 10,197 Finnish men examined in 2005–2010. The aim of the study is to investigate nongenetic and genetic factors associated with the risk of T2D and CVD, and with cardiovascular risk factors. The protocol includes a detailed phenotyping of the participants, an oral glucose tolerance test, fasting laboratory measurements including proton NMR measurements, mass spectrometry metabolomics, adipose tissue biopsies from 1,400 participants, and a stool sample. In our ongoing follow-up study, we have, to date, reexamined 6,496 participants. Extensive genotyping and exome sequencing have been performed for essentially all METSIM participants, and >2,000 METSIM participants have been whole-genome sequenced. We have identified several nongenetic markers associated with the development of diabetes and cardiovascular events, and participated in several genetic association studies to identify gene variants associated with diabetes, hyperglycemia, and cardiovascular risk factors. The generation of a phenotype and genotype resource in the METSIM study allows us to proceed toward a "systems genetics" approach, which includes statistical methods to quantitate and integrate intermediate phenotypes, such as transcript, protein, or metabolite levels, to provide a global view of the molecular architecture of complex traits.
Chromatin accessibility and gene expression in relevant cell contexts can guide identification of regulatory elements and mechanisms at genome-wide association study (GWAS) loci. To identify regulatory elements that display differential activity across adipocyte differentiation, we performed ATAC-seq and RNA-seq in a human cell model of preadipocytes and adipocytes at days 4 and 14 of differentiation. For comparison, we created a consensus map of ATAC-seq peaks in 11 human subcutaneous adipose tissue samples. We identified 58,387 context-dependent chromatin accessibility peaks and 3,090 context-dependent genes between all timepoint comparisons (log2 fold change > 1, FDR < 5%) with 15,919 adipocyte- and 18,244 preadipocyte-dependent peaks. Adipocyte-dependent peaks showed increased overlap (60.1%) with Roadmap Epigenomics adipocyte nuclei enhancers compared to preadipocyte-dependent peaks (11.5%). We linked context-dependent peaks to genes based on adipocyte promoter capture Hi-C data, overlap with adipose eQTL variants, and context-dependent gene expression. Of 16,167 context-dependent peaks linked to a gene, 5,145 were linked by two or more strategies to 1,670 genes. Among GWAS loci for cardiometabolic traits, adipocyte-dependent peaks, but not preadipocyte-dependent peaks, showed significant enrichment (LD score regression P < 0.005) for waist-to-hip ratio and modest enrichment (P < 0.05) for HDL-cholesterol. We identified 659 peaks linked to 503 genes by two or more approaches and overlapping a GWAS signal, suggesting a regulatory mechanism at these loci. To identify variants that may alter chromatin accessibility between timepoints, we identified 582 variants in 454 context-dependent peaks that demonstrated allelic imbalance in accessibility (FDR < 5%), of which 55 peaks also overlapped GWAS variants. At one GWAS locus for palmitoleic acid, rs603424 was located in an adipocyte-dependent peak linked to SCD and exhibited allelic differences in transcriptional activity in adipocytes (P = 0.003) but not preadipocytes (P = 0.09). These results demonstrate that context-dependent peaks and genes can guide discovery of regulatory variants at GWAS loci and aid identification of regulatory mechanisms.
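The abstract above reports allelic imbalance in accessibility at FDR < 5% but does not name the statistical test. The following is a minimal sketch of one plausible procedure, stated as an assumption: at each heterozygous variant, an exact two-sided binomial test of the reference-allele read count against an expected fraction of 0.5, followed by Benjamini-Hochberg adjustment. Function names and read counts are illustrative, and practical pipelines typically also correct for reference mapping bias, which this sketch ignores.

```python
from math import comb
from typing import List, Sequence


def binomial_two_sided_half(ref_reads: int, total_reads: int) -> float:
    """Exact two-sided binomial p-value against an expected fraction of 0.5.

    Binomial(n, 0.5) is symmetric, so the two-sided p-value is twice the
    smaller tail, capped at 1.
    """
    if not 0 <= ref_reads <= total_reads or total_reads == 0:
        raise ValueError("need 0 <= ref_reads <= total_reads and total_reads > 0")
    smaller = min(ref_reads, total_reads - ref_reads)
    tail = sum(comb(total_reads, i) for i in range(smaller + 1)) / 2 ** total_reads
    return min(1.0, 2.0 * tail)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # (reference-allele reads, total reads) at four made-up heterozygous sites.
    counts = [(40, 50), (26, 50), (5, 40), (30, 60)]
    pvals = [binomial_two_sided_half(k, n) for k, n in counts]
    for (k, n), q in zip(counts, benjamini_hochberg(pvals)):
        print(f"{k}/{n} reference reads: q = {q:.3g}, imbalanced = {q < 0.05}")
```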
Reverse causality has made it difficult to establish the causal directions between obesity and prediabetes and obesity and insulin resistance. To disentangle whether obesity causally drives prediabetes and insulin resistance already in non-diabetic individuals, we utilized the UK Biobank and METSIM cohorts to perform Mendelian randomization (MR) analyses in non-diabetic individuals. Our results suggest that both prediabetes and systemic insulin resistance are caused by obesity (p = 1.2×10⁻³ and p = 3.1×10⁻²⁴). As obesity reflects the amount of body fat, we next studied how adipose tissue affects insulin resistance. We performed both bulk RNA-sequencing and single nucleus RNA sequencing on frozen human subcutaneous adipose biopsies to assess adipose cell-type heterogeneity and mitochondrial (MT) gene expression in insulin resistance. We discovered that the adipose MT gene expression and body fat percent are both independently associated with insulin resistance (p ≤ 0.05 for each) when adjusting for the decomposed adipose cell-type proportions. Next, we showed that these 3 factors, adipose MT gene expression, body fat percent, and adipose cell types, explain a substantial amount (44.39%) of variance in insulin resistance and can be used to predict it (p ≤ 2.64×10⁻⁵ in 3 independent human cohorts). In summary, we demonstrated that obesity is a strong determinant of both prediabetes and insulin resistance, and discovered that individuals' adipose cell-type composition, adipose MT gene expression, and body fat percent predict their insulin resistance, emphasizing the critical role of adipose tissue in systemic insulin resistance.
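The abstract above does not say which Mendelian randomization estimator was used. As an illustration only, the sketch below implements the standard fixed-effect inverse-variance weighted (IVW) estimator from per-variant summary statistics: each variant gives a Wald ratio (outcome effect divided by exposure effect), and the ratios are combined with weights based on the outcome standard errors. The function names and the example numbers are hypothetical, and the sketch assumes independent variants that are valid instruments.

```python
import math
from typing import List, Sequence, Tuple


def wald_ratios(beta_exposure: Sequence[float],
                beta_outcome: Sequence[float]) -> List[float]:
    """Per-variant causal estimates: outcome effect / exposure effect."""
    return [by / bx for bx, by in zip(beta_exposure, beta_outcome)]


def ivw_estimate(beta_exposure: Sequence[float],
                 beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> Tuple[float, float, float]:
    """Fixed-effect IVW estimate, its standard error, and a two-sided p-value.

    Uses first-order weights (beta_exposure / se_outcome) ** 2, which ignore
    uncertainty in the exposure effects.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = wald_ratios(beta_exposure, beta_outcome)
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    std_error = math.sqrt(1.0 / total)
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return estimate, std_error, p_value


if __name__ == "__main__":
    # Made-up effects of three variants on an exposure and an outcome.
    bx = [0.10, 0.20, 0.15]
    by = [0.05, 0.11, 0.07]
    se = [0.01, 0.02, 0.015]
    est, se_est, p = ivw_estimate(bx, by, se)
    print(f"IVW estimate {est:.3f} (SE {se_est:.3f}), p = {p:.2e}")
```

Because the weighted mean of Wald ratios equals a weighted regression of outcome effects on exposure effects through the origin, either formulation could be used; the ratio form is shown because it makes the per-variant contributions explicit.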
The majority of variation identified by genome-wide association studies falls in non-coding genomic regions and is hypothesized to impact regulatory elements that modulate gene expression. Here we present a statistically rigorous software tool GREGOR (Genomic Regulatory Elements and Gwas Overlap algoRithm) for evaluating enrichment of any set of genetic variants with any set of regulatory features. Using variants from five phenotypes, we describe a data-driven approach to determine the tissue and cell types most relevant to a trait of interest and to identify the subset of regulatory features likely impacted by these variants. Last, we experimentally evaluate six predicted functional variants at six lipid-associated loci and demonstrate significant evidence for allele-specific impact on expression levels. GREGOR systematically evaluates enrichment of genetic variation with the vast collection of regulatory data available to explore novel biological mechanisms of disease and guide us toward the functional variant at trait-associated loci.
GREGOR, including source code, documentation, examples, and executables, is available at http://genome.sph.umich.edu/wiki/GREGOR.
cristen@umich.edu
Supplementary data are available at Bioinformatics online.
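The GREGOR abstract above describes evaluating enrichment of trait-associated variants in regulatory features without giving the computation. The sketch below is not GREGOR's implementation; it is a minimal illustration of one way such an enrichment can be evaluated, under these assumptions: each index variant has its own probability of overlapping the feature by chance (for example, estimated from matched control variants), the variants are independent, and the p-value is the upper tail of the resulting sum of Bernoulli variables (a Poisson-binomial distribution). All names and numbers are hypothetical.

```python
from typing import List, Sequence


def poisson_binomial_pmf(probs: Sequence[float]) -> List[float]:
    """Exact distribution of a sum of independent Bernoulli(p_i) variables.

    Built by dynamic programming; entry k is P(exactly k overlaps).
    """
    pmf = [1.0]
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("overlap probabilities must lie in [0, 1]")
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this variant does not overlap
            nxt[k + 1] += mass * p       # this variant overlaps
        pmf = nxt
    return pmf


def enrichment_p_value(observed: int, probs: Sequence[float]) -> float:
    """P(number of overlaps >= observed) under the null overlap probabilities."""
    if not 0 <= observed <= len(probs):
        raise ValueError("observed must be between 0 and the number of variants")
    return min(1.0, sum(poisson_binomial_pmf(probs)[observed:]))


if __name__ == "__main__":
    # Made-up chance-overlap probabilities for six index variants.
    null_probs = [0.05, 0.10, 0.08, 0.02, 0.12, 0.06]
    observed_overlaps = 4
    print(f"expected overlaps: {sum(null_probs):.2f}")
    print(f"enrichment p-value: {enrichment_p_value(observed_overlaps, null_probs):.3g}")
```

The dynamic program is quadratic in the number of variants, which is adequate for a few hundred loci; for much larger sets an approximation to the tail probability would be the usual choice.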
Tobacco and alcohol use are leading causes of mortality that influence risk for many complex diseases and disorders. They are heritable and etiologically related behaviors that have been resistant to gene discovery efforts. In sample sizes up to 1.2 million individuals, we discovered 566 genetic variants in 406 loci associated with multiple stages of tobacco use (initiation, cessation, and heaviness) as well as alcohol use, with 150 loci evidencing pleiotropic association. Smoking phenotypes were positively genetically correlated with many health conditions, whereas alcohol use was negatively correlated with these conditions, such that increased genetic risk for alcohol use is associated with lower disease risk. We report evidence for the involvement of many systems in tobacco and alcohol use, including genes involved in nicotinic, dopaminergic, and glutamatergic neurotransmission. The results provide a solid starting point to evaluate the effects of these loci in model organisms and more precise substance use measures.
To identify genetic contributions to type 2 diabetes (T2D) and related glycemic traits (fasting glucose, fasting insulin, and HbA1c), we conducted genome-wide association analyses (GWAS) in up to 7,178 Chinese subjects from nine provinces in the China Health and Nutrition Survey (CHNS). We examined patterns of population structure within CHNS and found that allele frequencies differed across provinces, consistent with genetic drift and population substructure. We further validated 32 previously described T2D- and glycemic trait-loci, including G6PC2 and SIX3-SIX2 associated with fasting glucose. At G6PC2, we replicated a known fasting glucose-associated variant (rs34177044) and identified a second signal (rs2232326), a low-frequency (4%), probably damaging missense variant (S324P). A variant within the lead fasting glucose-associated signal at SIX3-SIX2 co-localized with pancreatic islet expression quantitative trait loci (eQTL) for SIX3, SIX2, and three noncoding transcripts. To identify variants functionally responsible for the fasting glucose association at SIX3-SIX2, we tested five candidate variants for allelic differences in regulatory function. The rs12712928-C allele, associated with higher fasting glucose and lower transcript expression level, showed lower transcriptional activity in reporter assays and increased binding to GABP compared to the rs12712928-G allele, suggesting that rs12712928-C contributes to elevated fasting glucose levels by disrupting an islet enhancer, resulting in reduced gene expression. Taken together, these analyses identified multiple loci associated with glycemic traits across China, and suggest a regulatory mechanism at the SIX3-SIX2 fasting glucose GWAS locus.
Many of the type 2 diabetes loci identified through genome-wide association studies localize to non-protein-coding intronic and intergenic regions and likely contain variants that regulate gene transcription. The CDC123/CAMK1D type 2 diabetes association signal on chromosome 10 spans an intergenic region between CDC123 and CAMK1D and also overlaps the CDC123 3'UTR. To gain insight into the molecular mechanisms underlying the association signal, we used open chromatin, histone modifications and transcription factor ChIP-seq data sets from type 2 diabetes-relevant cell types to identify SNPs overlapping predicted regulatory regions. Two regions containing type 2 diabetes-associated variants were tested for enhancer activity using luciferase reporter assays. One SNP, rs11257655, displayed allelic differences in transcriptional enhancer activity in 832/13 and MIN6 insulinoma cells as well as in human HepG2 hepatocellular carcinoma cells. The rs11257655 risk allele T showed greater transcriptional activity than the non-risk allele C in all cell types tested. Using electromobility shift and supershift assays we demonstrated that the rs11257655 risk allele showed allele-specific binding to FOXA1 and FOXA2. We validated FOXA1 and FOXA2 enrichment at the rs11257655 risk allele using allele-specific ChIP in human islets. These results suggest that rs11257655 affects transcriptional activity through altered binding of a protein complex that includes FOXA1 and FOXA2, providing a potential molecular mechanism at this GWAS locus.