Observational studies have identified height as a strong risk factor for atrial fibrillation, but this finding may be limited by residual confounding. We aimed to examine genetic variation in height ...within the Mendelian randomization (MR) framework to determine whether height has a causal effect on risk of atrial fibrillation.
In summary-level analyses, MR was performed using summary statistics from genome-wide association studies of height (GIANT/UK Biobank; 693,529 individuals) and atrial fibrillation (AFGen; 65,446 cases and 522,744 controls), finding that each 1-SD increase in genetically predicted height increased the odds of atrial fibrillation (odds ratio OR 1.34; 95% CI 1.29 to 1.40; p = 5 × 10-42). This result remained consistent in sensitivity analyses with MR methods that make different assumptions about the presence of pleiotropy, and when accounting for the effects of traditional cardiovascular risk factors on atrial fibrillation. Individual-level phenome-wide association studies of height and a height genetic risk score were performed among 6,567 European-ancestry participants of the Penn Medicine Biobank (median age at enrollment 63 years, interquartile range 55-72; 38% female; recruitment 2008-2015), confirming prior observational associations between height and atrial fibrillation. Individual-level MR confirmed that each 1-SD increase in height increased the odds of atrial fibrillation, including adjustment for clinical and echocardiographic confounders (OR 1.89; 95% CI 1.50 to 2.40; p = 0.007). The main limitations of this study include potential bias from pleiotropic effects of genetic variants, and lack of generalizability of individual-level findings to non-European populations.
In this study, we observed evidence that height is likely a positive causal risk factor for atrial fibrillation. Further study is needed to determine whether risk prediction tools including height or anthropometric risk factors can be used to improve screening and primary prevention of atrial fibrillation, and whether biological pathways involved in height may offer new targets for treatment of atrial fibrillation.
Phenome-wide association studies (PheWAS) are a high-throughput approach to evaluate comprehensive associations between genetic variants and a wide range of phenotypic measures. PheWAS has varying ...sample sizes for quantitative traits, and variable numbers of cases and controls for binary traits across the many phenotypes of interest, which can affect the statistical power to detect associations. The motivation of this study is to investigate the various parameters which affect the estimation of statistical power in PheWAS, including sample size, case-control ratio, minor allele frequency, and disease penetrance.
We performed a PheWAS simulation study, where we investigated variations in statistical power based on different parameters, such as overall sample size, number of cases, case-control ratio, minor allele frequency, and disease penetrance. The simulation was performed on both binary and quantitative phenotypic measures. Our simulation on binary traits suggests that the number of cases has more impact on statistical power than the case to control ratio; also, we found that a sample size of 200 cases or more maintains the statistical power to identify associations for common variants. For quantitative traits, a sample size of 1000 or more individuals performed best in the power calculations. We focused on common genetic variants (MAF > 0.01) in this study; however, in future studies, we will be extending this effort to perform similar simulations on rare variants.
This study provides a series of PheWAS simulation analyses that can be used to estimate statistical power for some potential scenarios. These results can be used to provide guidelines for appropriate study design for future PheWAS analyses.
Genetic variants of the enzyme that metabolizes warfarin, cytochrome P-450 2C9 (CYP2C9), and of a key pharmacologic target of warfarin, vitamin K epoxide reductase (VKORC1), contribute to differences ...in patients' responses to various warfarin doses, but the role of these variants during initial anticoagulation is not clear.
In 297 patients starting warfarin therapy, we assessed CYP2C9 genotypes (CYP2C9 *1, *2, and *3), VKORC1 haplotypes (designated A and non-A), clinical characteristics, response to therapy (as determined by the international normalized ratio INR), and bleeding events. The study outcomes were the time to the first INR within the therapeutic range, the time to the first INR of more than 4, the time above the therapeutic INR range, the INR response over time, and the warfarin dose requirement.
As compared with patients with the non-A/non-A haplotype, patients with the A/A haplotype of VKORC1 had a decreased time to the first INR within the therapeutic range (P=0.02) and to the first INR of more than 4 (P=0.003). In contrast, the CYP2C9 genotype was not a significant predictor of the time to the first INR within the therapeutic range (P=0.57) but was a significant predictor of the time to the first INR of more than 4 (P=0.03). Both the CYP2C9 genotype and VKORC1 haplotype had a significant influence on the required warfarin dose after the first 2 weeks of therapy.
Initial variability in the INR response to warfarin was more strongly associated with genetic variability in the pharmacologic target of warfarin, VKORC1, than with CYP2C9.
Heart failure is a leading cause of cardiovascular morbidity and mortality. However, the contribution of common genetic variation to heart failure risk has not been fully elucidated, particularly in ...comparison to other common cardiometabolic traits. We report a multi-ancestry genome-wide association study meta-analysis of all-cause heart failure including up to 115,150 cases and 1,550,331 controls of diverse genetic ancestry, identifying 47 risk loci. We also perform multivariate genome-wide association studies that integrate heart failure with related cardiac magnetic resonance imaging endophenotypes, identifying 61 risk loci. Gene-prioritization analyses including colocalization and transcriptome-wide association studies identify known and previously unreported candidate cardiomyopathy genes and cellular processes, which we validate in gene-expression profiling of failing and healthy human hearts. Colocalization, gene expression profiling, and Mendelian randomization provide convergent evidence for the roles of BCKDHA and circulating branch-chain amino acids in heart failure and cardiac structure. Finally, proteome-wide Mendelian randomization identifies 9 circulating proteins associated with heart failure or quantitative imaging traits. These analyses highlight similarities and differences among heart failure and associated cardiovascular imaging endophenotypes, implicate common genetic variation in the pathogenesis of heart failure, and identify circulating proteins that may represent cardiomyopathy treatment targets.
As a type of relatively new methodology, the transcriptome-wide association study (TWAS) has gained interest due to capacity for gene-level association testing. However, the development of TWAS has ...outpaced statistical evaluation of TWAS gene prioritization performance. Current TWAS methods vary in underlying biological assumptions about tissue specificity of transcriptional regulatory mechanisms. In a previous study from our group, this may have affected whether TWAS methods better identified associations in single tissues versus multiple tissues. We therefore designed simulation analyses to examine how the interplay between particular TWAS methods and tissue specificity of gene expression affects power and type I error rates for gene prioritization. We found that cross-tissue identification of expression quantitative trait loci (eQTLs) improved TWAS power. Single-tissue TWAS (i.e., PrediXcan) had robust power to identify genes expressed in single tissues, but, often found significant associations in the wrong tissues as well (therefore had high false positive rates). Cross-tissue TWAS (i.e., UTMOST) had overall equal or greater power and controlled type I error rates for genes expressed in multiple tissues. Based on these simulation results, we applied a tissue specificity-aware TWAS (TSA-TWAS) analytic framework to look for gene-based associations with pre-treatment laboratory values from AIDS Clinical Trial Group (ACTG) studies. We replicated several proof-of-concept transcriptionally regulated gene-trait associations, including UGT1A1 (encoding bilirubin uridine diphosphate glucuronosyltransferase enzyme) and total bilirubin levels (p = 3.59×10-12), and CETP (cholesteryl ester transfer protein) with high-density lipoprotein cholesterol (p = 4.49×10-12). We also identified several novel genes associated with metabolic and virologic traits, as well as pleiotropic genes that linked plasma viral load, absolute basophil count, and/or triglyceride levels. By highlighting the advantages of different TWAS methods, our simulation study promotes a tissue specificity-aware TWAS analytic framework that revealed novel aspects of HIV-related traits.
The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 51,000 DNA samples from ...distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R (2) (estimated correlation between the imputed and true genotypes), and the relationship between allelic R (2) and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR.
Resistant hypertension is defined as high blood pressure that remains above treatment goals in spite of the concurrent use of three antihypertensive agents from different classes. Despite the ...important health consequences of resistant hypertension, few studies of resistant hypertension have been conducted. To perform a genome-wide association study for resistant hypertension, we defined and identified cases of resistant hypertension and hypertensives with treated, controlled hypertension among >47,500 adults residing in the US linked to electronic health records (EHRs) and genotyped as part of the electronic MEdical Records & GEnomics (eMERGE) Network. Electronic selection logic using billing codes, laboratory values, text queries, and medication records was used to identify resistant hypertension cases and controls at each site, and a total of 3,006 cases of resistant hypertension and 876 controlled hypertensives were identified among eMERGE Phase I and II sites. After imputation and quality control, a total of 2,530,150 SNPs were tested for an association among 2,830 multi-ethnic cases of resistant hypertension and 876 controlled hypertensives. No test of association was genome-wide significant in the full dataset or in the dataset limited to European American cases (n = 1,719) and controls (n = 708). The most significant finding was CLNK rs13144136 at p = 1.00x10-6 (odds ratio = 0.68; 95% CI = 0.58-0.80) in the full dataset with similar results in the European American only dataset. We also examined whether SNPs known to influence blood pressure or hypertension also influenced resistant hypertension. None was significant after correction for multiple testing. These data highlight both the difficulties and the potential utility of EHR-linked genomic data to study clinically-relevant traits such as resistant hypertension.
Assumptions are made about the genetic model of single nucleotide polymorphisms (SNPs) when choosing a traditional genetic encoding: additive, dominant, and recessive. Furthermore, SNPs across the ...genome are unlikely to demonstrate identical genetic models. However, running SNP-SNP interaction analyses with every combination of encodings raises the multiple testing burden. Here, we present a novel and flexible encoding for genetic interactions, the elastic data-driven genetic encoding (EDGE), in which SNPs are assigned a heterozygous value based on the genetic model they demonstrate in a dataset prior to interaction testing. We assessed the power of EDGE to detect genetic interactions using 29 combinations of simulated genetic models and found it outperformed the traditional encoding methods across 10%, 30%, and 50% minor allele frequencies (MAFs). Further, EDGE maintained a low false-positive rate, while additive and dominant encodings demonstrated inflation. We evaluated EDGE and the traditional encodings with genetic data from the Electronic Medical Records and Genomics (eMERGE) Network for five phenotypes: age-related macular degeneration (AMD), age-related cataract, glaucoma, type 2 diabetes (T2D), and resistant hypertension. A multi-encoding genome-wide association study (GWAS) for each phenotype was performed using the traditional encodings, and the top results of the multi-encoding GWAS were considered for SNP-SNP interaction using the traditional encodings and EDGE. EDGE identified a novel SNP-SNP interaction for age-related cataract that no other method identified: rs7787286 (MAF: 0.041; intergenic region of chromosome 7)-rs4695885 (MAF: 0.34; intergenic region of chromosome 4) with a Bonferroni LRT p of 0.018. A SNP-SNP interaction was found in data from the UK Biobank within 25 kb of these SNPs using the recessive encoding: rs60374751 (MAF: 0.030) and rs6843594 (MAF: 0.34) (Bonferroni LRT p: 0.026). We recommend using EDGE to flexibly detect interactions between SNPs exhibiting diverse action.
We performed a hypothesis-generating phenome-wide association study (PheWAS) to identify and characterize cross-phenotype associations, where one SNP is associated with two or more phenotypes, ...between thousands of genetic variants assayed on the Metabochip and hundreds of phenotypes in 5,897 African Americans as part of the Population Architecture using Genomics and Epidemiology (PAGE) I study. The PAGE I study was a National Human Genome Research Institute-funded collaboration of four study sites accessing diverse epidemiologic studies genotyped on the Metabochip, a custom genotyping chip that has dense coverage of regions in the genome previously associated with cardio-metabolic traits and outcomes in mostly European-descent populations. Here we focus on identifying novel phenome-genome relationships, where SNPs are associated with more than one phenotype. To do this, we performed a PheWAS, testing each SNP on the Metabochip for an association with up to 273 phenotypes in the participating PAGE I study sites. We identified 133 putative pleiotropic variants, defined as SNPs associated at an empirically derived p-value threshold of p<0.01 in two or more PAGE study sites for two or more phenotype classes. We further annotated these PheWAS-identified variants using publicly available functional data and local genetic ancestry. Amongst our novel findings is SPARC rs4958487, associated with increased glucose levels and hypertension. SPARC has been implicated in the pathogenesis of diabetes and is also known to have a potential role in fibrosis, a common consequence of multiple conditions including hypertension. The SPARC example and others highlight the potential that PheWAS approaches have in improving our understanding of complex disease architecture by identifying novel relationships between genetic variants and an array of common human phenotypes.
We performed a Phenome-Wide Association Study (PheWAS) to identify interrelationships between the immune system genetic architecture and a wide array of phenotypes from two de-identified electronic ...health record (EHR) biorepositories. We selected variants within genes encoding critical factors in the immune system and variants with known associations with autoimmunity. To define case/control status for EHR diagnoses, we used International Classification of Diseases, Ninth Revision (ICD-9) diagnosis codes from 3,024 Geisinger Clinic MyCode® subjects (470 diagnoses) and 2,899 Vanderbilt University Medical Center BioVU biorepository subjects (380 diagnoses). A pooled-analysis was also carried out for the replicating results of the two data sets. We identified new associations with potential biological relevance including SNPs in tumor necrosis factor (TNF) and ankyrin-related genes associated with acute and chronic sinusitis and acute respiratory tract infection. The two most significant associations identified were for the C6orf10 SNP rs6910071 and "rheumatoid arthritis" (ICD-9 code category 714) (pMETAL = 2.58 x 10-9) and the ATN1 SNP rs2239167 and "diabetes mellitus, type 2" (ICD-9 code category 250) (pMETAL = 6.39 x 10-9). This study highlights the utility of using PheWAS in conjunction with EHRs to discover new genotypic-phenotypic associations for immune-system related genetic loci.