Many modern human genomes retain DNA inherited from interbreeding with archaic hominins, such as Neandertals, yet the influence of this admixture on human traits is largely unknown. We analyzed the contribution of common Neandertal variants to over 1000 electronic health record (EHR)–derived phenotypes in ~28,000 adults of European ancestry. We discovered and replicated associations of Neandertal alleles with neurological, psychiatric, immunological, and dermatological phenotypes. Neandertal alleles together explained a significant fraction of the variation in risk for depression and skin lesions resulting from sun exposure (actinic keratosis), and individual Neandertal alleles were significantly associated with specific human phenotypes, including hypercoagulation and tobacco use. Our results establish that archaic admixture influences disease risk in modern humans, provide hypotheses about the effects of hundreds of Neandertal haplotypes, and demonstrate the utility of EHR data in evolutionary analyses.
Polycystic ovary syndrome is the most common endocrine disorder affecting women of reproductive age. A number of criteria have been developed for clinical diagnosis of polycystic ovary syndrome, with the Rotterdam criteria being the most inclusive. Evidence suggests that polycystic ovary syndrome is significantly heritable, and previous studies have identified genetic variants associated with polycystic ovary syndrome diagnosed using different criteria. The widely adopted electronic health record system provides an opportunity to identify patients with polycystic ovary syndrome using the Rotterdam criteria for genetic studies.
To identify novel associated genetic variants under the same phenotype definition, we extracted polycystic ovary syndrome cases and unaffected controls based on the Rotterdam criteria from the electronic health records and performed a discovery-validation genome-wide association study.
We developed a polycystic ovary syndrome phenotyping algorithm on the basis of the Rotterdam criteria and applied it to 3 electronic health record–linked biobanks to identify cases and controls for genetic study. In the discovery phase, we performed an individual genome-wide association study in each of the Geisinger MyCode and the Electronic Medical Records and Genomics cohorts, and the results were then meta-analyzed. We attempted validation of the significant association loci (P<1×10⁻⁶) in the BioVU cohort. All association analyses used logistic regression, assuming an additive genetic model, and adjusted for principal components to control for population stratification. An inverse-variance fixed-effect model was adopted for meta-analysis. In addition, we examined the top variants to evaluate their associations with each criterion in the phenotyping algorithm. We used the STRING database to characterize the protein-protein interaction network.
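The inverse-variance fixed-effect step named above can be illustrated with a minimal sketch. It assumes each cohort contributes a log odds ratio and its standard error for one variant; the function name `fixed_effect_meta` and the input numbers are hypothetical and are not taken from the study.

```python
import math

def fixed_effect_meta(betas, ses):
    """Pool per-cohort log odds ratios with inverse-variance weights."""
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p

# Hypothetical summary statistics for one variant in two discovery cohorts.
betas = [0.30, 0.34]   # log odds ratios (illustrative only)
ses = [0.08, 0.10]     # standard errors (illustrative only)

beta, se, z, p = fixed_effect_meta(betas, ses)
print(f"pooled OR = {math.exp(beta):.2f}, SE = {se:.4f}, P = {p:.2e}")
```

A fixed-effect model of this kind assumes that all cohorts estimate the same underlying effect, so the pooled standard error can only shrink as cohorts are added.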
Using the same algorithm based on the Rotterdam criteria, we identified 2995 patients with polycystic ovary syndrome and 53,599 population controls in total (2742 cases and 51,438 controls from the discovery phase; 253 cases and 2161 controls in the validation phase). We identified 1 novel genome-wide significant variant, rs17186366 (odds ratio [OR]=1.37 [1.23, 1.54], P=2.8×10⁻⁸), located near SOD2. In addition, 2 loci with suggestive association were also identified: rs113168128 (OR=1.72 [1.42, 2.10], P=5.2×10⁻⁸), an intronic variant of ERBB4 that is independent from the previously published variants, and rs144248326 (OR=2.13 [1.52, 2.86], P=8.45×10⁻⁷), a novel intronic variant in WWTR1. In the further association tests of the top 3 single-nucleotide polymorphisms with each criterion in the polycystic ovary syndrome algorithm, we found that rs17186366 (SOD2) was associated with polycystic ovaries and hyperandrogenism, whereas rs113168128 (ERBB4) and rs144248326 (WWTR1) were mainly associated with oligomenorrhea or infertility. We also validated the previously reported association with DENND1A. Using the STRING database to characterize protein-protein interactions, we found both ERBB4 and WWTR1 can interact with YAP1, which has been previously associated with polycystic ovary syndrome.
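As a rough consistency check on the numbers reported for rs17186366, the sketch below recovers a standard error and a two-sided p-value from an odds ratio and its interval. It assumes that 1.23 and 1.54 are the bounds of a 95% Wald confidence interval around the odds ratio of 1.37; the helper name `p_from_or_ci` is hypothetical. Because the published values are rounded, the result should agree with the reported P only approximately.

```python
import math

def p_from_or_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Approximate SE, z, and two-sided p from an OR and its 95% Wald CI."""
    log_or = math.log(odds_ratio)
    # On the log scale the interval is symmetric: width = 2 * z_crit * SE.
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p

# Values as reported for rs17186366 (interval interpretation is an assumption).
se, z, p = p_from_or_ci(1.37, 1.23, 1.54)
print(f"SE(log OR) = {se:.4f}, z = {z:.2f}, P = {p:.1e}")
```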
Through a discovery-validation genome-wide association study on polycystic ovary syndrome identified from electronic health records using an algorithm based on Rotterdam criteria, we identified and validated a novel genome-wide significant association with a variant near SOD2. We also identified a novel independent variant within ERBB4 and a suggestive association with WWTR1. With previously identified polycystic ovary syndrome gene YAP1, the ERBB4-YAP1-WWTR1 network suggests involvement of the epidermal growth factor receptor and the Hippo pathway in the multifactorial etiology of polycystic ovary syndrome.
Melanocortin-4 receptor (MC4R) plays an essential role in food intake and energy homeostasis. More than 170 MC4R variants have been described over the past two decades, with conflicting reports regarding the prevalence and phenotypic effects of these variants in diverse cohorts. To determine the frequency of MC4R variants in a large cohort of different ancestries, we evaluated the MC4R coding region for 20,537 eMERGE participants with sequencing data plus an additional 77,454 independent individuals with genome-wide genotyping data at this locus.
The sequencing data were obtained from the eMERGE phase III study, in which multisample variant call format calls have been generated, curated, and annotated. In addition to penetrance estimation using body mass index (BMI) as a binary outcome, GWAS and PheWAS were performed using median BMI in linear regression analyses. All results were adjusted for principal components, age, sex, and sites of genotyping.
Targeted sequencing data of MC4R revealed 125 coding variants in 1839 eMERGE participants, including 30 unreported coding variants that were predicted to be functionally damaging. Highly penetrant unreported variants included L325I, E308K, D298N, S270F, F261L, T248A, D111V, and Y80F; seven carriers of these variants had obesity class III, defined as BMI ≥ 40 kg/m². In GWAS analysis, in addition to the known risk haplotype upstream of MC4R (best variant rs6567160; P = 5.36 × 10[exponent missing], Beta = 0.37), a novel rare haplotype was detected that was protective against obesity and encompassed the V103I variant with known gain-of-function properties (P = 6.23 × 10[exponent missing], Beta = -0.62). PheWAS analyses extended this protective effect of V103I to type 2 diabetes, diabetic nephropathy, and chronic renal failure independent of BMI.
MC4R screening in a large eMERGE cohort confirmed many previous findings, extended the known pleiotropic effects of MC4R, and discovered additional rare MC4R alleles that probably contribute to obesity.
GWASTools is an R/Bioconductor package for quality control and analysis of genome-wide association studies (GWAS). GWASTools brings the interactive capability and extensive statistical libraries of R to GWAS. Data are stored in NetCDF format to accommodate extremely large datasets that cannot fit within R's memory limits. The documentation includes instructions for converting data from multiple formats, including variants called from sequencing. GWASTools provides a convenient interface for linking genotypes and intensity data with sample and single nucleotide polymorphism annotation.
The Electronic Medical Records and Genomics Network is a National Human Genome Research Institute–funded consortium engaged in the development of methods and best practices for using the electronic medical record as a tool for genomic research. Now in its sixth year and second funding cycle, and comprising nine research groups and a coordinating center, the network has played a major role in validating the concept that clinical data derived from electronic medical records can be used successfully for genomic research. Current work is advancing knowledge in multiple disciplines at the intersection of genomics and health-care informatics, particularly for electronic phenotyping, genome-wide association studies, genomic medicine implementation, and the ethical and regulatory issues associated with genomics research and returning results to study participants. Here, we describe the evolution, accomplishments, opportunities, and challenges of the network from its inception as a five-group consortium focused on genotype–phenotype associations for genomic discovery to its current form as a nine-group consortium pivoting toward the implementation of genomic medicine.
Genet Med 15(10), 761–771.
As genetics gains favor in clinical oncology, it is important to address patient concerns around confidentiality, privacy, and security of genetic information that might otherwise limit its utilization. We designed a randomized controlled trial to assess the social impact of an online educational tool (FamilyTalk) to increase family communication about colorectal cancer (CRC) risk and screening. Of 208 randomized participants, 149 (71.6%) returned six-month surveys. Overall, there was no difference in CRC screening between the study arms. Privacy and confidentiality concerns about medical and genetic information, reactions to genetic test results, and lifestyle changes did not differ between arms. Participants with pathogenic or likely pathogenic (P/LP) and variant of uncertain significance (VUS) results were more likely than those with negative results to report that the results accurately predicted their disease risks (OR 5.37, p = 0.02 and OR 3.13, p = 0.02, respectively). This trial demonstrated no evidence that FamilyTalk impacted patient-reported outcomes. Low power, due to the limited number of participants with P/LP results in the overall sample, as well as the short follow-up period, could have contributed to the null findings.
Abstract Objectives Copy number variants (CNVs) are duplications or deletions of genomic regions. Large CNVs are potentially pathogenic and are overrepresented in children with congenital heart disease (CHD). We sought to determine the frequency of large CNVs in children with isolated CHD, and to evaluate the relationship of these potentially pathogenic CNVs with transplant-free survival. Methods These cases are derived from a prospective cohort of patients with nonsyndromic CHD (n = 422) identified before first surgery. Healthy pediatric controls (n = 500) were obtained from the electronic Medical Records and Genetic Epidemiology Network, and CNV frequency was contrasted for CHD cases and controls. CNVs were determined algorithmically; subsequently screened for >95% overlap between 2 methods, size (>300 kb), quality score, overlap with a gene, and novelty (absent from databases of known, benign CNVs); and separately validated by quantitative polymerase chain reaction. Survival likelihoods for cases were calculated using Cox proportional hazards modeling to evaluate the joint effect of CNV burden and known confounders on transplant-free survival. Results Children with nonsyndromic CHD had a higher burden of potentially pathogenic CNVs compared with pediatric controls (12.1% vs 5.0%; P = .00016). Presence of a CNV was associated with significantly decreased transplant-free survival after surgery (hazard ratio, 3.42; 95% confidence interval, 1.66-7.09; P = .00090) with confounder adjustment. Conclusions We confirm that children with isolated CHD have a greater burden of rare/large CNVs. We report a novel finding that these CNVs are associated with an adjusted 2.55-fold increased risk of death or transplant. These data suggest that CNV burden is an important modifier of survival after surgery for CHD.
Identifying genetic risk factors for lumbar spine disorders may lead to knowledge regarding underlying mechanisms and the development of new treatments. We conducted a genome-wide association study involving 100,811 participants with genotypes and longitudinal electronic health record data from the Electronic Medical Records and Genomics Network and Geisinger Health. Cases and controls were defined using validated algorithms and clinical diagnostic codes. Electronic health record-defined phenotypes included low back pain requiring healthcare utilization (LBP-HC), lumbosacral radicular syndrome (LSRS), and lumbar spinal stenosis (LSS). Genome-wide association study used logistic regression with additive genetic effects adjusting for age, sex, site-specific factors, and ancestry (principal components). A fixed-effect inverse-variance weighted meta-analysis was conducted. Genetic variants of genome-wide significance (P < 5 × 10⁻⁸) were carried forward for replication in an independent sample from UK Biobank. Phenotype prevalence was 48.8% for LBP-HC, 19.8% for LSRS, and 7.9% for LSS. No variants were significantly associated with LBP-HC. One locus was associated with LSRS (lead variant rs146153280:C>G, odds ratio [OR] = 1.17 for G, P = 2.1 × 10⁻⁹), but was not replicated. Another locus on chromosome 2 spanning GFPT1, NFU1, and AAK1 was associated with LSS (lead variant rs13427243:G>A, OR = 1.10 for A, P = 4.3 × 10⁻⁸) and replicated in UK Biobank (OR = 1.11, P = 5.4 × 10⁻⁵). This was the first genome-wide association study meta-analysis of lumbar spinal disorders using electronic health record data. We identified 2 novel associations with LSRS and LSS; the latter was replicated in an independent sample.
Coronary heart disease (CHD) is a leading cause of death globally. Although therapy with statins decreases circulating levels of low-density lipoprotein cholesterol and the incidence of CHD, additional events occur despite statin therapy in some individuals. The genetic determinants of this residual cardiovascular risk remain unknown.
We performed a 2-stage genome-wide association study of CHD events during statin therapy. We first identified 3099 cases who experienced CHD events (defined as acute myocardial infarction or the need for coronary revascularization) during statin therapy and 7681 controls without CHD events during comparable intensity and duration of statin therapy from 4 sites in the Electronic Medical Records and Genomics Network. We then sought replication of candidate variants in another 160 cases and 1112 controls from a fifth Electronic Medical Records and Genomics site, which joined the network after the initial genome-wide association study. Finally, we performed a phenome-wide association study for other traits linked to the most significant locus.
The meta-analysis identified 7 single nucleotide polymorphisms at a genome-wide level of significance within the LPA/PLG locus associated with CHD events on statin treatment. The most significant association was for an intronic single nucleotide polymorphism within LPA/PLG (rs10455872; minor allele frequency, 0.069; odds ratio, 1.58; 95% confidence interval, 1.35-1.86; P=2.6×10[exponent missing]). In the replication cohort, rs10455872 was also associated with CHD events (odds ratio, 1.71; 95% confidence interval, 1.14-2.57; P=0.009). The association of this single nucleotide polymorphism with CHD events was independent of statin-induced change in low-density lipoprotein cholesterol (odds ratio, 1.62; 95% confidence interval, 1.17-2.24; P=0.004) and persisted in individuals with low-density lipoprotein cholesterol ≤70 mg/dL (odds ratio, 2.43; 95% confidence interval, 1.18-4.75; P=0.015). A phenome-wide association study supported the effect of this region on coronary heart disease and did not identify noncardiovascular phenotypes.
Genetic variations at the LPA locus are associated with CHD events during statin therapy independently of the extent of low-density lipoprotein cholesterol lowering. This finding provides support for exploring strategies targeting circulating concentrations of lipoprotein(a) to reduce CHD events in patients receiving statins.
Clostridioides difficile (C. diff.) infection (CDI) is a leading cause of hospital-acquired diarrhea in North America and Europe and a major cause of morbidity and mortality. Known risk factors do not fully explain CDI susceptibility, and genetic susceptibility is suggested by the fact that some patients whose colons are colonized with C. diff. do not develop any infection while others develop severe or recurrent infections. To identify common genetic variants associated with CDI, we performed a genome-wide association analysis in 19,861 participants (1349 cases; 18,512 controls) from the Electronic Medical Records and Genomics (eMERGE) Network. Using logistic regression, we found strong evidence for genetic variation in the DRB locus of the MHC (HLA) II region that predisposes individuals to CDI (P < 1.0 × 10[exponent missing]; OR 1.56). Altered transcriptional regulation in the HLA region may play a role in conferring susceptibility to this opportunistic enteric pathogen.