Understanding the genetic basis of airflow obstruction and smoking behaviour is key to determining the pathophysiology of chronic obstructive pulmonary disease (COPD). We used UK Biobank data to ...study the genetic causes of smoking behaviour and lung health.
We sampled individuals of European ancestry from UK Biobank, from the middle and extremes of the forced expiratory volume in 1 s (FEV1) distribution among heavy smokers (mean 35 pack-years) and never smokers. We developed a custom array for UK Biobank to provide optimum genome-wide coverage of common and low-frequency variants, dense coverage of genomic regions already implicated in lung health and disease, and to assay rare coding variants relevant to the UK population. We investigated whether there were shared genetic causes between different phenotypes defined by extremes of FEV1. We also looked for novel variants associated with extremes of FEV1 and smoking behaviour and assessed regions of the genome that had already shown evidence for a role in lung health and disease. We set genome-wide significance at p<5 × 10−8.
UK Biobank participants were recruited from March 15, 2006, to July 7, 2010. Sample selection for the UK BiLEVE study started on Nov 22, 2012, and was completed on Dec 20, 2012. We selected 50 008 unique samples: 10 002 individuals with low FEV1, 10 000 with average FEV1, and 5002 with high FEV1 from each of the heavy smoker and never smoker groups. We noted a substantial sharing of genetic causes of low FEV1 between heavy smokers and never smokers (p=2·29 × 10−16) and between individuals with and without doctor-diagnosed asthma (p=6·06 × 10−11). We discovered six novel genome-wide significant signals of association with extremes of FEV1, including signals at four novel loci (KANSL1, TSEN54, TET2, and RBM19/TBX5) and independent signals at two previously reported loci (NPNT and HLA-DQB1/HLA-DQA2). These variants also showed association with COPD, including in individuals with no history of smoking. The number of copies of a 150 kb region containing the 5′ end of KANSL1, a gene that is important for epigenetic gene regulation, was associated with extremes of FEV1. We also discovered five new genome-wide significant signals for smoking behaviour, including a variant in NCAM1 (chromosome 11) and a variant on chromosome 2 (between TEX41 and PABPC1P2) that has a trans effect on expression of NCAM1 in brain tissue.
By sampling from the extremes of the lung function distribution in UK Biobank, we identified novel genetic causes of lung function and smoking behaviour. These results provide new insight into the specific mechanisms underlying airflow obstruction, COPD, and tobacco addiction, and show substantial shared genetic architecture underlying airflow obstruction across individuals, irrespective of smoking behaviour and other airway disease.
Medical Research Council.
The Turkic peoples represent a diverse collection of ethnic groups defined by the Turkic languages. These groups have dispersed across a vast area, including Siberia, Northwest China, Central Asia, ...East Europe, the Caucasus, Anatolia, the Middle East, and Afghanistan. The origin and early dispersal history of the Turkic peoples is disputed, with candidates for their ancient homeland ranging from the Transcaspian steppe to Manchuria in Northeast Asia. Previous genetic studies have not identified a clear-cut unifying genetic signal for the Turkic peoples, which lends support for language replacement rather than demic diffusion as the model for the Turkic language's expansion. We addressed the genetic origin of 373 individuals from 22 Turkic-speaking populations, representing their current geographic range, by analyzing genome-wide high-density genotype data. In agreement with the elite dominance model of language expansion most of the Turkic peoples studied genetically resemble their geographic neighbors. However, western Turkic peoples sampled across West Eurasia shared an excess of long chromosomal tracts that are identical by descent (IBD) with populations from present-day South Siberia and Mongolia (SSM), an area where historians center a series of early Turkic and non-Turkic steppe polities. While SSM matching IBD tracts (> 1cM) are also observed in non-Turkic populations, Turkic peoples demonstrate a higher percentage of such tracts (p-values ≤ 0.01) compared to their non-Turkic neighbors. Finally, we used the ALDER method and inferred admixture dates (~9th-17th centuries) that overlap with the Turkic migrations of the 5th-16th centuries. Thus, our results indicate historical admixture among Turkic peoples, and the recent shared ancestry with modern populations in SSM supports one of the hypothesized homelands for their nomadic Turkic and related Mongolic ancestors.
The association of copy number variations (CNVs), differing numbers of copies of genetic sequence at locations in the genome, with phenotypes such as intellectual disability has been almost ...exclusively evaluated using clinically ascertained cohorts. The contribution of these genetic variants to cognitive phenotypes in the general population remains unclear.
To investigate the clinical features conferred by CNVs associated with known syndromes in adult carriers without clinical preselection and to assess the genome-wide consequences of rare CNVs (frequency ≤0.05%; size ≥250 kilobase pairs kb) on carriers' educational attainment and intellectual disability prevalence in the general population.
The population biobank of Estonia contains 52,000 participants enrolled from 2002 through 2010. General practitioners examined participants and filled out a questionnaire of health- and lifestyle-related questions, as well as reported diagnoses. Copy number variant analysis was conducted on a random sample of 7877 individuals and genotype-phenotype associations with education and disease traits were evaluated. Our results were replicated on a high-functioning group of 993 Estonians and 3 geographically distinct populations in the United Kingdom, the United States, and Italy.
Phenotypes of genomic disorders in the general population, prevalence of autosomal CNVs, and association of these variants with educational attainment (from less than primary school through scientific degree) and prevalence of intellectual disability.
Of the 7877 in the Estonian cohort, we identified 56 carriers of CNVs associated with known syndromes. Their phenotypes, including cognitive and psychiatric problems, epilepsy, neuropathies, obesity, and congenital malformations are similar to those described for carriers of identical rearrangements ascertained in clinical cohorts. A genome-wide evaluation of rare autosomal CNVs (frequency, ≤0.05%; ≥250 kb) identified 831 carriers (10.5%) of the screened general population. Eleven of 216 (5.1%) carriers of a deletion of at least 250 kb (odds ratio OR, 3.16; 95% CI, 1.51-5.98; P = 1.5e-03) and 6 of 102 (5.9%) carriers of a duplication of at least 1 Mb (OR, 3.67; 95% CI, 1.29-8.54; P = .008) had an intellectual disability compared with 114 of 6819 (1.7%) in the Estonian cohort. The mean education attainment was 3.81 (P = 1.06e-04) among 248 (≥250 kb) deletion carriers and 3.69 (P = 5.024e-05) among 115 duplication carriers (≥1 Mb). Of the deletion carriers, 33.5% did not graduate from high school (OR, 1.48; 95% CI, 1.12-1.95; P = .005) and 39.1% of duplication carriers did not graduate high school (OR, 1.89; 95% CI, 1.27-2.8; P = 1.6e-03). Evidence for an association between rare CNVs and lower educational attainment was supported by analyses of cohorts of adults from Italy and the United States and adolescents from the United Kingdom.
Known pathogenic CNVs in unselected, but assumed to be healthy, adult populations may be associated with unrecognized clinical sequelae. Additionally, individually rare but collectively common intermediate-size CNVs may be negatively associated with educational attainment. Replication of these findings in additional population groups is warranted given the potential implications of this observation for genomics research, clinical care, and public health.
The presence of autoantibodies usually precedes autoimmune disease, but is sometimes considered an incidental finding with no clinical relevance. The prevalence of immune-mediated diseases was ...studied in a group of individuals from the Estonian Genome Project (n = 51,862), and 6 clinically significant autoantibodies were detected in a subgroup of 994 (auto)immune-mediated disease-free individuals. The overall prevalence of individuals with immune-mediated diseases in the primary cohort was 30.1%. Similarly, 23.6% of the participants in the disease-free subgroup were seropositive for at least one autoantibody. Several phenotypic parameters were associated with autoantibodies. The results suggest that (i) immune-mediated diseases are diagnosed in nearly one-third of a random European population, (ii) 6 common autoantibodies are detectable in almost one-third of individuals without diagnosed autoimmune diseases, (iii) tissue non-specific autoantibodies, especially at high levels, may reflect preclinical disease in symptom-free individuals, and (iv) the incidental positivity of anti-TPO in men with positive familial anamnesis of maternal autoimmune disease deserves further medical attention. These results encourage physicians to evaluate autoantibodies in addition to treating a variety of patient health complaints to detect autoimmune-mediated disease early.
We aimed to assess whether whole blood expression quantitative trait loci (eQTLs) with effects in cis and trans are robust and can be used to identify regulatory pathways affecting disease ...susceptibility.
We performed whole-genome eQTL analyses in 890 participants of the KORA F4 study and in two independent replication samples (SHIP-TREND, N = 976 and EGCUT, N = 842) using linear regression models and Bonferroni correction.
In the KORA F4 study, 4,116 cis-eQTLs (defined as SNP-probe pairs where the SNP is located within a 500 kb window around the transcription unit) and 94 trans-eQTLs reached genome-wide significance and overall 91% (92% of cis-, 84% of trans-eQTLs) were confirmed in at least one of the two replication studies. Different study designs including distinct laboratory reagents (PAXgene™ vs. Tempus™ tubes) did not affect reproducibility (separate overall replication overlap: 78% and 82%). Immune response pathways were enriched in cis- and trans-eQTLs and significant cis-eQTLs were partly coexistent in other tissues (cross-tissue similarity 40-70%). Furthermore, four chromosomal regions displayed simultaneous impact on multiple gene expression levels in trans, and 746 eQTL-SNPs have been previously reported to have clinical relevance. We demonstrated cross-associations between eQTL-SNPs, gene expression levels in trans, and clinical phenotypes as well as a link between eQTLs and human metabolic traits via modification of gene regulation in cis.
Our data suggest that whole blood is a robust tissue for eQTL analysis and may be used both for biomarker studies and to enhance our understanding of molecular mechanisms underlying gene-disease associations.
We conduct a genome-wide association study (GWAS) of educational attainment (EA) in a sample of ~3 million individuals and identify 3,952 approximately uncorrelated genome-wide-significant ...single-nucleotide polymorphisms (SNPs). A genome-wide polygenic predictor, or polygenic index (PGI), explains 12-16% of EA variance and contributes to risk prediction for ten diseases. Direct effects (i.e., controlling for parental PGIs) explain roughly half the PGI's magnitude of association with EA and other phenotypes. The correlation between mate-pair PGIs is far too large to be consistent with phenotypic assortment alone, implying additional assortment on PGI-associated factors. In an additional GWAS of dominance deviations from the additive model, we identify no genome-wide-significant SNPs, and a separate X-chromosome additive GWAS identifies 57.
Existing knowledge of genetic variants affecting risk of coronary artery disease (CAD) is largely based on genome-wide association study (GWAS) analysis of common SNPs. Leveraging phased haplotypes ...from the 1000 Genomes Project, we report a GWAS meta-analysis of ~185,000 CAD cases and controls, interrogating 6.7 million common (minor allele frequency (MAF) > 0.05) and 2.7 million low-frequency (0.005 < MAF < 0.05) variants. In addition to confirming most known CAD-associated loci, we identified ten new loci (eight additive and two recessive) that contain candidate causal genes newly implicating biological processes in vessel walls. We observed intralocus allelic heterogeneity but little evidence of low-frequency variants with larger effects and no evidence of synthetic association. Our analysis provides a comprehensive survey of the fine genetic architecture of CAD, showing that genetic susceptibility to this common disease is largely determined by common SNPs of small effect size.
Genome-wide association studies have identified numerous loci linked with complex diseases, for which the molecular mechanisms remain largely unclear. Comprehensive molecular profiling of circulating ...metabolites captures highly heritable traits, which can help to uncover metabolic pathophysiology underlying established disease variants. We conduct an extended genome-wide association study of genetic influences on 123 circulating metabolic traits quantified by nuclear magnetic resonance metabolomics from up to 24,925 individuals and identify eight novel loci for amino acids, pyruvate and fatty acids. The LPA locus link with cardiovascular risk exemplifies how detailed metabolic profiling may inform underlying aetiology via extensive associations with very-low-density lipoprotein and triglyceride metabolism. Genetic fine mapping and Mendelian randomization uncover wide-spread causal effects of lipoprotein(a) on overall lipoprotein metabolism and we assess potential pleiotropic consequences of genetically elevated lipoprotein(a) on diverse morbidities via electronic health-care records. Our findings strengthen the argument for safe LPA-targeted intervention to reduce cardiovascular risk.
The discovery of low-frequency coding variants affecting the risk of coronary artery disease has facilitated the identification of therapeutic targets.
Through DNA genotyping, we tested 54,003 ...coding-sequence variants covering 13,715 human genes in up to 72,868 patients with coronary artery disease and 120,770 controls who did not have coronary artery disease. Through DNA sequencing, we studied the effects of loss-of-function mutations in selected genes.
We confirmed previously observed significant associations between coronary artery disease and low-frequency missense variants in the genes LPA and PCSK9. We also found significant associations between coronary artery disease and low-frequency missense variants in the genes SVEP1 (p.D2702G; minor-allele frequency, 3.60%; odds ratio for disease, 1.14; P=4.2×10(-10)) and ANGPTL4 (p.E40K; minor-allele frequency, 2.01%; odds ratio, 0.86; P=4.0×10(-8)), which encodes angiopoietin-like 4. Through sequencing of ANGPTL4, we identified 9 carriers of loss-of-function mutations among 6924 patients with myocardial infarction, as compared with 19 carriers among 6834 controls (odds ratio, 0.47; P=0.04); carriers of ANGPTL4 loss-of-function alleles had triglyceride levels that were 35% lower than the levels among persons who did not carry a loss-of-function allele (P=0.003). ANGPTL4 inhibits lipoprotein lipase; we therefore searched for mutations in LPL and identified a loss-of-function variant that was associated with an increased risk of coronary artery disease (p.D36N; minor-allele frequency, 1.9%; odds ratio, 1.13; P=2.0×10(-4)) and a gain-of-function variant that was associated with protection from coronary artery disease (p.S447*; minor-allele frequency, 9.9%; odds ratio, 0.94; P=2.5×10(-7)).
We found that carriers of loss-of-function mutations in ANGPTL4 had triglyceride levels that were lower than those among noncarriers; these mutations were also associated with protection from coronary artery disease. (Funded by the National Institutes of Health and others.).
High-coverage whole-genome sequence studies have so far focused on a limited number of geographically restricted populations, or been targeted at specific diseases, such as cancer. Nevertheless, the ...availability of high-resolution genomic data has led to the development of new methodologies for inferring population history and refuelled the debate on the mutation rate in humans. Here we present the Estonian Biocentre Human Genome Diversity Panel (EGDP), a dataset of 483 high-coverage human genomes from 148 populations worldwide, including 379 new genomes from 125 populations, which we group into diversity and selection sets. We analyse this dataset to refine estimates of continent-wide patterns of heterozygosity, long- and short-distance gene flow, archaic admixture, and changes in effective population size through time as well as for signals of positive or balancing selection. We find a genetic signature in present-day Papuans that suggests that at least 2% of their genome originates from an early and largely extinct expansion of anatomically modern humans (AMHs) out of Africa. Together with evidence from the western Asian fossil record, and admixture between AMHs and Neanderthals predating the main Eurasian expansion, our results contribute to the mounting evidence for the presence of AMHs out of Africa earlier than 75,000 years ago.