Metabolite quantitative traits carry great promise for epidemiological studies, and their genetic background has been addressed using Genome-Wide Association Studies (GWAS). Thus far, the role of less common variants has not been exhaustively studied. Here, we performed a GWAS for metabolite quantitative traits in serum, followed by exome sequence analysis to zoom in on putative causal variants in the associated genes. 1H Nuclear Magnetic Resonance (1H-NMR) spectroscopy experiments yielded successful quantification of 42 unique metabolites in 2,482 individuals from The Erasmus Rucphen Family (ERF) study. Heritability of the metabolites was estimated using SOLAR. GWAS was performed using linear mixed models and HapMap imputations. Based on physical vicinity and pathway analyses, candidate genes were screened for coding region variation using exome sequence data. Heritability estimates for metabolites ranged between 10% and 52%. GWAS replicated three known loci at metabolome-wide significance: CPS1 with glycine (P-value = 1.27×10⁻³²), PRODH with proline (P-value = 1.11×10⁻¹⁹) and SLC16A9 with carnitine level (P-value = 4.81×10⁻¹⁴), and uncovered a novel association between DMGDH and dimethyl-glycine level (P-value = 1.65×10⁻¹⁹). In addition, we found three novel, suggestively significant loci: TNP1 with pyruvate (P-value = 1.26×10⁻⁸), KCNJ16 with 3-hydroxybutyrate (P-value = 1.65×10⁻⁸) and the 2p12 locus with valine (P-value = 3.49×10⁻⁸). Exome sequence analysis identified potentially causal coding and regulatory variants located in the genes CPS1, KCNJ2 and PRODH, and revealed allelic heterogeneity for CPS1 and PRODH. Combining GWAS and exome analyses of metabolites detected by high-resolution 1H-NMR is a robust approach to uncover metabolite quantitative trait loci (mQTL) and the likely causative variants in these loci. It is anticipated that insight into the genetics of intermediate phenotypes will provide additional insight into the genetics of complex traits.
To meaningfully analyze common and rare genetic variants, results from genome-wide association studies (GWASs) of multiple cohorts need to be combined in a meta-analysis to obtain enough power. This requires all cohorts to have the same single-nucleotide polymorphisms (SNPs) in their GWASs. To this end, genotypes that have not been measured in a given cohort can be imputed on the basis of a set of reference haplotypes. This protocol provides guidelines for performing imputations with two widely used tools: minimac and IMPUTE2. These guidelines were developed and used by the Genome of the Netherlands (GoNL) consortium, which has created a population-specific reference panel for genetic imputations and used this reference to impute various Dutch biobanks. We also describe several factors that might influence the final imputation quality. This protocol, which has been used by the largest Dutch biobanks, should take several days, depending on the sample size of the biobank and the computer resources available.
Using a genome-wide screen of 9.6 million genetic variants achieved through 1000 Genomes Project imputation in 62,166 samples, we identify associations with lipid traits at 93 loci, including 79 previously identified loci with new lead SNPs and 10 new loci, 15 loci with a low-frequency lead SNP and 10 loci with a missense lead SNP, and 2 loci with an accumulation of rare variants. In six loci, SNPs with established function in lipid genetics (CELSR2, GCKR, LIPC and APOE) or candidate missense mutations with predicted damaging function (CD300LG and TM6SF2) explained the locus associations. The low-frequency variants increased the proportion of variance explained, particularly for low-density lipoprotein cholesterol and total cholesterol. Altogether, our results highlight the impact of low-frequency variants in complex traits and show that imputation offers a cost-effective alternative to resequencing.
Although genome-wide association studies (GWAS) have identified many common variants associated with complex traits, low-frequency and rare variants have not been interrogated in a comprehensive manner. Imputation from dense reference panels, such as the 1000 Genomes Project (1000G), enables testing of ungenotyped variants for association. Here we present the results of imputation using a large, new population-specific panel: the Genome of The Netherlands (GoNL). We benchmarked the performance of the 1000G and GoNL reference sets by comparing imputed genotypes with 'true' genotypes typed on ImmunoChip in three European populations (Dutch, British, and Italian). GoNL showed significant improvement in the imputation quality for rare variants (MAF 0.05-0.5%) compared with 1000G. In Dutch samples, the mean observed squared Pearson correlation, r², increased from 0.61 to 0.71. We also saw improved imputation accuracy for other European populations (in the British samples, r² improved from 0.58 to 0.65, and in the Italians from 0.43 to 0.47). A combined reference set comprising 1000G and GoNL improved the imputation of rare variants even further. The Italian samples benefitted the most from this combined reference (the mean r² increased from 0.47 to 0.50). We conclude that the creation of a large population-specific reference is advantageous for imputing rare variants and that a combined reference panel across multiple populations yields the best imputation results.
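The accuracy measure used in this benchmark, the squared Pearson correlation between imputed and directly typed genotypes, can be illustrated with a minimal sketch. The function name `imputation_r2`, the 0/1/2 genotype coding and the use of imputed dosages between 0 and 2 are assumptions made for this illustration; the abstract does not describe the authors' actual pipeline.

```python
from math import isnan


def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between typed genotypes and imputed dosages.

    Both arguments are equal-length sequences with one entry per individual
    for a single variant. Genotypes are assumed to be coded as minor-allele
    counts (0, 1, 2) and dosages as expected allele counts in [0, 2].
    Returns NaN when either vector has no variation (for example a
    monomorphic variant), because the correlation is undefined there.
    """
    if len(true_genotypes) != len(imputed_dosages):
        raise ValueError("genotype and dosage vectors must have equal length")
    n = len(true_genotypes)
    if n < 2:
        return float("nan")

    mean_g = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n

    # Sums of squares and cross-products around the means.
    cov = sum((g - mean_g) * (d - mean_d)
              for g, d in zip(true_genotypes, imputed_dosages))
    var_g = sum((g - mean_g) ** 2 for g in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)

    if var_g == 0 or var_d == 0:
        return float("nan")
    return (cov * cov) / (var_g * var_d)


if __name__ == "__main__":
    typed = [0, 1, 2, 1]              # illustrative genotypes, not study data
    imputed = [0.1, 0.9, 1.8, 1.2]    # illustrative dosages, not study data
    print(round(imputation_r2(typed, imputed), 3))
```

Under these assumptions, a per-variant r² would be computed for every variant in a frequency bin and then averaged to give a mean r² comparable in spirit to the figures quoted above.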
Abstract
Genome-wide association studies have provided a vast array of publicly available SNP × phenotype association results. However, they are often in disparate repositories and formats, making downstream analyses difficult and time consuming. PheLiGe (https://phelige.com) is a database that provides easy access to such results via a web interface. The underlying database currently stores >75 billion genotype–phenotype associations from 7347 genome-wide and 1.2 million region-wide (e.g. cis-eQTL) association scans. The web interface allows for investigation of regional genotype–phenotype associations across many phenotypes, giving insights into the biological function affected by the variant in question. Furthermore, PheLiGe can compare regional patterns of association between different traits. This analysis can ascertain whether a co-association is due to pleiotropy or linkage. Moreover, comparison of association patterns for a complex trait of interest and gene expression and protein levels can implicate causal genes.
Coffee, one of the most popular beverages in the world, contains many different physiologically active compounds with a potential impact on people's health. Despite the recent attention given to the genetic basis of its consumption, very little has been done to understand the genes influencing coffee preference among different individuals. Given its markedly bitter taste, we decided to verify whether variants in bitter receptor genes (TAS2Rs) affect coffee liking. To this end, 4066 people from different parts of Europe and Central Asia filled in a field questionnaire on coffee liking and were subsequently recruited and included in the study. Eighty-eight SNPs covering the 25 TAS2R genes were selected from the available imputed ones and used to run an association analysis for coffee liking. A significant association was detected with three SNPs: one synonymous and two functional variants (W35S and H212R) in the TAS2R43 gene. Both functional variants have been shown to greatly reduce in vitro protein activity. Surprisingly, the wild type allele, which corresponds to the functional form of the protein, is associated with higher liking of coffee. Since the hTAS2R43 receptor is sensitive to caffeine, we verified whether the detected variants produced differences in caffeine bitter perception in a subsample of people from the FVG cohort. We found a significant association between differences in caffeine perception and the H212R variant but not the W35S variant, which suggests that the effect of the TAS2R43 gene on coffee liking is mediated by caffeine and in particular by the H212R variant. No significant association was found with other TAS2R genes. In conclusion, the present study opens new perspectives in the understanding of coffee liking. Further studies are needed to clarify the role of the TAS2R43 gene in coffee hedonics and to identify which other genes and pathways are involved in its genetics.
Back pain (BP) is a common condition of major social importance and poorly understood pathogenesis. Combining data from the UK Biobank and CHARGE consortium cohorts allowed us to perform a very large genome-wide association study (total N = 509,070) and examine the genetic correlation and pleiotropy between BP and its clinical and psychosocial risk factors. We identified and replicated 3 BP-associated loci, including one novel region implicating SPOCK2/CHST3 genes. We provide evidence for pleiotropic effects of genetic factors underlying BP, height, and intervertebral disk problems. We also identified independent genetic correlations between BP and depression symptoms, neuroticism, sleep disturbance, overweight, and smoking. A significant enrichment for genes involved in the central nervous system and skeletal tissue development was observed. The study of pleiotropy and genetic correlations, supported by the pathway analysis, suggests at least 2 strong molecular axes of BP genesis, one related to structural/anatomical factors such as intervertebral disk problems and anthropometrics, and another related to the psychological component of pain perception and pain processing. These findings corroborate the current biopsychosocial model as a paradigm for BP. Overall, the results demonstrate that BP has an extremely complex genetic architecture that overlaps with the genetic predisposition to its biopsychosocial risk factors. The work sheds light on pathways of relevance in the prevention and management of low back pain.
Compound Heterozygosity (CH) in classical genetics is the presence of two different recessive mutations at a particular gene locus. A relaxed form of CH alleles may account for an essential proportion of the missing heritability, i.e. heritability of phenotypes so far not accounted for by single genetic variants. Methods to detect CH-like effects in genome-wide association studies (GWAS) may facilitate explaining the missing heritability, but to our knowledge no viable software tools for this purpose are currently available.
In this work we present the Generalized Compound Double Heterozygosity (GCDH) test and its implementation in the R package CollapsABEL. Time-consuming procedures are optimized for computational efficiency using Java or C++. Intermediate results are stored either in an SQL database or in a so-called big.matrix file to achieve a reasonable memory footprint. Our large-scale simulation studies show that GCDH is capable of discovering genetic associations due to CH-like interactions with much higher power than a conventional single-SNP approach under various settings, whether the causal genetic variations are available or not. CollapsABEL provides a user-friendly pipeline for genotype collapsing, statistical testing, power estimation, type I error control and graphics generation in the R language.
CollapsABEL provides a computationally efficient solution for screening general forms of CH alleles in densely imputed microarray or whole genome sequencing datasets. The GCDH test provides improved power over single-SNP based methods in detecting the prevalence of CH in human complex phenotypes, offering an opportunity for tackling the missing heritability problem. Binary and source packages of CollapsABEL are available on CRAN ( https://cran.r-project.org/web/packages/CollapsABEL ) and the website of the GenABEL project ( http://www.genabel.org/packages ).
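The idea of genotype collapsing for a CH-like effect can be sketched as follows. This is an illustration only, written in Python rather than R, and it is not the CollapsABEL API: the function name `collapse_pair` and the collapsing rule (an individual is scored 1 when carrying at least one minor allele at both SNPs of a pair, and 0 otherwise) are assumptions chosen to show the principle, since the abstract does not specify the rule used by the package.

```python
def collapse_pair(snp_a, snp_b):
    """Collapse two SNP genotype vectors into one binary pseudo-genotype.

    snp_a and snp_b hold minor-allele counts (0, 1 or 2) per individual,
    with None marking a missing call. Under the rule assumed here, an
    individual is coded 1 if a minor allele is present at both SNPs,
    0 if it is absent at either SNP, and None if either call is missing.
    """
    if len(snp_a) != len(snp_b):
        raise ValueError("both SNPs must cover the same individuals")

    collapsed = []
    for a, b in zip(snp_a, snp_b):
        if a is None or b is None:
            collapsed.append(None)   # cannot decide without both calls
        elif a > 0 and b > 0:
            collapsed.append(1)      # minor allele carried at both sites
        else:
            collapsed.append(0)
    return collapsed


if __name__ == "__main__":
    # Illustrative genotypes for four individuals, not real data.
    print(collapse_pair([0, 1, 2, 1], [1, 0, 1, 2]))
```

The collapsed vector could then be tested against the phenotype in the same way as a single SNP; repeating this over many SNP pairs is what makes computational efficiency and memory footprint a central concern.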
Small insertions and deletions (indels) and large structural variations (SVs) are major contributors to human genetic diversity and disease. However, mutation rates and characteristics of de novo indels and SVs in the general population have remained largely unexplored. We report 332 validated de novo structural changes identified in whole genomes of 250 families, including complex indels, retrotransposon insertions, and interchromosomal events. These data indicate a mutation rate of 2.94 indels (1-20 bp) and 0.16 SVs (>20 bp) per generation. De novo structural changes affect on average 4.1 kbp of genomic sequence and 29 coding bases per generation, which is 91 and 52 times more nucleotides than de novo substitutions, respectively. This contrasts with the equal genomic footprint of inherited SVs and substitutions. An excess of structural changes originated on paternal haplotypes. Additionally, we observed a nonuniform distribution of de novo SVs across offspring. These results reveal the importance of different mutational mechanisms to changes in human genome structure across generations.
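The fold differences quoted above imply an approximate per-generation footprint for de novo substitutions, which can be back-calculated with simple arithmetic. The sketch below uses only the rounded figures from the abstract; the resulting values are therefore rough implied estimates, not numbers reported by the study, and the variable names are chosen for this illustration.

```python
# Figures quoted in the abstract (per generation).
structural_bases = 4100          # 4.1 kbp of genomic sequence affected
structural_coding_bases = 29     # coding bases affected
fold_genome_wide = 91            # structural vs substitution, genome-wide
fold_coding = 52                 # structural vs substitution, coding

# Implied substitution footprint = structural footprint / fold difference.
implied_substitution_bases = structural_bases / fold_genome_wide
implied_substitution_coding_bases = structural_coding_bases / fold_coding

if __name__ == "__main__":
    print(f"implied substitutions, genome-wide: {implied_substitution_bases:.1f} bases")
    print(f"implied substitutions, coding: {implied_substitution_coding_bases:.2f} bases")
```

Because the inputs are rounded, the outputs should be read as order-of-magnitude checks: a few dozen substituted bases genome-wide and well under one coding base per generation.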
Kim et al. identify novel genes and disease pathways in the forebrain developmental disorder holoprosencephaly, and show that many cases involve oligogenic inheritance. The findings underline the roles of Sonic Hedgehog and primary cilia in forebrain development, and show that integrating clinical phenotyping into genetic studies can uncover relevant mutations.
Abstract
Holoprosencephaly is a pathology of forebrain development characterized by high phenotypic heterogeneity. The disease presents with various clinical manifestations at the cerebral or facial levels. Several genes have been implicated in holoprosencephaly but its genetic basis remains unclear: different transmission patterns have been described including autosomal dominant, recessive and digenic inheritance. Conventional molecular testing approaches result in a very low diagnostic yield and most cases remain unsolved. In our study, we address the possibility that genetically unsolved cases of holoprosencephaly have an oligogenic origin and result from combined inherited mutations in several genes. Twenty-six unrelated families, for whom no genetic cause of holoprosencephaly could be identified in clinical settings (whole exome sequencing and comparative genomic hybridization (CGH)-array analyses), were reanalysed under the hypothesis of oligogenic inheritance. Standard variant analysis was improved with a gene prioritization strategy based on clinical ontologies and gene co-expression networks. Clinical phenotyping and exploration of cross-species similarities were further performed on a family-by-family basis. Statistical validation was performed on 248 ancestrally similar control trios provided by the Genome of the Netherlands project and on 574 ancestrally matched controls provided by the French Exome Project. Variants of clinical interest were identified in 180 genes significantly associated with key pathways of forebrain development including sonic hedgehog (SHH) and primary cilia. Oligogenic events were observed in 10 families and involved both known and novel holoprosencephaly genes including recurrently mutated FAT1, NDST1, COL2A1 and SCUBE2. The incidence of oligogenic combinations was significantly higher in holoprosencephaly patients compared to two control populations (P < 10⁻⁹).
We also show that, depending on the affected genes, patients present with particular clinical features. This study reports novel disease genes and supports oligogenicity as a clinically relevant model in holoprosencephaly. It also highlights key roles of SHH signalling and primary cilia in forebrain development. We hypothesize that the distinction between different clinical manifestations of holoprosencephaly lies in the degree of overall functional impact on SHH signalling. Finally, we underline that integrating clinical phenotyping in genetic studies is a powerful tool to specify the clinical relevance of certain mutations.