Accurate prediction of an individual's phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. We extend a powerful Bayesian multiple regression model for individual-level data (BayesR) to SBayesR, a model that utilises summary statistics from genome-wide association studies (GWAS). In simulation and cross-validation using 12 real traits and 1.1 million variants on 350,000 individuals from the UK Biobank, SBayesR improves prediction accuracy relative to commonly used state-of-the-art summary statistics methods at a fraction of the computational resources. Furthermore, using summary statistics for variants from the largest GWAS meta-analysis (n ≈ 700,000) on height and BMI, we show that, on average across traits and two independent data sets, SBayesR improves prediction R² by 5.2% relative to LDpred and by 26.5% relative to clumping and p value thresholding.
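Clumping and p value thresholding (C+T) is the simplest baseline in the comparison above. The following is a minimal sketch of that baseline, not of SBayesR. It assumes a precomputed pairwise LD (r²) matrix, ignores the physical-distance windows that real clumping tools apply, and uses illustrative thresholds. The function names `clump_and_threshold` and `polygenic_score` are hypothetical.

```python
from typing import Sequence


def clump_and_threshold(
    pvals: Sequence[float],
    r2: Sequence[Sequence[float]],
    p_threshold: float = 5e-8,
    r2_threshold: float = 0.1,
) -> list[int]:
    """Greedy LD clumping followed by p-value thresholding (hypothetical helper).

    Variants are visited from most to least significant. A variant becomes an
    index variant only if its r^2 with every index variant chosen so far is at
    or below r2_threshold. Distance windows are deliberately ignored here.
    """
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    kept: list[int] = []
    for i in order:
        if pvals[i] >= p_threshold:
            break  # every remaining variant is less significant
        if all(r2[i][j] <= r2_threshold for j in kept):
            kept.append(i)
    return sorted(kept)


def polygenic_score(
    dosages: Sequence[float], betas: Sequence[float], kept: Sequence[int]
) -> float:
    """Weighted allele count over the retained variants, using GWAS betas as weights."""
    return sum(dosages[i] * betas[i] for i in kept)


# Toy inputs (made up): variants 0 and 1 are in strong LD with each other.
pvals = [1e-10, 1e-9, 1e-3, 1e-12]
r2 = [
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
kept = clump_and_threshold(pvals, r2)  # [0, 3]
```

Variant 1 is dropped because it is in LD with the more significant variant 0, and variant 2 fails the p value threshold. Methods such as LDpred and SBayesR instead model LD jointly rather than discarding correlated variants.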
The Estonian Biobank cohort is a volunteer-based sample of the Estonian resident adult population (aged ≥18 years). The current number of participants, close to 52,000, represents a large proportion (5%) of the Estonian adult population, making the cohort ideally suited to population-based studies. General practitioners (GPs) and medical personnel in the special recruitment offices have recruited participants throughout the country. At baseline, the GPs performed a standardized health examination of the participants, who also donated blood samples for DNA, white blood cell and plasma tests and filled out a 16-module questionnaire on health-related topics such as lifestyle, diet and clinical diagnoses classified according to WHO ICD-10. A substantial part of the cohort has whole genome sequencing (100), genome-wide single nucleotide polymorphism (SNP) array data (20,000) and/or NMR metabolome data (11,000) available (http://www.geenivaramu.ee/for-scientists/data-release/). The data are continuously updated through periodic linkage to national electronic databases and registries. A part of the cohort has been re-contacted for follow-up purposes and resampling, and targeted invitations are possible for specific purposes, for example people with a specific diagnosis. The Estonian Genome Center of the University of Tartu is actively collaborating with many universities, research institutes and consortia and encourages fellow scientists worldwide to co-initiate new academic or industrial joint projects with us.
Genome-wide association studies have identified numerous loci linked with complex diseases, for which the molecular mechanisms remain largely unclear. Comprehensive molecular profiling of circulating metabolites captures highly heritable traits, which can help to uncover metabolic pathophysiology underlying established disease variants. We conduct an extended genome-wide association study of genetic influences on 123 circulating metabolic traits quantified by nuclear magnetic resonance metabolomics from up to 24,925 individuals and identify eight novel loci for amino acids, pyruvate and fatty acids. The link between the LPA locus and cardiovascular risk exemplifies how detailed metabolic profiling may inform underlying aetiology via extensive associations with very-low-density lipoprotein and triglyceride metabolism. Genetic fine mapping and Mendelian randomization uncover widespread causal effects of lipoprotein(a) on overall lipoprotein metabolism, and we assess potential pleiotropic consequences of genetically elevated lipoprotein(a) on diverse morbidities via electronic health-care records. Our findings strengthen the argument for safe LPA-targeted intervention to reduce cardiovascular risk.
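The abstract does not say which Mendelian randomization estimator was used. One widely used choice with summary statistics is the fixed-effect inverse-variance weighted (IVW) estimator, sketched below under stated assumptions: independent variants, non-zero variant-exposure effects, and first-order weights that ignore uncertainty in the exposure effects. The function name `ivw_mr` and the toy numbers are hypothetical.

```python
import math
from typing import Sequence


def ivw_mr(
    beta_exposure: Sequence[float],
    beta_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> tuple[float, float]:
    """Fixed-effect IVW Mendelian randomization estimate and its standard error.

    Each variant contributes a Wald ratio beta_outcome / beta_exposure. The IVW
    estimate is the weighted mean of these ratios with first-order weights
    beta_exposure^2 / se_outcome^2. Assumes uncorrelated variants and
    non-zero beta_exposure values.
    """
    weights = [bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, 1.0 / math.sqrt(total)


# Toy example: every variant implies the same causal effect of 0.5.
est, se = ivw_mr([0.1, 0.2, 0.4], [0.05, 0.1, 0.2], [0.01, 0.02, 0.01])
```

Weighting each Wald ratio by its precision means that variants with strong exposure effects and tight outcome estimates dominate the pooled causal estimate.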
A major challenge in human genetics is to devise a systematic strategy to integrate disease-associated variants with diverse genomic and biological data sets to provide insight into disease pathogenesis and guide drug discovery for complex traits such as rheumatoid arthritis (RA). Here we performed a genome-wide association study meta-analysis in a total of >100,000 subjects of European and Asian ancestries (29,880 RA cases and 73,758 controls), evaluating ∼10 million single-nucleotide polymorphisms. We discovered 42 novel RA risk loci at a genome-wide level of significance, bringing the total to 101 (refs 2-4). We devised an in silico pipeline that combines established bioinformatics methods based on functional annotation, cis-acting expression quantitative trait loci and pathway analyses with novel methods based on genetic overlap with human primary immunodeficiency, haematological cancer somatic mutations and knockout mouse phenotypes, and we used it to identify 98 biological candidate genes at these 101 risk loci. We demonstrate that these genes are the targets of approved therapies for RA, and further suggest that drugs approved for other indications may be repurposed for the treatment of RA. Together, this comprehensive genetic study sheds light on fundamental genes, pathways and cell types that contribute to RA pathogenesis, and provides empirical evidence that the genetics of RA can provide important information for drug discovery.
Psoriasis is a complex disease of the skin with a prevalence of about 2%. We conducted the largest meta-analysis of genome-wide association studies (GWAS) for psoriasis to date, including data from eight different Caucasian cohorts with a combined effective sample size of >39,000 individuals. We identified 16 additional psoriasis susceptibility loci achieving genome-wide significance, increasing the number of identified loci to 63 for European-origin individuals. Functional analysis highlighted the roles of interferon signalling and the NFκB cascade, and we showed that the psoriasis signals are enriched in regulatory elements from different T cells (CD8⁺ T cells and CD4⁺ T cells, including TH0, TH1 and TH17 cells). The identified loci explain ∼28% of the genetic heritability and generate a discriminatory genetic risk score (AUC = 0.76 in our sample) that is significantly correlated with age at onset (p = 2 × 10⁻…). This study provides a comprehensive overview of the genetic architecture of common variants for psoriasis.
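The genetic risk score and its AUC reported above can be illustrated with a minimal sketch, assuming a score built as a weighted sum of risk-allele dosages and an AUC computed as the Mann-Whitney probability that a case outscores a control. The abstract does not specify how the score was weighted. The names `genetic_risk_score` and `auc` are hypothetical, and the scores in the check are toy values.

```python
from typing import Sequence


def genetic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2 per variant)."""
    return sum(d * w for d, w in zip(dosages, weights))


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation.

    Returns the fraction of (case, control) pairs in which the case has the
    higher score; tied pairs count as one half.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy scores: 6.5 of the 9 case-control pairs are ordered correctly.
example_auc = auc([3.0, 2.0, 5.0], [1.0, 2.0, 4.0])
```

An AUC of 0.5 means the score ranks cases no better than chance, and 1.0 means every case outscores every control. The quadratic pair loop is fine for illustration, but a rank-based formula scales better to biobank-sized samples.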
Lifespan is a trait of enormous personal interest. Research into the biological basis of human lifespan, however, is hampered by the long time to death. Using a novel approach of regressing 272,081 parental lifespans beyond age 40 years on participant genotype in a new large data set (UK Biobank), we show that common variants near the apolipoprotein E and nicotinic acetylcholine receptor subunit alpha 5 genes are associated with lifespan. The effects are strongly sex- and age-dependent, with APOE ɛ4 differentially influencing maternal lifespan (P = 4.2 × 10⁻¹⁵; effect −1.24 years of maternal life per imputed risk allele in the parent; sex difference P = 0.011), and a locus near CHRNA3/5 differentially affecting paternal lifespan (P = 4.8 × 10⁻¹¹; effect −0.86 years per allele; sex difference P = 0.075). Rare homozygous carriers of the risk alleles at both loci are predicted to have lives 3.3-3.7 years shorter.
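Effects reported "per imputed risk allele in the parent" imply a rescaling of the regression on the participant's own genotype. Under random mating and Hardy-Weinberg equilibrium, the regression of a parent's allele count on the offspring's allele count has slope 1/2, so regressing parental lifespan on offspring dosage halves the per-parental-allele effect. The minimal sketch below doubles the slope to undo this. It assumes plain least squares with no covariates, no age filter and no handling of parents who are still alive, so it simplifies whatever model the authors actually fitted. The function names and toy data are hypothetical.

```python
from typing import Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on a single predictor x (intercept included)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def per_parent_allele_effect(
    child_dosage: Sequence[float], parent_lifespan: Sequence[float]
) -> float:
    """Years of parental life per risk allele carried by the parent.

    The slope on offspring dosage is attenuated by a factor of two because a
    parent transmits only one of its two alleles, so it is doubled here.
    """
    return 2.0 * ols_slope(child_dosage, parent_lifespan)


# Toy data: parental age at death falls by 0.5 years per offspring risk allele.
child = [0, 1, 2, 0, 1, 2]
lifespan = [80.0, 79.5, 79.0, 81.0, 80.5, 80.0]
effect = per_parent_allele_effect(child, lifespan)
```

In this toy data the offspring-level slope is −0.5 years, which corresponds to −1.0 years per risk allele in the parent.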
Identifying the downstream effects of disease-associated SNPs is challenging. To help overcome this problem, we performed expression quantitative trait locus (eQTL) meta-analysis in non-transformed peripheral blood samples from 5,311 individuals, with replication in 2,775 individuals. We identified and replicated trans eQTLs for 233 SNPs (reflecting 103 independent loci) that were previously associated with complex traits at genome-wide significance. Some of these SNPs affect multiple genes in trans that are known to be altered in individuals with disease: rs4917014, previously associated with systemic lupus erythematosus (SLE), altered the expression of C1QB and five type I interferon response genes, both hallmarks of SLE. DeepSAGE RNA sequencing showed that rs4917014 strongly alters the 3' UTR levels of IKZF1 in cis, and chromatin immunoprecipitation and sequencing analysis of the trans-regulated genes implicated IKZF1 as the causal gene. Variants associated with cholesterol metabolism and type 1 diabetes showed similar phenomena, indicating that large-scale eQTL mapping provides insight into the downstream effects of many trait-associated variants.
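The abstract does not state how per-cohort eQTL statistics were combined. One common approach for this kind of meta-analysis is a sample-size-weighted Z-score (Stouffer) method, sketched below as an assumption rather than as the authors' pipeline. It assumes that every cohort reports a Z-score for the same SNP-gene pair, oriented to the same allele. The name `weighted_z_meta` is hypothetical.

```python
import math
from typing import Sequence


def weighted_z_meta(
    z_scores: Sequence[float], sample_sizes: Sequence[int]
) -> tuple[float, float]:
    """Sample-size-weighted Z-score meta-analysis.

    Each cohort's Z is weighted by sqrt(n). The combined Z is the weighted sum
    divided by sqrt(sum of squared weights), i.e. sqrt(total n). Returns the
    combined Z and its two-sided normal p-value.
    """
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    z_combined = numerator / math.sqrt(sum(w * w for w in weights))
    p_two_sided = math.erfc(abs(z_combined) / math.sqrt(2.0))
    return z_combined, p_two_sided


# Two equally sized toy cohorts with the same modest signal reinforce each other.
z_meta, p_meta = weighted_z_meta([2.0, 2.0], [100, 100])
```

Two cohorts that each give Z = 2 combine to Z = 2√2 ≈ 2.83. This is why pooling thousands of samples can detect the weak trans effects that individual cohorts miss.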
Heritable variance in psychological traits may reflect genetic and biological processes that are not necessarily specific to these particular traits but pertain to a broader range of phenotypes. We tested the possibility that the personality domains of the five-factor model and their 30 facets, as rated by the participants themselves and by knowledgeable informants, reflect polygenic influences that have previously been associated with educational attainment. In a sample of more than 3,000 adult Estonians, education polygenic scores (EPSs), which are interpretable as estimates of molecular-genetic propensity for education, were correlated with various personality traits, particularly from the neuroticism and openness domains. The correlations of personality traits with phenotypic educational attainment closely mirrored their correlations with EPS. Moreover, EPS predicted an aggregate personality trait tailored to capture the maximum amount of variance in educational attainment almost as strongly as it predicted the attainment itself. We discuss possible interpretations and implications of these findings.
Genetic imputation is a cost-efficient way to improve the power and resolution of genome-wide association (GWA) studies. Current publicly accessible imputation reference panels accurately predict genotypes for common variants with minor allele frequency (MAF) ≥5% and for low-frequency variants (0.5% ≤ MAF < 5%) across diverse populations, but the imputation of rare variation (MAF < 0.5%) is still rather limited. In the current study, we compare the imputation accuracy achieved with reference panels from diverse populations against that of a population-specific, high-coverage (30×) whole-genome sequencing (WGS)-based reference panel comprising 2,244 Estonian individuals (0.25% of adult Estonians). Although the Estonian-specific panel contains fewer haplotypes and variants, the imputation confidence and accuracy of imputed low-frequency and rare variants were significantly higher with this panel. The results indicate the utility of population-specific reference panels for human genetic studies.
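The frequency classes used above (common, low-frequency, rare) can be made concrete with a minimal sketch. It assumes biallelic variants with per-individual alternate-allele counts of 0, 1 or 2, and applies the MAF thresholds quoted in the abstract. The function names are hypothetical.

```python
from typing import Sequence


def minor_allele_frequency(dosages: Sequence[int]) -> float:
    """MAF of a biallelic variant from per-individual alternate-allele counts."""
    alt_freq = sum(dosages) / (2 * len(dosages))
    return min(alt_freq, 1.0 - alt_freq)


def frequency_class(maf: float) -> str:
    """Bin a variant using the thresholds in the text: 5% and 0.5%."""
    if maf >= 0.05:
        return "common"
    if maf >= 0.005:
        return "low-frequency"
    return "rare"


# 98 alternate-allele homozygotes and 2 heterozygotes: the reference allele is
# the minor allele here, with frequency 2 / 200 = 1%.
maf = minor_allele_frequency([2] * 98 + [1] * 2)
```

Taking the minimum of the two allele frequencies means the class does not depend on which allele is labelled as the alternate.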
Early identification of ambulatory persons at high short-term risk of death could benefit targeted prevention. To identify biomarkers for all-cause mortality and enhance risk prediction, we conducted high-throughput profiling of blood specimens in two large population-based cohorts.
A total of 106 candidate biomarkers were quantified by nuclear magnetic resonance spectroscopy of non-fasting plasma samples from a random subset of the Estonian Biobank (n = 9,842; age range 18-103 y; 508 deaths during a median of 5.4 y of follow-up). Biomarkers for all-cause mortality were examined using stepwise proportional hazards models. Significant biomarkers were validated, and their incremental predictive utility was assessed, in a population-based cohort from Finland (n = 7,503; 176 deaths during 5 y of follow-up). Four circulating biomarkers predicted the risk of all-cause mortality among participants from the Estonian Biobank after adjustment for conventional risk factors: alpha-1-acid glycoprotein (hazard ratio [HR] 1.67 per 1-standard deviation increment, 95% CI 1.53-1.82, p = 5×10⁻³¹), albumin (HR 0.70, 95% CI 0.65-0.76, p = 2×10⁻¹⁸), very-low-density lipoprotein particle size (HR 0.69, 95% CI 0.62-0.77, p = 3×10⁻¹²), and citrate (HR 1.33, 95% CI 1.21-1.45, p = 5×10⁻¹⁰). All four biomarkers were predictive of cardiovascular mortality, as well as of death from cancer and other nonvascular diseases. One in five participants in the Estonian Biobank cohort with a biomarker summary score within the highest percentile died during the first year of follow-up, indicating prominent systemic reflections of frailty. The biomarker associations all replicated in the Finnish validation cohort. Including the four biomarkers in a risk prediction score improved risk assessment for 5-y mortality (increase in C-statistic 0.031, p = 0.01; continuous reclassification improvement 26.3%, p = 0.001).
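The reported gain in discrimination is a difference between two C-statistics. The abstract does not say which variant of the C-statistic was computed. The minimal sketch below uses Harrell's C for right-censored data, which is a common choice, and skips pairs tied on follow-up time. The function name, risk values and follow-up times are hypothetical.

```python
from typing import Sequence


def harrell_c(
    times: Sequence[float], events: Sequence[bool], risk: Sequence[float]
) -> float:
    """Harrell's concordance statistic for right-censored survival data.

    A pair is comparable when subject i died and did so before subject j's
    follow-up time (death or censoring). It is concordant when i also has the
    higher predicted risk; tied risks count one half. Pairs tied on time are
    skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    for t_i, event_i, r_i in zip(times, events, risk):
        if not event_i:
            continue  # censored subjects cannot anchor a comparable pair
        for t_j, r_j in zip(times, risk):
            if t_i < t_j:
                comparable += 1
                if r_i > r_j:
                    concordant += 1.0
                elif r_i == r_j:
                    concordant += 0.5
    return concordant / comparable


# Toy cohort: follow-up times, death indicators, and risks from two models.
times = [1.0, 2.0, 3.0, 4.0]
events = [True, True, False, True]
c_base = harrell_c(times, events, [0.9, 0.5, 0.7, 0.1])
c_extended = harrell_c(times, events, [0.9, 0.8, 0.7, 0.1])
delta_c = c_extended - c_base  # analogous to "increase in C-statistic"
```

In practice, the two risk scores would come from proportional hazards models fitted with and without the biomarkers, and the increase would be tested, for example by bootstrapping.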
Biomarker associations with cardiovascular, nonvascular, and cancer mortality suggest novel systemic connectivities across seemingly disparate morbidities. The biomarker profiling improved prediction of the short-term risk of death from all causes beyond established risk factors. Further investigations are needed to clarify the biological mechanisms and the utility of these biomarkers for guiding screening and prevention.