Accurate prediction of an individual's phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. We extend a powerful individual-level data Bayesian multiple regression model (BayesR) to one that utilises summary statistics from genome-wide association studies (GWAS), SBayesR. In simulation and cross-validation using 12 real traits and 1.1 million variants on 350,000 individuals from the UK Biobank, SBayesR improves prediction accuracy relative to commonly used state-of-the-art summary statistics methods at a fraction of the computational resources. Furthermore, using summary statistics for variants from the largest GWAS meta-analysis (n ≈ 700,000) on height and BMI, we show that, on average across traits and two independent data sets, SBayesR improves prediction R² by 5.2% relative to LDpred and by 26.5% relative to clumping and p value thresholding.
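The accuracy comparison above rests on two simple quantities: a polygenic score (a weighted sum of allele counts) and the prediction R² between that score and the observed phenotype. The following is a minimal sketch on simulated toy data, not the SBayesR implementation. The function names, the toy sample sizes, and the reading of the reported percentages as relative changes in R² are all assumptions made for illustration.

```python
import numpy as np


def polygenic_score(genotypes, effects):
    """Weighted sum of allele counts (0/1/2) per individual."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(effects, dtype=float)


def prediction_r2(phenotype, score):
    """Squared Pearson correlation between phenotype and score."""
    r = np.corrcoef(phenotype, score)[0, 1]
    return r ** 2


def relative_improvement(r2_new, r2_ref):
    """Relative change in R² of a new method over a reference method."""
    return (r2_new - r2_ref) / r2_ref


# Toy data (hypothetical sizes): 500 individuals, 50 variants.
rng = np.random.default_rng(0)
n_individuals, n_variants = 500, 50
genotypes = rng.binomial(2, 0.3, size=(n_individuals, n_variants))
true_effects = rng.normal(0.0, 0.1, size=n_variants)
phenotype = genotypes @ true_effects + rng.normal(0.0, 1.0, size=n_individuals)

# Two sets of estimated effects with different amounts of estimation noise,
# standing in for two prediction methods.
effects_a = true_effects + rng.normal(0.0, 0.02, size=n_variants)
effects_b = true_effects + rng.normal(0.0, 0.10, size=n_variants)

r2_a = prediction_r2(phenotype, polygenic_score(genotypes, effects_a))
r2_b = prediction_r2(phenotype, polygenic_score(genotypes, effects_b))
print(f"R2 (method A) = {r2_a:.3f}")
print(f"R2 (method B) = {r2_b:.3f}")
print(f"relative improvement of A over B = {relative_improvement(r2_a, r2_b):.1%}")
```

In a real evaluation the score would be computed in a validation sample that is independent of the GWAS used to estimate the effects, as in the cross-validation design described above.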
The Estonian Biobank cohort is a volunteer-based sample of the Estonian resident adult population (aged ≥18 years). The current number of participants, close to 52,000, represents a large proportion, 5%, of the Estonian adult population, making it ideally suited to population-based studies. General practitioners (GPs) and medical personnel in the special recruitment offices have recruited participants throughout the country. At baseline, the GPs performed a standardized health examination of the participants, who also donated blood samples for DNA, white blood cells and plasma tests and filled out a 16-module questionnaire on health-related topics such as lifestyle, diet and clinical diagnoses described in WHO ICD-10. A significant part of the cohort has whole genome sequencing (100), genome-wide single nucleotide polymorphism (SNP) array data (20 000) and/or NMR metabolome data (11 000) available (http://www.geenivaramu.ee/for-scientists/data-release/). The data are continuously updated through periodical linking to national electronic databases and registries. A part of the cohort has been re-contacted for follow-up purposes and resampling, and targeted invitations are possible for specific purposes, for example people with a specific diagnosis. The Estonian Genome Center of the University of Tartu is actively collaborating with many universities, research institutes and consortia and encourages fellow scientists worldwide to co-initiate new academic or industrial joint projects with us.
Identifying the downstream effects of disease-associated SNPs is challenging. To help overcome this problem, we performed expression quantitative trait locus (eQTL) meta-analysis in non-transformed peripheral blood samples from 5,311 individuals with replication in 2,775 individuals. We identified and replicated trans eQTLs for 233 SNPs (reflecting 103 independent loci) that were previously associated with complex traits at genome-wide significance. Some of these SNPs affect multiple genes in trans that are known to be altered in individuals with disease: rs4917014, previously associated with systemic lupus erythematosus (SLE), altered gene expression of C1QB and five type I interferon response genes, both hallmarks of SLE. DeepSAGE RNA sequencing showed that rs4917014 strongly alters the 3' UTR levels of IKZF1 in cis, and chromatin immunoprecipitation and sequencing analysis of the trans-regulated genes implicated IKZF1 as the causal gene. Variants associated with cholesterol metabolism and type 1 diabetes showed similar phenomena, indicating that large-scale eQTL mapping provides insight into the downstream effects of many trait-associated variants.
Heritable variance in psychological traits may reflect genetic and biological processes that are not necessarily specific to these particular traits but pertain to a broader range of phenotypes. We tested the possibility that the personality domains of the five-factor model and their 30 facets, as rated by people themselves and their knowledgeable informants, reflect polygenic influences that have been previously associated with educational attainment. In a sample of more than 3,000 adult Estonians, education polygenic scores (EPSs), which are interpretable as estimates of molecular-genetic propensity for education, were correlated with various personality traits, particularly from the neuroticism and openness domains. The correlations of personality traits with phenotypic educational attainment closely mirrored their correlations with EPS. Moreover, EPS predicted an aggregate personality trait tailored to capture the maximum amount of variance in educational attainment almost as strongly as it predicted the attainment itself. We discuss possible interpretations and implications of these findings.
Genetic imputation is a cost-efficient way to improve the power and resolution of genome-wide association (GWA) studies. Current publicly accessible imputation reference panels accurately predict genotypes for common variants with minor allele frequency (MAF) ≥5% and low-frequency variants (0.5% ≤ MAF < 5%) across diverse populations, but the imputation of rare variation (MAF < 0.5%) is still rather limited. In the current study, we compare the imputation accuracy achieved with reference panels from diverse populations against that of a population-specific high-coverage (30×) whole-genome sequencing (WGS) based reference panel comprising 2244 Estonian individuals (0.25% of adult Estonians). Although the Estonian-specific panel contains fewer haplotypes and variants, the imputation confidence and accuracy of imputed low-frequency and rare variants were significantly higher. The results indicate the utility of population-specific reference panels for human genetic studies.
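The three frequency classes used above (common at MAF of 5% or more, low-frequency from 0.5% up to 5%, rare below 0.5%) amount to a simple binning rule. A minimal sketch follows; the function names are hypothetical, and treating a MAF of exactly zero as "rare" rather than as monomorphic is an assumption of this example.

```python
def minor_allele_frequency(allele_frequency):
    """Fold an allele frequency in [0, 1] to the minor allele frequency."""
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return min(allele_frequency, 1.0 - allele_frequency)


def classify_maf(maf):
    """Bin a minor allele frequency using the thresholds quoted in the text."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie in [0, 0.5]")
    if maf >= 0.05:
        return "common"
    if maf >= 0.005:
        return "low-frequency"
    return "rare"


# Example: an allele seen at 99.8% frequency has a minor allele at 0.2%.
for af in (0.30, 0.97, 0.998):
    maf = minor_allele_frequency(af)
    print(f"AF={af:.3f}  MAF={maf:.3f}  class={classify_maf(maf)}")
```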
Pernicious anemia is a rare condition characterized by vitamin B12 deficiency anemia due to lack of intrinsic factor, often caused by autoimmune gastritis. Patients with pernicious anemia have a higher incidence of other autoimmune disorders, such as type 1 diabetes, vitiligo, and autoimmune thyroid issues. Therefore, the disease has a clear autoimmune basis, although the genetic susceptibility factors have thus far remained poorly studied. We conduct a genome-wide association study meta-analysis in 2166 cases and 659,516 European controls from population-based biobanks and identify genome-wide significant signals in or near the PTPN22 (rs6679677, p = 1.91 × 10[…], OR = 1.63), PNPT1 (rs12616502, p = 3.14 × 10[…], OR = 1.70), HLA-DQB1 (rs28414666, p = 1.40 × 10[…], OR = 1.38), IL2RA (rs2476491, p = 1.90 × 10[…], OR = 1.22) and AIRE (rs74203920, p = 2.33 × 10[…], OR = 1.83) genes, thus providing robust associations between pernicious anemia and genetic risk factors.
Lifespan is a trait of enormous personal interest. Research into the biological basis of human lifespan, however, is hampered by the long time to death. Using a novel approach of regressing (272,081) parental lifespans beyond age 40 years on participant genotype in a new large data set (UK Biobank), we here show that common variants near the apolipoprotein E and nicotinic acetylcholine receptor subunit alpha 5 genes are associated with lifespan. The effects are strongly sex and age dependent, with APOE ɛ4 differentially influencing maternal lifespan (P = 4.2 × 10⁻¹⁵, effect -1.24 years of maternal life per imputed risk allele in parent; sex difference, P = 0.011), and a locus near CHRNA3/5 differentially affecting paternal lifespan (P = 4.8 × 10⁻¹¹, effect -0.86 years per allele; sex difference, P = 0.075). Rare homozygous carriers of the risk alleles at both loci are predicted to have 3.3-3.7 years shorter lives.
Early identification of ambulatory persons at high short-term risk of death could benefit targeted prevention. To identify biomarkers for all-cause mortality and enhance risk prediction, we conducted high-throughput profiling of blood specimens in two large population-based cohorts.
106 candidate biomarkers were quantified by nuclear magnetic resonance spectroscopy of non-fasting plasma samples from a random subset of the Estonian Biobank (n = 9,842; age range 18-103 y; 508 deaths during a median of 5.4 y of follow-up). Biomarkers for all-cause mortality were examined using stepwise proportional hazards models. Significant biomarkers were validated and their incremental predictive utility was assessed in a population-based cohort from Finland (n = 7,503; 176 deaths during 5 y of follow-up). Four circulating biomarkers predicted the risk of all-cause mortality among participants from the Estonian Biobank after adjusting for conventional risk factors: alpha-1-acid glycoprotein (hazard ratio [HR] 1.67 per 1-standard deviation increment, 95% CI 1.53-1.82, p = 5×10⁻³¹), albumin (HR 0.70, 95% CI 0.65-0.76, p = 2×10⁻¹⁸), very-low-density lipoprotein particle size (HR 0.69, 95% CI 0.62-0.77, p = 3×10⁻¹²), and citrate (HR 1.33, 95% CI 1.21-1.45, p = 5×10⁻¹⁰). All four biomarkers were predictive of cardiovascular mortality, as well as death from cancer and other nonvascular diseases. One in five participants in the Estonian Biobank cohort with a biomarker summary score within the highest percentile died during the first year of follow-up, indicating prominent systemic reflections of frailty. The biomarker associations all replicated in the Finnish validation cohort. Including the four biomarkers in a risk prediction score improved risk assessment for 5-y mortality (increase in C-statistics 0.031, p = 0.01; continuous reclassification improvement 26.3%, p = 0.001).
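The hazard ratios, confidence intervals and p values reported above are linked through the usual Wald approximation on the log-hazard scale, which gives a quick consistency check on such figures. The sketch below assumes symmetric 95% intervals on the log scale and a two-sided normal test; this is a generic textbook relationship rather than a procedure confirmed by the study, and the function name is hypothetical. Because the published values are rounded, the recovered p values can be expected to agree only roughly with the reported ones.

```python
import math

Z_95 = 1.959964  # two-sided 95% normal quantile


def wald_from_ci(hr, lower, upper, z_crit=Z_95):
    """Recover log(HR), its standard error, the z statistic and a
    two-sided p value from a hazard ratio and its confidence interval."""
    log_hr = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = log_hr / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return log_hr, se, z, p


# Hazard ratios per 1-SD increment and 95% CIs as quoted in the text.
reported = {
    "alpha-1-acid glycoprotein": (1.67, 1.53, 1.82),
    "albumin": (0.70, 0.65, 0.76),
    "VLDL particle size": (0.69, 0.62, 0.77),
    "citrate": (1.33, 1.21, 1.45),
}

for name, (hr, lower, upper) in reported.items():
    log_hr, se, z, p = wald_from_ci(hr, lower, upper)
    print(f"{name}: log(HR)={log_hr:+.3f}  SE={se:.3f}  z={z:+.2f}  p~{p:.1e}")
```

A hazard ratio below 1 (albumin, VLDL particle size) yields a negative log hazard and a negative z statistic, which corresponds to lower mortality risk per standard deviation increase.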
Biomarker associations with cardiovascular, nonvascular, and cancer mortality suggest novel systemic connectivities across seemingly disparate morbidities. The biomarker profiling improved prediction of the short-term risk of death from all causes above established risk factors. Further investigations are needed to clarify the biological mechanisms and the utility of these biomarkers for guiding screening and prevention.
Type 2 diabetes (T2D) is a very common disease in humans. Here we conduct a meta-analysis of genome-wide association studies (GWAS) with ~16 million genetic variants in 62,892 T2D cases and 596,424 controls of European ancestry. We identify 139 common and 4 rare variants associated with T2D, 42 of which (39 common and 3 rare variants) are independent of the known variants. Integration of the gene expression data from blood (n = 14,115 and 2765) with the GWAS results identifies 33 putative functional genes for T2D, 3 of which are targeted by approved drugs. A further integration of DNA methylation (n = 1980) and epigenomic annotation data highlights 3 genes (CAMK1D, TP53INP1, and ATP5G1) with plausible regulatory mechanisms, whereby a genetic variant exerts an effect on T2D through epigenetic regulation of gene expression. Our study uncovers additional loci, proposes putative genetic regulatory mechanisms for T2D, and provides evidence of purifying selection for T2D-associated variants.
A genotype-first approach allows the systematic identification of carriers of pathogenic variants in the BRCA1/2 genes, which confer a high risk of familial breast and ovarian cancer. Participants of the Estonian biobank have expressed support for the disclosure of clinically significant findings. With an Estonian biobank cohort, we applied a genotype-first approach, contacted carriers, and offered return of results with genetic counseling. We evaluated participants' responses to and the clinical utility of the reporting of actionable genetic findings. Twenty-two of 40 contacted carriers of 17 pathogenic BRCA1/2 variants responded and chose to receive results. Eight of these 22 participants qualified for high-risk assessment based on National Comprehensive Cancer Network criteria. Twenty of 21 counseled participants appreciated being contacted. Relatives of 10 participants underwent cascade screening. Five of 16 eligible female BRCA1/2 variant carriers chose to undergo risk-reducing surgery, and 10 adhered to surveillance recommendations over the 30-month follow-up period. We recommend the return of results to population-based biobank participants; this approach could be viewed as a model for population-wide genetic testing. The genotype-first approach permits the identification of individuals at high risk who would not be identified by an approach based on personal and family histories only.