Accurate prediction of an individual's phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. We extend a powerful individual-level data Bayesian multiple ...regression model (BayesR) to one that utilises summary statistics from genome-wide association studies (GWAS), SBayesR. In simulation and cross-validation using 12 real traits and 1.1 million variants on 350,000 individuals from the UK Biobank, SBayesR improves prediction accuracy relative to commonly used state-of-the-art summary statistics methods at a fraction of the computational resources. Furthermore, using summary statistics for variants from the largest GWAS meta-analysis (n ≈ 700, 000) on height and BMI, we show that on average across traits and two independent data sets that SBayesR improves prediction R
by 5.2% relative to LDpred and by 26.5% relative to clumping and p value thresholding.
Gene discovery, estimation of heritability captured by SNP arrays, inference on genetic architecture and prediction analyses of complex traits are usually performed using different statistical models ...and methods, leading to inefficiency and loss of power. Here we use a Bayesian mixture model that simultaneously allows variant discovery, estimation of genetic variance explained by all variants and prediction of unobserved phenotypes in new samples. We apply the method to simulated data of quantitative traits and Welcome Trust Case Control Consortium (WTCCC) data on disease and show that it provides accurate estimates of SNP-based heritability, produces unbiased estimators of risk in new samples, and that it can estimate genetic architecture by partitioning variation across hundreds to thousands of SNPs. We estimated that, depending on the trait, 2,633 to 9,411 SNPs explain all of the SNP-based heritability in the WTCCC diseases. The majority of those SNPs (>96%) had small effects, confirming a substantial polygenic component to common diseases. The proportion of the SNP-based variance explained by large effects (each SNP explaining 1% of the variance) varied markedly between diseases, ranging from almost zero for bipolar disorder to 72% for type 1 diabetes. Prediction analyses demonstrate that for diseases with major loci, such as type 1 diabetes and rheumatoid arthritis, Bayesian methods outperform profile scoring or mixed model approaches.
Obesity is a worldwide epidemic, with major health and economic costs. Here we estimate heritability for body mass index (BMI) in 172,000 sibling pairs and 150,832 unrelated individuals and explore ...the contribution of genotype-covariate interaction effects at common SNP loci. We find evidence for genotype-age interaction (likelihood ratio test (LRT) = 73.58, degrees of freedom (df) = 1, P = 4.83 × 10
), which contributed 8.1% (1.4% s.e.) to BMI variation. Across eight self-reported lifestyle factors, including diet and exercise, we find genotype-environment interaction only for smoking behavior (LRT = 19.70, P = 5.03 × 10
and LRT = 30.80, P = 1.42 × 10
), which contributed 4.0% (0.8% s.e.) to BMI variation. Bayesian association analysis suggests that BMI is highly polygenic, with 75% of the SNP heritability attributable to loci that each explain <0.01% of the phenotypic variance. Our findings imply that substantially larger sample sizes across ages and lifestyles are required to understand the full genetic architecture of BMI.
We investigated strategies and factors affecting accuracy of imputing genotypes from lower-density SNP panels (Illumina 3K, 7K, Affymetrix 15K and 25K, and evenly spaced subsets) up to one medium ...(Illumina 50K) and one high-density (Illumina 800K) SNP panel. We also evaluated the utility of imputed genotypes on the accuracy of genomic selection using Australian Holstein-Friesian cattle data from 2727 and 845 animals genotyped with 50K and 800K SNP chip, respectively. Animals were divided into reference and test sets (genotyped with higher and lower density SNP panels, respectively) for evaluating the accuracies of imputation. For the accuracy of genomic selection, a comparison of direct genetic values (DGV) was made by dividing the data into training and validation sets under a range of imputation scenarios.
Of the three methods compared for imputation, IMPUTE2 outperformed Beagle and fastPhase for almost all scenarios. Higher SNP densities in the test animals, larger reference sets and higher relatedness between test and reference animals increased the accuracy of imputation. 50K specific genotypes were imputed with moderate allelic error rates from 15K (2.85%) and 25K (2.75%) genotypes. Using IMPUTE2, SNP genotypes up to 800K were imputed with low allelic error rate (0.79% genome-wide) from 50K genotypes, and with moderate error rate from 3K (4.78%) and 7K (2.00%) genotypes. The error rate of imputing up to 800K from 3K or 7K was further reduced when an additional middle tier of 50K genotypes was incorporated in a 3-tiered framework. Accuracies of DGV for five production traits using imputed 50K genotypes were close to those obtained with the actual 50K genotypes and higher compared to using 3K or 7K genotypes. The loss in accuracy of DGV was small when most of the training animals also had imputed (50K) genotypes. Additional gains in DGV accuracies were small when SNP densities increased from 50K to imputed 800K.
Population-based genotype imputation can be used to predict and combine genotypes from different low, medium and high-density SNP chips with a high level of accuracy. Imputing genotypes from low-density SNP panels to at least 50K SNP density increases the accuracy of genomic selection.
Genomic selection (GS) uses molecular breeding values (MBV) derived from dense markers across the entire genome for selection of young animals. The accuracy of MBV prediction is important for a ...successful application of GS. Recently, several methods have been proposed to estimate MBV. Initial simulation studies have shown that these methods can accurately predict MBV. In this study we compared the accuracies and possible bias of five different regression methods in an empirical application in dairy cattle.
Genotypes of 7,372 SNP and highly accurate EBV of 1,945 dairy bulls were used to predict MBV for protein percentage (PPT) and a profit index (Australian Selection Index, ASI). Marker effects were estimated by least squares regression (FR-LS), Bayesian regression (Bayes-R), random regression best linear unbiased prediction (RR-BLUP), partial least squares regression (PLSR) and nonparametric support vector regression (SVR) in a training set of 1,239 bulls. Accuracy and bias of MBV prediction were calculated from cross-validation of the training set and tested against a test team of 706 young bulls.
For both traits, FR-LS using a subset of SNP was significantly less accurate than all other methods which used all SNP. Accuracies obtained by Bayes-R, RR-BLUP, PLSR and SVR were very similar for ASI (0.39-0.45) and for PPT (0.55-0.61). Overall, SVR gave the highest accuracy.All methods resulted in biased MBV predictions for ASI, for PPT only RR-BLUP and SVR predictions were unbiased. A significant decrease in accuracy of prediction of ASI was seen in young test cohorts of bulls compared to the accuracy derived from cross-validation of the training set. This reduction was not apparent for PPT. Combining MBV predictions with pedigree based predictions gave 1.05 - 1.34 times higher accuracies compared to predictions based on pedigree alone. Some methods have largely different computational requirements, with PLSR and RR-BLUP requiring the least computing time.
The four methods which use information from all SNP namely RR-BLUP, Bayes-R, PLSR and SVR generate similar accuracies of MBV prediction for genomic selection, and their use in the selection of immediate future generations in dairy cattle will be comparable. The use of FR-LS in genomic selection is not recommended.
The aim of the study was to evaluate the predictive value regarding postoperative hearing benefit of electrically evoked auditory brainstem response audiometry in sporadic vestibular schwannoma ...patients undergoing simultaneous tumor resection and cochlear implantation.
Patients were included in a prospective study conducted between October 2016 and January 2019.
The study was conducted at a tertiary care center.
Subjects with unilateral sporadic vestibular schwannoma were screened for study participation. Patients underwent translabyrinthine vestibular schwannoma resection and cochlear implantation simultaneously.
Electrically evoked brainstem response audiometry was performed during surgery before and after tumor removal using an intracochlear test electrode to objectively evaluate nerve conduction.
Electrically evoked brainstem response audiometry results were correlated with postoperative sound field audiometry, word recognition tests, and speech reception thresholds. Quality of life was assessed before and 12 months after translabyrinthine tumor removal and cochlear implantation.
Five patients, three male and two female, were included in the study and followed for at least 1 year after implantation. Three of the five patients are daily cochlear implant users with open set speech recognition. Two individuals with negative intraoperative electrically evoked auditory brainstem response results showed no auditory perception with cochlear implant.
Simultaneous translabyrinthine vestibular schwannoma resection and cochlear implantation with intraoperative electrically evoked auditory brainstem response measurements is a feasible and promising option for sporadic vestibular schwannoma patients. Preservation of electrically evoked auditory brainstem responses seems to predict good subsequent hearing outcomes.
We develop a Bayesian model (BayesRR-RC) that provides robust SNP-heritability estimation, an alternative to marker discovery, and accurate genomic prediction, taking 22 seconds per iteration to ...estimate 8.4 million SNP-effects and 78 SNP-heritability parameters in the UK Biobank. We find that only ≤10% of the genetic variation captured for height, body mass index, cardiovascular disease, and type 2 diabetes is attributable to proximal regulatory regions within 10kb upstream of genes, while 12-25% is attributed to coding regions, 32-44% to introns, and 22-28% to distal 10-500kb upstream regions. Up to 24% of all cis and coding regions of each chromosome are associated with each trait, with over 3,100 independent exonic and intronic regions and over 5,400 independent regulatory regions having ≥95% probability of contributing ≥0.001% to the genetic variance of these four traits. Our open-source software (GMRM) provides a scalable alternative to current approaches for biobank data.
Genetic association studies in admixed populations are underrepresented in the genomics literature, with a key concern for researchers being the adequate control of spurious associations due to ...population structure. Linear mixed models (LMMs) are well suited for genome-wide association studies (GWAS) because they account for both population stratification and cryptic relatedness and achieve increased statistical power by jointly modeling all genotyped markers. Additionally, Bayesian LMMs allow for more flexible assumptions about the underlying distribution of genetic effects, and can concurrently estimate the proportion of phenotypic variance explained by genetic markers. Using three recently published Bayesian LMMs, Bayes R, BSLMM, and BOLT-LMM, we investigate an existing data set on eye (
= 625) and skin (
= 684) color from Cape Verde, an island nation off West Africa that is home to individuals with a broad range of phenotypic values for eye and skin color due to the mix of West African and European ancestry. We use simulations to demonstrate the utility of Bayesian LMMs for mapping loci and studying the genetic architecture of quantitative traits in admixed populations. The Bayesian LMMs provide evidence for two new pigmentation loci: one for eye color (
) and one for skin color (
).
BACKGROUND: At the current price, the use of high-density single nucleotide polymorphisms (SNP) genotyping assays in genomic selection of dairy cattle is limited to applications involving elite sires ...and dams. The objective of this study was to evaluate the use of low-density assays to predict direct genomic value (DGV) on five milk production traits, an overall conformation trait, a survival index, and two profit index traits (APR, ASI). METHODS: Dense SNP genotypes were available for 42,576 SNP for 2,114 Holstein bulls and 510 cows. A subset of 1,847 bulls born between 1955 and 2004 was used as a training set to fit models with various sets of pre-selected SNP. A group of 297 bulls born between 2001 and 2004 and all cows born between 1992 and 2004 were used to evaluate the accuracy of DGV prediction. Ridge regression (RR) and partial least squares regression (PLSR) were used to derive prediction equations and to rank SNP based on the absolute value of the regression coefficients. Four alternative strategies were applied to select subset of SNP, namely: subsets of the highest ranked SNP for each individual trait, or a single subset of evenly spaced SNP, where SNP were selected based on their rank for ASI, APR or minor allele frequency within intervals of approximately equal length. RESULTS: RR and PLSR performed very similarly to predict DGV, with PLSR performing better for low-density assays and RR for higher-density SNP sets. When using all SNP, DGV predictions for production traits, which have a higher heritability, were more accurate (0.52-0.64) than for survival (0.19-0.20), which has a low heritability. The gain in accuracy using subsets that included the highest ranked SNP for each trait was marginal (5-6%) over a common set of evenly spaced SNP when at least 3,000 SNP were used. Subsets containing 3,000 SNP provided more than 90% of the accuracy that could be achieved with a high-density assay for cows, and 80% of the high-density assay for young bulls. CONCLUSIONS: Accurate genomic evaluation of the broader bull and cow population can be achieved with a single genotyping assays containing ~ 3,000 to 5,000 evenly spaced SNP.
Despite modern treatment approaches, survival of patients with recurrent or metastatic squamous cell carcinoma of the head and neck (SCCHN) remains low and it is difficult to identify patients who ...derive optimal benefit from treatment. We therefore analyzed which commonly available laboratory and clinical parameters may help improve the prognostication in this patient group. This retrospective monocenter analysis includes 128 patients with recurrent or metastatic SCCHN treated with cetuximab alone or in combination with polychemotherapy as first line therapy. Factors with independent prognostic power in the multivariate analysis were used to build up a score separating patient groups with different survival. Patients had a median age of 61 years and 103 patients were treated with polychemotherapy plus cetuximab. An ECOG score above 1, high CRP and leukocyte levels, less intensive treatment and a time below 12 months from primary diagnosis to relapse remained as independent negative prognostic factors in multivariate analysis. Patients with 0 to 1 risk factors had a median OS of 13.6 months compared to a median OS of less than one month for patients 4 to 5 risk factors (p<0.001). This study identifies 5 clinical and serum values that influence survival of patients with recurrent or metastatic SCCHN treated with cetuximab. By combining these factors to create a score for OS, it is possible to distinguish a group of patients with significantly improved survival and define those most likely to have no benefit from cetuximab treatment.