Genetic association results are often interpreted with the assumption that study participation does not affect downstream analyses. Understanding the genetic basis of participation bias is challenging since it requires the genotypes of unseen individuals. Here we demonstrate that it is possible to estimate comparative biases by performing a genome-wide association study contrasting one subgroup versus another. For example, we showed that sex exhibits artifactual autosomal heritability in the presence of sex-differential participation bias. By performing a genome-wide association study of sex in approximately 3.3 million males and females, we identified over 158 autosomal loci spuriously associated with sex and highlighted complex traits underpinning differences in study participation between the sexes. For example, the body mass index-increasing allele at FTO was observed at higher frequency in males compared to females (odds ratio = 1.02, P = 4.4 × 10
). Finally, we demonstrated how these biases can potentially lead to incorrect inferences in downstream analyses and propose a conceptual framework for addressing such biases. Our findings highlight a new challenge that genetic studies may face as sample sizes continue to grow.
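The subgroup comparison described above can be illustrated with a minimal sketch: an allelic odds ratio computed from a 2x2 table of allele counts in two groups, with a Woolf-type 95% confidence interval. This is a simplification for illustration only; the function name and the counts in the usage note are hypothetical, and a real genome-wide association study would typically use regression with covariates rather than raw counts.

```python
import math


def allelic_odds_ratio(alt_a, ref_a, alt_b, ref_b):
    """Return (odds ratio, 95% CI lower, 95% CI upper) for a 2x2 allele-count table.

    alt_a / ref_a: counts of the tested and reference allele in group A.
    alt_b / ref_b: the same counts in group B.
    All names are hypothetical; this is not the pipeline used in the study.
    """
    counts = (alt_a, ref_a, alt_b, ref_b)
    if any(c <= 0 for c in counts):
        raise ValueError("all allele counts must be positive")

    # Odds of carrying the tested allele in group A divided by the odds in group B.
    odds_ratio = (alt_a * ref_b) / (ref_a * alt_b)

    # Woolf's approximation: standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    log_or = math.log(odds_ratio)

    lower = math.exp(log_or - 1.96 * se_log_or)
    upper = math.exp(log_or + 1.96 * se_log_or)
    return odds_ratio, lower, upper
```

For example, `allelic_odds_ratio(420, 580, 400, 600)` compares made-up allele counts in two groups. An odds ratio close to 1, such as the 1.02 reported at FTO, only becomes distinguishable from 1 when the counts are very large.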
More than 100,000 genetic variants are reported to cause Mendelian disease in humans, but the penetrance (the probability that a carrier of the purported disease-causing genotype will indeed develop the disease) is generally unknown. We assess the impact of variants in the prion protein gene (PRNP) on the risk of prion disease by analyzing 16,025 prion disease cases, 60,706 population control exomes, and 531,575 individuals genotyped by 23andMe Inc. We show that missense variants in PRNP previously reported to be pathogenic are at least 30 times more common in the population than expected on the basis of genetic prion disease prevalence. Although some of this excess can be attributed to benign variants falsely assigned as pathogenic, other variants have genuine effects on disease susceptibility but confer lifetime risks ranging from <0.1 to ~100%. We also show that truncating variants in PRNP have position-dependent effects, with true loss-of-function alleles found in healthy older individuals, a finding that supports the safety of therapeutic suppression of prion protein expression.
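The penetrance reasoning in this abstract can be sketched with Bayes' rule: P(disease | genotype) = P(genotype | disease) × P(disease) / P(genotype). The snippet below is a minimal illustration under that assumption. The function name and every number in the usage note are hypothetical placeholders, not values or code from the study.

```python
def estimate_penetrance(carrier_freq_cases, carrier_freq_population, baseline_lifetime_risk):
    """Estimate P(disease | genotype) with Bayes' rule.

    carrier_freq_cases: fraction of cases carrying the variant, P(genotype | disease).
    carrier_freq_population: fraction of the general population carrying it, P(genotype).
    baseline_lifetime_risk: lifetime risk of the disease in the population, P(disease).
    """
    for name, value in (
        ("carrier_freq_cases", carrier_freq_cases),
        ("baseline_lifetime_risk", baseline_lifetime_risk),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    if not 0.0 < carrier_freq_population <= 1.0:
        raise ValueError("carrier_freq_population must be in (0, 1]")

    estimate = carrier_freq_cases * baseline_lifetime_risk / carrier_freq_population

    # Sampling noise in the inputs can push the point estimate above 1,
    # so cap it at a probability of 1.
    return min(estimate, 1.0)
```

With placeholder inputs, `estimate_penetrance(0.01, 0.0001, 0.0002)` gives about 0.02. The same arithmetic shows why a variant that is far more common in population controls than disease prevalence would allow cannot be fully penetrant.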
Hamer et al. argue that the variable "ever versus never had a same-sex partner" does not capture the complexity of human sexuality. We agree and said so in our paper. But Hamer et al. neglect to mention that we also reported follow-up analyses showing substantial overlap of the genetic influences on our main variable and on more nuanced measures of sexual behavior, attraction, and identity.
Uterine leiomyomata (UL) are the most common neoplasms of the female reproductive tract and the primary cause for hysterectomy, leading to considerable morbidity and high economic burden. Here we conduct a GWAS meta-analysis in 35,474 cases and 267,505 female controls of European ancestry, identifying eight novel genome-wide significant (P < 5 × 10⁻⁸) loci, in addition to confirming 21 previously reported loci, including multiple independent signals at 10 loci. Phenotypic stratification of UL by heavy menstrual bleeding in 3,409 cases and 199,171 female controls reveals genome-wide significant associations at three of the 29 UL loci: 5p15.33 (TERT), 5q35.2 (FGFR4) and 11q22.3 (ATM). Four loci identified in the meta-analysis are also associated with endometriosis risk; an epidemiological meta-analysis across 402,868 women suggests at least a doubling of risk for UL diagnosis among those with a history of endometriosis. These findings increase our understanding of the genetic contribution to and biology underlying UL development, and suggest overlapping genetic origins with endometriosis.
The "antidepressant efficacy" survey (AES) was deployed to > 50,000 23andMe, Inc. research participants to investigate the genetic basis of treatment-resistant depression (TRD) and non-treatment-resistant depression (NTRD). Genome-wide association studies (GWAS) were performed, including TRD vs. NTRD, selective serotonin reuptake inhibitor (SSRI) responders vs. non-responders, serotonin-norepinephrine reuptake inhibitor (SNRI) responders vs. non-responders, and norepinephrine-dopamine reuptake inhibitor responders vs. non-responders. Only the SSRI association reached the genome-wide significance threshold (p < 5 × 10⁻⁸): one genomic region in RNF219-AS1 (SNP rs4884091, p = 2.42 × 10
, OR = 1.21); this association was also observed in the meta-analysis (13,130 responders vs. 6,610 non-responders) of AES and an earlier "antidepressant efficacy and side effects" survey (AESES) cohort. Meta-analysis for SNRI response phenotype derived from AES and AESES (4030 responders vs. 3049 non-responders) identified another genomic region (lead SNP rs4955665, p = 1.62 × 10
, OR = 1.25) in an intronic region of MECOM passing the genome-wide significance threshold. Meta-analysis for the TRD phenotype (31,068 NTRD vs 5,714 TRD) identified one additional genomic region (lead SNP rs150245813, p = 8.07 × 10
, OR = 0.80) in 10p11.1 passing the genome-wide significance threshold. A stronger association for rs150245813 was observed in the current study (p = 7.35 × 10
, OR = 0.79) than in the previous study (p = 1.40 × 10
, OR = 0.81), and for rs4955665, a stronger association was observed in the previous study (p = 1.21 × 10
, OR = 1.27) than in the current study (p = 2.64 × 10
, OR = 1.21). In total, three novel loci associated with SSRI or SNRI response (responders vs. non-responders) or with NTRD vs. TRD were identified; gene-level association and gene set enrichment analyses indicate enrichment of genes involved in immune processes.
Gastroesophageal reflux disease (GERD) is caused by gastric acid entering the esophagus. GERD has high prevalence and is the major risk factor for Barrett's esophagus (BE) and esophageal adenocarcinoma (EA). We conduct a large GERD GWAS meta-analysis (80,265 cases, 305,011 controls), identifying 25 independent genome-wide significant loci for GERD. Several of the implicated genes are existing or putative drug targets. Loci discovery is greatest with a broad GERD definition (including cases defined by self-report or medication data). Further, 91% of the GERD risk-increasing alleles also increase BE and/or EA risk, greatly expanding gene discovery for these traits. Our results map genes for GERD and related traits and uncover potential new drug targets for these conditions.
The benefits of large-scale genetic studies for healthcare of the populations studied are well documented, but these genetic studies have traditionally ignored people from some parts of the world, such as South Asia. Here we describe whole genome sequence (WGS) data from 4806 individuals recruited from the healthcare delivery systems of Pakistan, India and Bangladesh, combined with WGS from 927 individuals from isolated South Asian populations. We characterize population structure in South Asia and describe a genotyping array (SARGAM) and imputation reference panel that are optimized for South Asian genomes. We find evidence for high rates of reproductive isolation, endogamy and consanguinity that vary across the subcontinent and that lead to levels of rare homozygotes that reach 100 times that seen in outbred populations. Founder effects increase the power to associate functional variants with disease processes and make South Asia a uniquely powerful place for population-scale genetic studies.
Moving in synchrony to the beat is a fundamental component of musicality. Here we conducted a genome-wide association study to identify common genetic variants associated with beat synchronization in 606,825 individuals. Beat synchronization exhibited a highly polygenic architecture, with 69 loci reaching genome-wide significance (P < 5 × 10⁻⁸) and single-nucleotide-polymorphism-based heritability (on the liability scale) of 13%-16%. Heritability was enriched for genes expressed in brain tissues and for fetal and adult brain-specific gene regulatory elements, underscoring the role of central-nervous-system-expressed genes linked to the genetic basis of the trait. We performed validations of the self-report phenotype (through separate experiments) and of the genome-wide association study (polygenic scores for beat synchronization were associated with patients algorithmically classified as musicians in medical records of a separate biobank). Genetic correlations with breathing function, motor function, processing speed and chronotype suggest shared genetic architecture with beat synchronization and provide avenues for new phenotypic and genetic explorations.
Polygenic risk prediction is a widely investigated topic because of its promising clinical applications. Genetic variants in functional regions of the genome are enriched for complex trait heritability. Here, we introduce a method for polygenic prediction, LDpred-funct, that leverages trait-specific functional priors to increase prediction accuracy. We fit priors using the recently developed baseline-LD model, including coding, conserved, regulatory, and LD-related annotations. We analytically estimate posterior mean causal effect sizes and then use cross-validation to regularize these estimates, improving prediction accuracy for sparse architectures. We applied LDpred-funct to predict 21 highly heritable traits in the UK Biobank (avg N = 373 K as training data). LDpred-funct attained a +4.6% relative improvement in average prediction accuracy (avg prediction R² = 0.144; highest R² = 0.413 for height) compared to SBayesR (the best method that does not incorporate functional information). For height, meta-analyzing training data from UK Biobank and 23andMe cohorts (N = 1107 K) increased prediction R² to 0.431. Our results show that incorporating functional priors improves polygenic prediction accuracy, consistent with the functional architecture of complex traits.
Disease risk scores for skin cancers. Fontanillas, Pierre; Alipanahi, Babak; Furlotte, Nicholas A ...
Nature Communications, 01/2021, Volume 12, Issue 1. Journal article, peer reviewed, open access.
We trained and validated risk prediction models for the three major types of skin cancer, namely basal cell carcinoma (BCC), squamous cell carcinoma (SCC), and melanoma, on a cross-sectional and longitudinal dataset of 210,000 consented research participants who responded to an online survey covering personal and family history of skin cancer, skin susceptibility, and UV exposure. We developed a primary disease risk score (DRS) that combined all 32 identified genetic and non-genetic risk factors. Top percentile DRS was associated with an up to 13-fold increase (odds ratio per standard deviation increase >2.5) in the risk of developing skin cancer relative to the middle DRS percentile. To derive lifetime risk trajectories for the three skin cancers, we developed a second, age-independent disease score, called DRSA. Using incident cases, we demonstrated that DRSA could be used in early detection programs for identifying high-risk asymptomatic individuals and predicting when they are likely to develop skin cancer. High DRSA scores were not only associated with earlier disease diagnosis (by up to 14 years), but also with more severe and recurrent forms of skin cancer.