Health systems are stewards of patient electronic health record (EHR) data with extraordinarily rich depth and breadth, reflecting thousands of diagnoses and exposures. Measures of genomic variation integrated with EHRs offer a potential strategy to accurately stratify patients for risk profiling and discover new relationships between diagnoses and genomes. The objective of this study was to evaluate whether polygenic risk scores (PRS) for common cancers are associated with multiple phenotypes in a phenome-wide association study (PheWAS) conducted in 28,260 unrelated, genotyped patients of recent European ancestry who consented to participate in the Michigan Genomics Initiative, a longitudinal biorepository effort within Michigan Medicine. PRS for 12 cancer traits were calculated using summary statistics from the NHGRI-EBI catalog. A total of 1,711 synthetic case-control studies were used for PheWAS analyses. There were 13,490 (47.7%) patients with at least one cancer diagnosis in this study sample. PRS were strongly associated with several of the cancer traits they were designed for, including female breast cancer, prostate cancer, melanoma, basal cell carcinoma, squamous cell carcinoma, and thyroid cancer. Phenome-wide significant associations were observed between PRS and many non-cancer diagnoses. To differentiate PRS associations driven by the primary trait from associations arising through shared genetic risk profiles, the idea of “exclusion PRS PheWAS” was introduced. Further analysis of the temporal order of the diagnoses improved our understanding of these secondary associations. This comprehensive PheWAS used PRS instead of a single variant.
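As a rough illustration of how a PRS of the kind described above is commonly built, the sketch below computes a weighted sum of risk-allele dosages. It is a minimal sketch and not the study's pipeline: the function name, the variant identifiers, and the weights are hypothetical, and it assumes each weight is a per-allele effect size (for example a log odds ratio taken from published summary statistics).

```python
from typing import Mapping


def polygenic_risk_score(dosages: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Weighted sum of risk-allele dosages.

    dosages: variant id -> number of risk alleles carried (0, 1, 2, or an
             imputed fractional dosage).
    weights: variant id -> per-allele effect size (assumed log odds ratio).
    Variants missing from `dosages` are skipped, which is one simple
    (assumed) way of handling ungenotyped variants.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Hypothetical weights and one hypothetical patient, for illustration only.
example_weights = {"variant_a": 0.12, "variant_b": -0.05, "variant_c": 0.30}
example_patient = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

example_score = polygenic_risk_score(example_patient, example_weights)
print(round(example_score, 4))
```

In a PheWAS setting, a score like this would then be used as the predictor in one regression per phenotype; that step is omitted here.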
Psoriasis is a complex disease of skin with a prevalence of about 2%. We conducted the largest meta-analysis of genome-wide association studies (GWAS) for psoriasis to date, including data from eight different Caucasian cohorts, with a combined effective sample size >39,000 individuals. We identified 16 additional psoriasis susceptibility loci achieving genome-wide significance, increasing the number of identified loci to 63 for European-origin individuals. Functional analysis highlighted the roles of interferon signalling and the NFκB cascade, and we showed that the psoriasis signals are enriched in regulatory elements from different T cells (CD8+ T-cells and CD4+ T-cells including Th0, Th1 and Th17). The identified loci explain ∼28% of the genetic heritability and generate a discriminatory genetic risk score (AUC=0.76 in our sample) that is significantly correlated with age at onset (p=2 × 10^[…]). This study provides a comprehensive layout for the genetic architecture of common variants for psoriasis.
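To make the AUC figure quoted above concrete, the following sketch computes the area under the ROC curve for a risk score as the probability that a randomly chosen case scores higher than a randomly chosen control, counting ties as one half. The function name and the scores are hypothetical, and this is only one standard way to compute AUC, not necessarily the procedure the authors used.

```python
from typing import Sequence


def auc_from_scores(case_scores: Sequence[float],
                    control_scores: Sequence[float]) -> float:
    """AUC as the fraction of case/control pairs ranked correctly.

    Equivalent to the Mann-Whitney U statistic divided by the number of
    pairs. Ties contribute 0.5. Runs in O(n_cases * n_controls), which is
    fine for a small illustration.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Hypothetical genetic risk scores, for illustration only.
example_cases = [0.9, 0.4, 0.7]
example_controls = [0.2, 0.5, 0.1]
example_auc = auc_from_scores(example_cases, example_controls)
print(round(example_auc, 3))
```

An AUC of 0.5 corresponds to a score that does not separate cases from controls, and 1.0 to perfect separation.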
Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (1 every 17 bases) and geographically localized, so that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. We conclude that because of rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk.
A detailed understanding of the genome-wide variability of single-nucleotide germline mutation rates is essential to studying human genome evolution. Here, we use ~36 million singleton variants from 3560 whole-genome sequences to infer fine-scale patterns of mutation rate heterogeneity. Mutability is jointly affected by adjacent nucleotide context and diverse genomic features of the surrounding region, including histone modifications, replication timing, and recombination rate, sometimes suggesting specific mutagenic mechanisms. Remarkably, GC content, DNase hypersensitivity, CpG islands, and H3K36 trimethylation are associated with both increased and decreased mutation rates depending on nucleotide context. We validate these estimated effects in an independent dataset of ~46,000 de novo mutations, and confirm our estimates are more accurate than previously published results based on ancestrally older variants without considering genomic features. Our results thus provide the most refined portrait to date of the factors contributing to genome-wide variability of the human germline mutation rate.
Spontaneous coronary artery dissection (SCAD) is a non-atherosclerotic cause of myocardial infarction (MI), typically in young women. We undertook a genome-wide association study of SCAD (N cases = 270/N controls = 5,263) and identified and replicated an association of rs12740679 at chromosome 1q21.2 (P = 2.19 × 10^[…], OR = 1.8) influencing ADAMTSL4 expression. Meta-analysis of discovery and replication samples identified associations with P < 5 × 10^[…] at chromosome 6p24.1 in PHACTR1, chromosome 12q13.3 in LRP1, and in females-only, at chromosome 21q22.11 near LINC00310. A polygenic risk score for SCAD was associated with (1) higher risk of SCAD in individuals with fibromuscular dysplasia (P = 0.021, OR = 1.82, 95% CI: 1.09-3.02) and (2) lower risk of atherosclerotic coronary artery disease and MI in the UK Biobank (P = 1.28 × 10^[…], HR = 0.91, 95% CI: 0.89-0.93, for MI) and Million Veteran Program (P = 9.33 × 10^[…], OR = 0.95, 95% CI: 0.94-0.96, for CAD; P = 3.35 × 10^[…], OR = 0.96, 95% CI: 0.95-0.98, for MI). Here we report that SCAD-related MI and atherosclerotic MI exist at opposite ends of a genetic risk spectrum, inciting MI with disparate underlying vascular biology.
Next Generation Sequencing Technology has revolutionized our ability to study the contribution of rare genetic variation to heritable traits. However, existing single-marker association tests are underpowered for detecting rare risk variants. A more powerful approach involves pooling methods that combine multiple rare variants from the same gene into a single test statistic. Proposed pooling methods can be limited because they generally assume high-quality genotypes derived from deep-coverage sequencing, which may not be available. In this paper, we consider an intuitive and computationally efficient pooling statistic, the cumulative minor-allele test (CMAT). We assess the performance of the CMAT and other pooling methods on datasets simulated with population genetic models to contain realistic levels of neutral variation. We consider study designs ranging from exon-only to whole-gene analyses that contain noncoding variants. For all study designs, the CMAT achieves power comparable to that of previously proposed methods. We then extend the CMAT to probabilistic genotypes and describe application to low-coverage sequencing and imputation data. We show that augmenting sequence data with imputed samples is a practical method for increasing the power of rare-variant studies. We also provide a method of controlling for confounding variables such as population stratification. Finally, we demonstrate that our method makes it possible to use external imputation templates to analyze rare variants imputed into existing GWAS datasets. As proof of principle, we performed a CMAT analysis of more than 8 million SNPs that we imputed into the GAIN psoriasis dataset by using haplotypes from the 1000 Genomes Project.
Chronic kidney disease (CKD) is a growing health burden currently affecting 10-15% of adults worldwide. Estimated glomerular filtration rate (eGFR) as a marker of kidney function is commonly used to diagnose CKD. We analyze eGFR data from the Nord-Trøndelag Health Study and Michigan Genomics Initiative and perform a GWAS meta-analysis with public summary statistics, more than doubling the sample size of previous meta-analyses. We identify 147 loci (53 novel) associated with eGFR, including genes involved in transcriptional regulation, kidney development, cellular signaling, metabolism, and solute transport. Additionally, sex-stratified analysis identifies one locus with more significant effects in women than men. Using genetic risk scores constructed from these eGFR meta-analysis results, we show that associated variants are generally predictive of CKD with only modest improvements in detection compared with other known clinical risk factors. Collectively, these results yield additional insight into the genetic factors underlying kidney function and progression to CKD.
The prevalence and severity of many diseases differ by sex, potentially due to sex-specific patterns in DNA methylation. Autosomal sex-specific differences in DNA methylation have been observed in cord blood and placental tissue but are not well studied in saliva or in diverse populations. We sought to characterize sex-specific DNA methylation on autosomal chromosomes in saliva samples from children in the Future of Families and Child Wellbeing Study, a multi-ethnic prospective birth cohort containing an oversampling of Black, Hispanic and low-income families. DNA methylation from saliva samples was analysed on 796 children (50.6% male) at both ages 9 and 15 with DNA methylation measured using the Illumina HumanMethylation 450k array. An epigenome-wide association analysis of the age 9 samples identified 8,430 sex-differentiated autosomal DNA methylation sites (P < 2.4 × 10^−7), of which 76.2% had higher DNA methylation in female children. The strongest sex-difference was in the cg26921482 probe, in the AMDHD2 gene, with 30.6% higher DNA methylation in female compared to male children (P < 1 × 10^−300). Treating the age 15 samples as an internal replication set, we observed highly consistent results between the ages 9 and 15 measurements, indicating stable and replicable sex-differentiation. Further, we directly compared our results to previously published DNA methylation sex differences in both cord blood and saliva and again found strong consistency. Our findings support widespread and robust sex-differential DNA methylation across age, human tissues, and populations. These findings help inform our understanding of potential biological processes contributing to sex differences in human physiology and disease.
Elevated circulating cystatin C is associated with cognitive impairment in non-Hispanic Whites, but its role in racial disparities in dementia is understudied. In a nationally representative sample of older non-Hispanic White, non-Hispanic Black, and Hispanic adults in the United States, we use mediation-interaction analysis to understand how racial disparities in the cystatin C physiological pathway may contribute to racial disparities in prevalent dementia.
In a pooled cross-sectional sample of the Health and Retirement Study (n = 9,923), we employed Poisson regression to estimate prevalence ratios and to test the relationship between elevated cystatin C (>1.24 vs. ≤1.24 mg/L) and impaired cognition, adjusted for demographics, behavioral risk factors, other biomarkers, and chronic conditions. Self-reported racialized social categories were a proxy measure for exposure to racism. We calculated additive interaction measures and conducted four-way mediation-interaction decomposition analysis to test the moderating effect of race/ethnicity and mediating effect of cystatin C on the racial disparity.
Overall, elevated cystatin C was associated with dementia (prevalence ratio [PR] = 1.2; 95% CI: 1.0, 1.5). Among non-Hispanic Black relative to non-Hispanic White participants, the relative excess risk due to interaction was 0.7 (95% CI: -0.1, 2.4), the attributable proportion was 0.1 (95% CI: -0.2, 0.4), and the synergy index was 1.1 (95% CI: 0.8, 1.8) in a fully adjusted model. Elevated cystatin C was estimated to account for 2% (95% CI: -0, 4%) of the racial disparity in prevalent dementia, and the interaction accounted for 8% (95% CI: -5, 22%). Analyses for Hispanic relative to non-Hispanic White participants suggested moderation by race/ethnicity, but not mediation.
Elevated cystatin C was associated with dementia prevalence. Our mediation-interaction decomposition analysis suggested that the effect of elevated cystatin C on the racial disparity might be moderated by race/ethnicity, which indicates that the racialization process affects not only the distribution of circulating cystatin C across minoritized racial groups, but also the strength of association between the biomarker and dementia prevalence. These results provide evidence that cystatin C is associated with adverse brain health and this effect is larger than expected for individuals racialized as minorities had they been racialized and treated as non-Hispanic White.
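The three additive interaction measures reported above (relative excess risk due to interaction, attributable proportion, and synergy index) have standard textbook definitions in terms of ratio measures for the two single exposures and the joint exposure, each relative to the doubly unexposed group. The sketch below applies those definitions. The function name and the input values are hypothetical, and treating the study's prevalence ratios as the ratio inputs is an assumption made here for illustration; confidence intervals are not computed.

```python
import math
from typing import NamedTuple


class AdditiveInteraction(NamedTuple):
    reri: float           # relative excess risk due to interaction
    attributable: float   # attributable proportion due to interaction
    synergy: float        # synergy index (nan if undefined)


def additive_interaction(rr10: float, rr01: float,
                         rr11: float) -> AdditiveInteraction:
    """Standard additive interaction measures from three ratio estimates.

    rr10: ratio for exposure A only, versus neither exposure.
    rr01: ratio for exposure B only, versus neither exposure.
    rr11: ratio for both exposures, versus neither exposure.
    """
    reri = rr11 - rr10 - rr01 + 1.0
    attributable = reri / rr11
    denominator = (rr10 - 1.0) + (rr01 - 1.0)
    # The synergy index is undefined when the single-exposure excess is zero.
    synergy = (rr11 - 1.0) / denominator if denominator != 0 else math.nan
    return AdditiveInteraction(reri, attributable, synergy)


# Hypothetical ratios, for illustration only (not values from the study).
example = additive_interaction(rr10=1.5, rr01=2.0, rr11=4.0)
print(example)
```

Under these definitions, no additive interaction corresponds to a relative excess risk of 0, an attributable proportion of 0, and a synergy index of 1.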
Polygenic variation unrelated to disease contributes to interindividual variation in baseline white blood cell (WBC) counts, but its clinical significance is uncharacterized. We investigated the clinical consequences of a genetic predisposition toward lower WBC counts among 89,559 biobank participants from tertiary care centers using a polygenic score for WBC count (PGS_WBC) comprising single nucleotide polymorphisms not associated with disease. A predisposition to lower WBC counts was associated with a decreased risk of identifying pathology on a bone marrow biopsy performed for a low WBC count (odds-ratio = 0.55 per standard deviation increase in PGS_WBC [95% CI, 0.30-0.94], p = 0.04), and an increased risk of leukopenia (a low WBC count) when treated with a chemotherapeutic (n = 1724, hazard ratio [HR] = 0.78 [0.69-0.88], p = 4.0 × 10^[…]) or immunosuppressant (n = 354, HR = 0.61 [0.38-0.99], p = 0.04). A predisposition to benign lower WBC counts was associated with an increased risk of discontinuing azathioprine treatment (n = 1,466, HR = 0.62 [0.44-0.87], p = 0.006). Collectively, these findings suggest that there are genetically predisposed individuals who are susceptible to escalations or alterations in clinical care that may be harmful or of little benefit.