Genealogical inference from genetic data is essential for a variety of applications in human genetics. In genome-wide and sequencing association studies, for example, accurate inference on both recent genetic relatedness, such as family structure, and more distant genetic relatedness, such as population structure, is necessary for protection against spurious associations. Distinguishing familial relatedness from population structure with genotype data, however, is difficult because both manifest as genetic similarity through the sharing of alleles. Existing approaches for inference on recent genetic relatedness have limitations in the presence of population structure, where they either (1) make strong and simplifying assumptions about population structure, which are often untenable, or (2) require correct specification of and appropriate reference population panels for the ancestries in the sample, which might be unknown or not well defined. Here, we propose PC-Relate, a model-free approach for estimating commonly used measures of recent genetic relatedness, such as kinship coefficients and IBD sharing probabilities, in the presence of unspecified structure. PC-Relate uses principal components calculated from genome-screen data to partition genetic correlations among sampled individuals due to the sharing of recent ancestors and more distant common ancestry into two separate components, without requiring specification of the ancestral populations or reference population panels. In simulation studies with population structure, including admixture, we demonstrate that PC-Relate provides accurate estimates of genetic relatedness and improved relationship classification over widely used approaches. We further demonstrate the utility of PC-Relate in applications to three ancestrally diverse samples that vary in both size and genealogical complexity.
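As a rough illustration of the idea of separating recent relatedness from ancestry, the sketch below computes a kinship estimate from genotypes that have been centered on individual-specific allele frequencies. The abstract does not state the estimator, so the formula, the function name `kinship_estimate`, and the inputs `mu_i` and `mu_j` (allele frequencies assumed to have been predicted beforehand from principal components) are all assumptions made for illustration.

```python
import math


def kinship_estimate(g_i, g_j, mu_i, mu_j):
    """Kinship estimate for two individuals (illustrative form, an assumption).

    g_i, g_j   : allele counts (0, 1, or 2) at each SNP
    mu_i, mu_j : individual-specific allele frequencies at each SNP,
                 assumed to have been predicted from principal components
    """
    numerator = 0.0
    denominator = 0.0
    for gi, gj, mi, mj in zip(g_i, g_j, mu_i, mu_j):
        # Residual genotype after removing the ancestry-predicted mean (2 * mu)
        numerator += (gi - 2.0 * mi) * (gj - 2.0 * mj)
        # Scale by the expected genotype standard deviations
        denominator += math.sqrt(mi * (1.0 - mi) * mj * (1.0 - mj))
    return numerator / (4.0 * denominator)


# Hypothetical toy data: two SNPs, allele frequency 0.5 for everyone
same = kinship_estimate([0, 2], [0, 2], [0.5, 0.5], [0.5, 0.5])
opposite = kinship_estimate([0, 2], [2, 0], [0.5, 0.5], [0.5, 0.5])
```

With real data the sums would run over many thousands of SNPs; the two-SNP inputs here only show how the residuals and scaling combine.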
Premature menopause is an independent risk factor for cardiovascular disease in women, but mechanisms underlying this association remain unclear. Clonal hematopoiesis of indeterminate potential (CHIP), the age-related expansion of hematopoietic cells with leukemogenic mutations without detectable malignancy, is associated with accelerated atherosclerosis. Whether premature menopause is associated with CHIP is unknown.
We included postmenopausal women from the UK Biobank (n=11 495) aged 40 to 70 years with whole exome sequences and from the Women's Health Initiative (n=8111) aged 50 to 79 years with whole genome sequences. Premature menopause was defined as natural or surgical menopause occurring before age 40 years. Co-primary outcomes were the presence of any CHIP and CHIP with variant allele frequency >0.1. Logistic regression tested the association of premature menopause with CHIP, adjusted for age, race, the first 10 principal components of ancestry, smoking, diabetes, and hormone therapy use. Secondary analyses considered natural versus surgical premature menopause and gene-specific CHIP subtypes. Multivariable-adjusted Cox models tested the association between CHIP and incident coronary artery disease.
The sample included 19 606 women, including 418 (2.1%) with natural premature menopause and 887 (4.5%) with surgical premature menopause. Across cohorts, CHIP prevalence in postmenopausal women with versus without a history of premature menopause was 8.8% versus 5.5% (P<0.001), respectively. After multivariable adjustment, premature menopause was independently associated with CHIP (all CHIP: odds ratio, 1.36 [95% CI, 1.10-1.68]; P=0.004; CHIP with variant allele frequency >0.1: odds ratio, 1.40 [95% CI, 1.10-1.79]; P=0.007). Associations were larger for natural premature menopause (all CHIP: odds ratio, 1.73 [95% CI, 1.23-2.44]; P=0.001; CHIP with variant allele frequency >0.1: odds ratio, 1.91 [95% CI, 1.30-2.80]; P<0.001) but smaller and nonsignificant for surgical premature menopause. In gene-specific analyses, only
CHIP was significantly associated with premature menopause. Among postmenopausal middle-aged women, CHIP was independently associated with incident coronary artery disease (hazard ratio associated with all CHIP: 1.36 [95% CI, 1.07-1.73]; P=0.012; hazard ratio associated with CHIP with variant allele frequency >0.1: 1.48 [95% CI, 1.13-1.94]; P=0.005).
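For orientation, the reported prevalences (8.8% versus 5.5%) can be turned into a crude, unadjusted odds ratio. The sketch below is only that arithmetic; it is not the multivariable-adjusted logistic regression described in the methods, so its result is expected to differ from the adjusted odds ratios reported above. The helper name `odds` is introduced here for illustration.

```python
def odds(probability):
    """Convert a probability (prevalence) into odds."""
    return probability / (1.0 - probability)


# CHIP prevalence with and without a history of premature menopause,
# as reported in the results above
prevalence_with = 0.088
prevalence_without = 0.055

# Crude (unadjusted) odds ratio implied by the two prevalences
crude_odds_ratio = odds(prevalence_with) / odds(prevalence_without)
print(round(crude_odds_ratio, 2))
```

The crude value is larger than the adjusted estimate of 1.36, which is consistent with part of the raw difference being accounted for by the adjustment covariates such as age.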
Premature menopause, especially natural premature menopause, is independently associated with CHIP among postmenopausal women. Natural premature menopause may serve as a risk signal for predilection to develop CHIP and CHIP-associated cardiovascular disease.
Epigenetic biomarkers of aging (the "epigenetic clock") have the potential to address puzzling findings surrounding mortality rates and incidence of cardio-metabolic disease such as: (1) women consistently exhibiting lower mortality than men despite having higher levels of morbidity; (2) racial/ethnic groups having different mortality rates even after adjusting for socioeconomic differences; (3) the black/white mortality cross-over effect in late adulthood; and (4) Hispanics in the United States having a longer life expectancy than Caucasians despite having a higher burden of traditional cardio-metabolic risk factors.
We analyzed blood, saliva, and brain samples from seven different racial/ethnic groups. We assessed the intrinsic epigenetic age acceleration of blood (which is independent of blood cell counts) and the extrinsic epigenetic aging rates of blood (which depend on blood cell counts and track the age of the immune system). In blood, Hispanics and Tsimane Amerindians have lower intrinsic but higher extrinsic epigenetic aging rates than Caucasians. African-Americans have lower extrinsic epigenetic aging rates than Caucasians and Hispanics, but no differences were found for the intrinsic measure. Men have higher epigenetic aging rates than women in blood, saliva, and brain tissue.
Epigenetic aging rates are significantly associated with sex, race/ethnicity, and to a lesser extent with CHD risk factors, but not with incident CHD outcomes. These results may help elucidate lower than expected mortality rates observed in Hispanics, older African-Americans, and women.
Existing studies of chromatin conformation have primarily focused on potential enhancers interacting with gene promoters. By contrast, the interactivity of promoters per se, while equally critical to understanding transcriptional control, has been largely unexplored, particularly in a cell type-specific manner for blood lineage cell types. In this study, we leverage promoter capture Hi-C data across a compendium of blood lineage cell types to identify and characterize cell type-specific super-interactive promoters (SIPs). Notably, promoter-interacting regions (PIRs) of SIPs are more likely to overlap with cell type-specific ATAC-seq peaks and GWAS variants for relevant blood cell traits than PIRs of non-SIPs. Moreover, PIRs of cell type-specific SIPs show enriched heritability of relevant blood cell trait(s), and are more enriched with GWAS variants associated with blood cell traits compared to PIRs of non-SIPs. Further, SIP genes tend to be expressed at a higher level in the corresponding cell type. Importantly, SIP subnetworks incorporating cell type-specific SIPs and ATAC-seq peaks help interpret GWAS variants. Examples include GWAS variants associated with platelet count near the megakaryocyte SIP gene EPHB3 and variants associated with lymphocyte count near the naive CD4 T-cell SIP gene ETS1. Interestingly, around 25.7% to 39.6% of blood cell trait GWAS variants residing in SIP PIR regions disrupt transcription factor binding motifs. Importantly, our analysis shows the potential of using promoter-centric analyses of chromatin spatial organization data to identify biologically important genes and their regulatory regions.
Polygenic risk scores (PRSs) are weighted sums of risk allele counts of single‐nucleotide polymorphisms (SNPs) associated with a disease or trait. PRSs are typically constructed based on published results from Genome‐Wide Association Studies (GWASs), the majority of which have been performed in large populations of European ancestry (EA) individuals. Although many genotype‐trait associations have generalized across populations, the optimal choice of SNPs and weights for PRSs may differ between populations due to different linkage disequilibrium (LD) and allele frequency patterns. We compare various approaches for PRS construction, using GWAS results from both large EA studies and a smaller study in Hispanics/Latinos: The Hispanic Community Health Study/Study of Latinos (HCHS/SOL, n = 12,803). We consider multiple approaches for selecting SNPs and for computing SNP weights. We study the performance of the resulting PRSs in an independent study of Hispanics/Latinos from the Women’s Health Initiative (WHI, n = 3,582). We support our investigation with simulation studies of potential genetic architectures in a single locus. We observed that selecting variants based on EA GWASs generally performs well, except for blood pressure traits. However, the use of EA GWASs for weight estimation was suboptimal. Using non‐EA GWAS results to estimate weights improved results.
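The definition of a PRS as a weighted sum of risk allele counts can be sketched in a few lines. The example below keeps the selected SNPs fixed and swaps only the weights, mirroring the distinction the abstract draws between SNP selection and weight estimation. All numbers, the function name `polygenic_risk_score`, and the variable names are hypothetical and do not come from the study.

```python
def polygenic_risk_score(allele_counts, weights):
    """Weighted sum of risk allele counts for one individual."""
    if len(allele_counts) != len(weights):
        raise ValueError("need exactly one weight per SNP")
    return sum(w * g for g, w in zip(allele_counts, weights))


# Hypothetical individual: risk allele counts (0, 1, or 2) at three selected SNPs
counts = [0, 1, 2]

# Made-up per-SNP effect sizes for the same SNPs from two different GWASs
ea_weights = [0.10, 0.20, 0.05]      # estimated in a European ancestry study
target_weights = [0.10, 0.05, 0.15]  # estimated in the target population

prs_ea = polygenic_risk_score(counts, ea_weights)
prs_target = polygenic_risk_score(counts, target_weights)
```

The same genotypes yield different scores under the two weight sets, which is why the source of the weights can matter even when the SNP list is unchanged.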
Clonal hematopoiesis of indeterminate potential (CHIP) is a novel age-related risk factor for cardiovascular disease-related morbidity and mortality. The association of CHIP with risk of incident ischemic stroke was reported previously in an exploratory analysis that included a small number of incident stroke cases, without replication and without stroke subphenotyping. The purpose of this study was to discover whether CHIP is a risk factor for ischemic or hemorrhagic stroke.
We utilized plasma genome sequence data of blood DNA to identify CHIP in 78 752 individuals from 8 prospective cohorts and biobanks. We then assessed the association of CHIP and commonly mutated individual CHIP driver genes (
,
, and
) with any stroke, ischemic stroke, and hemorrhagic stroke.
CHIP was associated with an increased risk of total stroke (hazard ratio, 1.14 [95% CI, 1.03-1.27]; P=0.01) after adjustment for age, sex, and race. We observed associations of CHIP with risk of hemorrhagic stroke (hazard ratio, 1.24 [95% CI, 1.01-1.51]; P=0.04) and with small vessel ischemic stroke subtypes. In gene-specific association results,
showed the strongest association with total stroke and ischemic stroke, whereas
and
were each associated with increased risk of hemorrhagic stroke.
CHIP is associated with an increased risk of stroke, particularly with hemorrhagic and small vessel ischemic stroke. Future studies clarifying the relationship between CHIP and subtypes of stroke are needed.
Populations of non‐European ancestry are substantially underrepresented in genome‐wide association studies (GWAS). As genetic effects can differ between ancestries due to possibly different causal variants or linkage disequilibrium patterns, a meta‐analysis that includes GWAS of all populations yields biased estimates in each of the populations, and the bias disproportionately impacts non‐European ancestry populations. This is because meta‐analysis combines study‐specific estimates with inverse variance as the weights, which biases the result towards the studies with the largest sample sizes, typically those of European ancestry populations. In this paper, we propose two empirical Bayes (EB) estimators to borrow the strength of information across populations while accounting for between‐population heterogeneity. Extensive simulation studies show that the proposed EB estimators are largely unbiased and improve efficiency compared to the population‐specific estimator. In contrast, even though the meta‐analysis estimator has a much smaller variance, it yields significant bias when the genetic effect is heterogeneous across populations. We apply the proposed EB estimators to a large‐scale trans‐ancestry GWAS of stroke and demonstrate that the EB estimators reduce the variance of the population‐specific estimator substantially, with the effect estimates close to the population‐specific estimates.
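The pull of inverse-variance weighting towards the largest study can be seen in a small numerical sketch. The code below implements only the standard fixed-effect inverse-variance meta-analysis that the abstract contrasts against; it does not implement the proposed EB estimators, whose form is not given here. The effect sizes, standard errors, and the function name `inverse_variance_meta` are hypothetical.

```python
def inverse_variance_meta(estimates, standard_errors):
    """Fixed-effect meta-analysis with inverse-variance weights.

    Returns the pooled estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = (1.0 / total_weight) ** 0.5
    return pooled, pooled_se


# Hypothetical effects: a large study (small standard error) and a
# smaller study (large standard error) with a different true effect
large_study_effect, large_study_se = 0.10, 0.01
small_study_effect, small_study_se = 0.30, 0.05

pooled, pooled_se = inverse_variance_meta(
    [large_study_effect, small_study_effect],
    [large_study_se, small_study_se],
)
```

In this toy setting the pooled estimate lands near the large study's value and far from the small study's, with a standard error smaller than either input: low variance, but a poor estimate for the smaller population when effects truly differ.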
Background Presence of clonal hematopoiesis of indeterminate potential (CHIP) is associated with a higher risk of atherosclerotic cardiovascular disease, cancer, and mortality. The relationship between a healthy lifestyle and CHIP is unknown. Methods and Results This analysis included 8709 postmenopausal women (mean age, 66.5 years) enrolled in the WHI (Women's Health Initiative), free of cancer or cardiovascular disease, with deep-coverage whole genome sequencing data available. Information on lifestyle factors (body mass index, smoking, physical activity, and diet quality) was obtained, and a healthy lifestyle score was created on the basis of healthy criteria met (0 points [least healthy] to 4 points [most healthy]). CHIP was derived on the basis of a prespecified list of leukemogenic driver mutations. The prevalence of CHIP was 8.6%. A higher healthy lifestyle score was not associated with CHIP (multivariable-adjusted odds ratio [OR] [95% CI], 0.99 [0.80-1.23] and 1.13 [0.93-1.37] for the upper [3 or 4 points] and middle [2 points] categories, respectively, versus referent [0 or 1 point]). Across score components, a normal and an overweight body mass index, compared with obese, were associated with lower odds for CHIP (OR, 0.71 [95% CI, 0.57-0.88] and 0.83 [95% CI, 0.68-1.01], respectively; P-trend 0.0015). Having never smoked compared with being a current smoker tended to be associated with lower odds for CHIP. Conclusions A healthy lifestyle, based on a composite score, was not related to CHIP among postmenopausal women. However, across individual lifestyle factors, having a normal body mass index was strongly associated with a lower prevalence of CHIP. These findings support the idea that certain healthy lifestyle factors are associated with a lower frequency of CHIP.
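The composite score described above (one point per healthy criterion met, then grouped into referent, middle, and upper categories) can be sketched as follows. The abstract does not give the thresholds that define each healthy criterion, so the sketch takes already-evaluated true/false flags as input; the function names and dictionary keys are hypothetical.

```python
LIFESTYLE_FACTORS = ("body_mass_index", "smoking", "physical_activity", "diet_quality")


def healthy_lifestyle_score(criteria_met):
    """Count how many of the four healthy criteria are met (0 to 4 points)."""
    missing = [f for f in LIFESTYLE_FACTORS if f not in criteria_met]
    if missing:
        raise ValueError(f"missing lifestyle factors: {missing}")
    return sum(1 for f in LIFESTYLE_FACTORS if criteria_met[f])


def score_category(score):
    """Group the score as in the abstract: 0-1 referent, 2 middle, 3-4 upper."""
    if not 0 <= score <= 4:
        raise ValueError("score must be between 0 and 4")
    if score >= 3:
        return "upper"
    if score == 2:
        return "middle"
    return "referent"


# Hypothetical participant meeting three of the four criteria
participant = {
    "body_mass_index": True,
    "smoking": True,
    "physical_activity": False,
    "diet_quality": True,
}
participant_score = healthy_lifestyle_score(participant)
participant_category = score_category(participant_score)
```

Because every factor contributes equally to the total, a strong association for a single component such as body mass index can be diluted in the composite, which is one way to read the contrast between the score-level and component-level results.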
The identification of rare coding or splice site variants remains the most straightforward strategy to link genes with human phenotypes. Here, we analyzed the association between 137,086 rare (minor allele frequency (MAF) <1%) coding or splice site variants and 15 hematological traits in up to 308,572 participants. We found 56 such rare coding or splice site variants at P < 5×10⁻⁸, including 31 that are associated with a blood-cell phenotype for the first time. All but one of these 31 new independent variants map to loci previously implicated in hematopoiesis by genome-wide association studies (GWAS). This includes a rare splice acceptor variant (rs146597587, MAF = 0.5%) in interleukin 33 (IL33) associated with reduced eosinophil count (P = 2.4×10⁻²³), and lower risk of asthma (P = 2.6×10⁻⁷, odds ratio [95% confidence interval] = 0.56 [0.45-0.70]) and allergic rhinitis (P = 4.2×10⁻⁴, odds ratio = 0.55 [0.39-0.76]). The single new locus identified in our study is defined by a rare p.Arg172Gly missense variant (rs145535174, MAF = 0.05%) in plasminogen (PLG) associated with increased platelet count (P = 6.8×10⁻⁹), and decreased D-dimer concentration (P = 0.018) and platelet reactivity (P < 0.03). Finally, our results indicate that searching for rare coding or splice site variants in very large sample sizes can help prioritize causal genes at many GWAS loci associated with complex human diseases and traits.
Leukocyte telomere length (LTL), which reflects telomere length in other somatic tissues, is a complex genetic trait. Eleven SNPs have been shown in genome-wide association studies to be associated with LTL at a genome-wide level of significance within cohorts of European ancestry. It has been observed that LTL is longer in African Americans than in Europeans. The underlying reason for this difference is unknown. Here we show that LTL is significantly longer in sub-Saharan Africans than in both Europeans and African Americans. Based on the 11 LTL-associated alleles and genetic data in phase 3 of the 1000 Genomes Project, we show that the shifts in allele frequency within Europe and between Europe and Africa do not fit the pattern expected by neutral genetic drift. Our findings suggest that differences in LTL within Europeans and between Europeans and Africans are influenced by polygenic adaptation and that differences in LTL between Europeans and Africans might explain, in part, ethnic differences in risks for human diseases that have been linked to LTL.