Linear mixed models (LMMs) are widely used in genome-wide association studies (GWASs) to account for population structure and relatedness, for both continuous and binary traits. Motivated by the ...failure of LMMs to control type I errors in a GWAS of asthma, a binary trait, we show that LMMs are generally inappropriate for analyzing binary traits when population stratification leads to violation of the LMM’s constant-residual variance assumption. To overcome this problem, we develop a computationally efficient logistic mixed model approach for genome-wide analysis of binary traits, the generalized linear mixed model association test (GMMAT). This approach fits a logistic mixed model once per GWAS and performs score tests under the null hypothesis of no association between a binary trait and individual genetic variants. We show in simulation studies and real data analysis that GMMAT effectively controls for population structure and relatedness when analyzing binary traits in a wide variety of study designs.
Rotation curves constrain a galaxy's underlying mass density profile, under the assumption that the observed rotation produces a centripetal force that exactly balances the inward force of gravity. ...However, most rotation curves are measured using emission lines from gas, which can experience additional forces due to pressure. In realistic galaxy disks, the gas pressure declines with radius, providing additional radial support to the disk. The measured tangential rotation speed will therefore tend to lag the true circular velocity of a test particle. The gas pressure is dominated by turbulence, and we evaluate its likely amplitude from recent estimates of the gas velocity dispersion and surface density. We show that where the amplitude of the rotation curve is comparable to the characteristic velocities of the interstellar turbulence, pressure support may lead to underestimates of the mass density of the underlying dark matter halo and the inner slope of its density profile. These effects may be significant for galaxies with rotation speeds {approx}<75 km s{sup -1} but are unlikely to be significant in higher-mass galaxies. We find that pressure support can be sustained over long timescales, because any reduction in support due to the conversion of gas into stars is compensated for by an inward flow of gas. However, we point to many uncertainties in assessing the importance of pressure support in real or simulated galaxies. Thus, while pressure support may help to alleviate possible tensions between rotation curve observations and {Lambda}CDM on kiloparsec scales, it should not be viewed as a definitive solution at this time.
African ancestry alleles may contribute to CKD among Hispanics/Latinos, but whether associations differ by Hispanic/Latino background remains unknown. We examined the association of CKD measures with ...African ancestry-specific
alleles that were directly genotyped and sickle cell trait (hemoglobin subunit
gene
variant) on the basis of imputation in 12,226 adult Hispanics/Latinos grouped according to Caribbean or Mainland background. We also performed an unbiased genome-wide association scan of urine albumin-to-creatinine ratios. Overall, 41.4% of participants were male, 44.6% of participants had a Caribbean background, and the mean age of all participants was 46.1 years. The Caribbean background group, compared with the Mainland background group, had a higher frequency of two
alleles (1.0% versus 0.1%) and the
variant (2.0% versus 0.7%). In the Caribbean background group, presence of
alleles (2 versus 0/1 copies) or the
variant (1 versus 0 copies) were significantly associated with albuminuria (odds ratio OR, 3.2; 95% confidence interval 95% CI, 1.7 to 6.1; and OR, 2.6; 95% CI, 1.8 to 3.8, respectively) and albuminuria and/or eGFR<60 ml/min per 1.73 m
(OR, 2.9; 95% CI, 1.5 to 5.4; and OR, 2.4; 95% CI, 1.7 to 3.5, respectively). The urine albumin-to-creatinine ratio genome-wide association scan identified associations with the
variant among all participants, with the strongest association in the Caribbean background group (
=3.1×10
versus
=9.3×10
for the Mainland background group). In conclusion, African-specific alleles associate with CKD in Hispanics/Latinos, but allele frequency varies by Hispanic/Latino background/ancestry.
US Hispanic/Latino individuals are diverse in genetic ancestry, culture, and environmental exposures. Here, we characterized and controlled for this diversity in genome-wide association studies ...(GWASs) for the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). We simultaneously estimated population-structure principal components (PCs) robust to familial relatedness and pairwise kinship coefficients (KCs) robust to population structure, admixture, and Hardy-Weinberg departures. The PCs revealed substantial genetic differentiation within and among six self-identified background groups (Cuban, Dominican, Puerto Rican, Mexican, and Central and South American). To control for variation among groups, we developed a multi-dimensional clustering method to define a “genetic-analysis group” variable that retains many properties of self-identified background while achieving substantially greater genetic homogeneity within groups and including participants with non-specific self-identification. In GWASs of 22 biomedical traits, we used a linear mixed model (LMM) including pairwise empirical KCs to account for familial relatedness, PCs for ancestry, and genetic-analysis groups for additional group-associated effects. Including the genetic-analysis group as a covariate accounted for significant trait variation in 8 of 22 traits, even after we fit 20 PCs. Additionally, genetic-analysis groups had significant heterogeneity of residual variance for 20 of 22 traits, and modeling this heteroscedasticity within the LMM reduced genomic inflation for 19 traits. Furthermore, fitting an LMM that utilized a genetic-analysis group rather than a self-identified background group achieved higher power to detect previously reported associations. We expect that the methods applied here will be useful in other studies with multiple ethnic groups, admixture, and relatedness.
Abstract
Genotype-phenotype association studies often combine phenotype data from multiple studies to increase statistical power. Harmonization of the data usually requires substantial effort due to ...heterogeneity in phenotype definitions, study design, data collection procedures, and data-set organization. Here we describe a centralized system for phenotype harmonization that includes input from phenotype domain and study experts, quality control, documentation, reproducible results, and data-sharing mechanisms. This system was developed for the National Heart, Lung, and Blood Institute’s Trans-Omics for Precision Medicine (TOPMed) program, which is generating genomic and other -omics data for more than 80 studies with extensive phenotype data. To date, 63 phenotypes have been harmonized across thousands of participants (recruited in 1948–2012) from up to 17 studies per phenotype. Here we discuss challenges in this undertaking and how they were addressed. The harmonized phenotype data and associated documentation have been submitted to National Institutes of Health data repositories for controlled access by the scientific community. We also provide materials to facilitate future harmonization efforts by the community, which include 1) the software code used to generate the 63 harmonized phenotypes, enabling others to reproduce, modify, or extend these harmonizations to additional studies, and 2) the results of labeling thousands of phenotype variables with controlled vocabulary terms.
We analyzed genome-wide association studies (GWASs), including data from 71,638 individuals from four ancestries, for estimated glomerular filtration rate (eGFR), a measure of kidney function used to ...define chronic kidney disease (CKD). We identified 20 loci attaining genome-wide-significant evidence of association (p < 5 × 10−8) with kidney function and highlighted that allelic effects on eGFR at lead SNPs are homogeneous across ancestries. We leveraged differences in the pattern of linkage disequilibrium between diverse populations to fine-map the 20 loci through construction of “credible sets” of variants driving eGFR association signals. Credible variants at the 20 eGFR loci were enriched for DNase I hypersensitivity sites (DHSs) in human kidney cells. DHS credible variants were expression quantitative trait loci for NFATC1 and RGS14 (at the SLC34A1 locus) in multiple tissues. Loss-of-function mutations in ancestral orthologs of both genes in Drosophila melanogaster were associated with altered sensitivity to salt stress. Renal mRNA expression of Nfatc1 and Rgs14 in a salt-sensitive mouse model was also reduced after exposure to a high-salt diet or induced CKD. Our study (1) demonstrates the utility of trans-ethnic fine mapping through integration of GWASs involving diverse populations with genomic annotation from relevant tissues to define molecular mechanisms by which association signals exert their effect and (2) suggests that salt sensitivity might be an important marker for biological processes that affect kidney function and CKD in humans.
Chronic kidney disease (CKD) affects ~10% of the global population, with considerable ethnic differences in prevalence and aetiology. We assemble genome-wide association studies of estimated ...glomerular filtration rate (eGFR), a measure of kidney function that defines CKD, in 312,468 individuals of diverse ancestry. We identify 127 distinct association signals with homogeneous effects on eGFR across ancestries and enrichment in genomic annotations including kidney-specific histone modifications. Fine-mapping reveals 40 high-confidence variants driving eGFR associations and highlights putative causal genes with cell-type specific expression in glomerulus, and in proximal and distal nephron. Mendelian randomisation supports causal effects of eGFR on overall and cause-specific CKD, kidney stone formation, diastolic blood pressure and hypertension. These results define novel molecular mechanisms and putative causal genes for eGFR, offering insight into clinical outcomes and routes to CKD treatment development.
Individuals with cystic fibrosis (CF) develop complications of the gastrointestinal tract influenced by genetic variants outside of CFTR. Cystic fibrosis-related diabetes (CFRD) is a distinct form of ...diabetes with a variable age of onset that occurs frequently in individuals with CF, while meconium ileus (MI) is a severe neonatal intestinal obstruction affecting ∼20% of newborns with CF. CFRD and MI are slightly correlated traits with previous evidence of overlap in their genetic architectures. To better understand the genetic commonality between CFRD and MI, we used whole-genome-sequencing data from the CF Genome Project to perform genome-wide association. These analyses revealed variants at 11 loci (6 not previously identified) that associated with MI and at 12 loci (5 not previously identified) that associated with CFRD. Of these, variants at SLC26A9, CEBPB, and PRSS1 associated with both traits; variants at SLC26A9 and CEBPB increased risk for both traits, while variants at PRSS1, the higher-risk alleles for CFRD, conferred lower risk for MI. Furthermore, common and rare variants within the SLC26A9 locus associated with MI only or CFRD only. As expected, different loci modify risk of CFRD and MI; however, a subset exhibit pleiotropic effects indicating etiologic and mechanistic overlap between these two otherwise distinct complications of CF.
Genetic modifiers play a significant role in two independent complications of cystic fibrosis (CF): neonatal intestinal obstruction and diabetes. Whole-genome sequencing followed by common and rare variant association identified pleiotropic loci displaying concordant and/or discordant modification of each trait, revealing unexpected mechanistic overlap between distinct complications of CF.
Sleep disordered breathing (SDB)-related overnight hypoxemia is associated with cardiometabolic disease and other comorbidities. Understanding the genetic bases for variations in nocturnal hypoxemia ...may help understand mechanisms influencing oxygenation and SDB-related mortality. We conducted genome-wide association tests across 10 cohorts and 4 populations to identify genetic variants associated with three correlated measures of overnight oxyhemoglobin saturation: average and minimum oxyhemoglobin saturation during sleep and the percent of sleep with oxyhemoglobin saturation under 90%. The discovery sample consisted of 8,326 individuals. Variants with p < 1 × 10(-6) were analyzed in a replication group of 14,410 individuals. We identified 3 significantly associated regions, including 2 regions in multi-ethnic analyses (2q12, 10q22). SNPs in the 2q12 region associated with minimum SpO2 (rs78136548 p = 2.70 × 10(-10)). SNPs at 10q22 were associated with all three traits including average SpO2 (rs72805692 p = 4.58 × 10(-8)). SNPs in both regions were associated in over 20,000 individuals and are supported by prior associations or functional evidence. Four additional significant regions were detected in secondary sex-stratified and combined discovery and replication analyses, including a region overlapping Reelin, a known marker of respiratory complex neurons.These are the first genome-wide significant findings reported for oxyhemoglobin saturation during sleep, a phenotype of high clinical interest. Our replicated associations with HK1 and IL18R1 suggest that variants in inflammatory pathways, such as the biologically-plausible NLRP3 inflammasome, may contribute to nocturnal hypoxemia.