Rare copy-number variants (rCNVs) include deletions and duplications that occur infrequently in the global human population and can confer substantial risk for disease. In this study, we aimed to quantify the properties of haploinsufficiency (i.e., deletion intolerance) and triplosensitivity (i.e., duplication intolerance) throughout the human genome. We harmonized and meta-analyzed rCNVs from nearly one million individuals to construct a genome-wide catalog of dosage sensitivity across 54 disorders, which defined 163 dosage-sensitive segments associated with at least one disorder. These segments were typically gene dense and often harbored dominant dosage-sensitive driver genes, which we were able to prioritize using statistical fine-mapping. Finally, we designed an ensemble machine-learning model to predict probabilities of dosage sensitivity (pHaplo and pTriplo) for all autosomal genes, which identified 2,987 haploinsufficient and 1,559 triplosensitive genes, including 648 that were uniquely triplosensitive. This dosage sensitivity resource will provide broad utility for human disease research and clinical genetics.
• Meta-analysis of rare copy-number variants (rCNVs) in nearly one million humans
• Discovered hundreds of rCNV-disease associations across 54 disorders
• Convergence of rCNVs & damaging coding variants at dosage-sensitive loci
• Ensemble machine learning identified 3,635 highly dosage-sensitive genes
Harmonizing genomic data from nearly one million individuals yields insights into the properties of rare copy-number variants across disorders and dosage sensitivity predictions for all autosomal protein-coding genes.
Human gut microbiota is an important determinant for health and disease, and recent studies emphasize the numerous factors shaping its diversity. Here we performed a genome-wide association study (GWAS) of the gut microbiota using two cohorts from northern Germany totaling 1,812 individuals. Comprehensively controlling for diet and non-genetic parameters, we identify genome-wide significant associations for overall microbial variation and individual taxa at multiple genetic loci, including the VDR gene (encoding vitamin D receptor). We observe significant shifts in the microbiota of Vdr-/- mice relative to control mice and correlations between the microbiota and serum measurements of selected bile and fatty acids in humans, including known ligands and downstream metabolites of VDR. Genome-wide significant (P < 5 × 10^-8) associations at multiple additional loci identify other important points of host-microbe intersection, notably several disease susceptibility genes and sterol metabolism pathway components. Non-genetic and genetic factors each account for approximately 10% of the variation in gut microbiota, whereby individual effects are relatively small.
Psoriatic arthritis (PsA) is a complex chronic musculoskeletal condition that occurs in ~30% of psoriasis patients. Currently, no systematic strategy is available that utilizes the differences in genetic architecture between PsA and cutaneous-only psoriasis (PsC) to assess PsA risk before symptoms appear. Here, we introduce a computational pipeline for predicting PsA among psoriasis patients using data from six cohorts with >7000 genotyped PsA and PsC patients. We identify 9 new loci for psoriasis or its subtypes and achieve 0.82 area under the receiver operating characteristic curve in distinguishing PsA vs. PsC when using 200 genetic markers. Among the top 5% of our PsA predictions, we achieve >90% precision with 100% specificity and 16% recall for predicting PsA among psoriatic patients, using conditional inference forest or shrinkage discriminant analysis. Combining statistical and machine-learning techniques, we show that the underlying genetic differences between psoriasis subtypes can be used for individualized subtype risk assessment.
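To make the relationship between precision, recall, and specificity at a top-5% cutoff concrete, here is a minimal arithmetic sketch. The cohort size, the 30% PsA share, and the confusion-matrix counts are illustrative assumptions, not figures from the study, and the function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return (precision, recall, specificity) from confusion-matrix counts."""
    precision = tp / (tp + fp)      # share of flagged patients who truly have PsA
    recall = tp / (tp + fn)         # share of all PsA patients who were flagged
    specificity = tn / (tn + fp)    # share of PsC patients correctly left unflagged
    return precision, recall, specificity


# Hypothetical cohort (assumption): 1,000 psoriasis patients, 30% with PsA.
n_patients = 1000
n_psa = 300
n_flagged = 50                      # top 5% of predictions
tp = 45                             # assumed: 90% of flagged patients have PsA
fp = n_flagged - tp                 # flagged patients who actually have PsC
fn = n_psa - tp                     # PsA patients outside the top 5%
tn = (n_patients - n_psa) - fp      # PsC patients outside the top 5%

precision, recall, specificity = confusion_metrics(tp, fp, fn, tn)
print(f"precision={precision:.3f} recall={recall:.3f} specificity={specificity:.3f}")
```

Under these assumed counts, flagging only a small top slice keeps false positives rare, so precision and specificity are high while recall stays low. This is the same qualitative pattern the abstract reports.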
Educational attainment is strongly influenced by social and other environmental factors, but genetic factors are estimated to account for at least 20% of the variation across individuals. Here we report the results of a genome-wide association study (GWAS) for educational attainment that extends our earlier discovery sample of 101,069 individuals to 293,723 individuals, and a replication study in an independent sample of 111,349 individuals from the UK Biobank. We identify 74 genome-wide significant loci associated with the number of years of schooling completed. Single-nucleotide polymorphisms associated with educational attainment are disproportionately found in genomic regions regulating gene expression in the fetal brain. Candidate genes are preferentially expressed in neural tissue, especially during the prenatal period, and enriched for biological pathways involved in neural development. Our findings demonstrate that, even for a behavioural phenotype that is mostly environmentally determined, a well-powered GWAS identifies replicable associated genetic variants that suggest biologically relevant pathways. Because educational attainment is measured in large numbers of individuals, it will continue to be useful as a proxy phenotype in efforts to characterize the genetic influences of related phenotypes, including cognition and neuropsychiatric diseases.
Male pattern baldness (MPB) or androgenetic alopecia is one of the most common conditions affecting men, reaching a prevalence of ~50% by the age of 50; however, the known genes explain little of the heritability. Here, we present the results of a genome-wide association study including more than 70,000 men, identifying 71 independently replicated loci, of which 30 are novel. These loci explain 38% of the risk, suggesting that MPB is less genetically complex than other complex traits. We show that many of these loci contain genes that are relevant to the pathology and highlight pathways and functions underlying baldness. Finally, despite only showing genome-wide genetic correlation with height, pathway-specific genetic correlations are significant for traits including lifespan and cancer. Our study not only greatly increases the number of MPB loci, illuminating the genetic architecture, but also provides a new approach to disentangling the shared biological pathways underlying complex diseases.
Understanding the difference in genetic regulation of gene expression between brain and blood is important for discovering genes for brain-related traits and disorders. Here, we estimate the correlation of genetic effects at the top-associated cis-expression or -DNA methylation (DNAm) quantitative trait loci (cis-eQTLs or cis-mQTLs) between brain and blood (r_b). Using publicly available data, we find that genetic effects at the top cis-eQTLs or mQTLs are highly correlated between independent brain and blood samples (Formula: see text for cis-eQTLs and Formula: see text for cis-mQTLs). Using meta-analyzed brain cis-eQTL/mQTL data (n = 526 to 1194), we identify 61 genes and 167 DNAm sites associated with four brain-related phenotypes, most of which are a subset of the discoveries (97 genes and 295 DNAm sites) using data from blood with larger sample sizes (n = 1980 to 14,115). Our results demonstrate the gain of power in gene discovery for brain-related phenotypes using blood cis-eQTL/mQTL data with large sample sizes.
Metabolites are small molecules involved in cellular metabolism, which can be detected in biological samples using metabolomic techniques. Here we present the results of genome-wide association and meta-analyses for variation in the blood serum levels of 129 metabolites as measured by the Biocrates metabolomic platform. In a discovery sample of 7,478 individuals of European descent, we find 4,068 genome- and metabolome-wide significant (Z-test, P < 1.09 × 10^-9) associations between single-nucleotide polymorphisms (SNPs) and metabolites, involving 59 independent SNPs and 85 metabolites. Five of the fifty-nine independent SNPs are new for serum metabolite levels, and were followed up for replication in an independent sample (N = 1,182). The novel SNPs are located in or near genes encoding metabolite transporter proteins or enzymes (SLC22A16, ARG1, AGPS and ACSL1) that have demonstrated biomedical or pharmaceutical importance. The further characterization of genetic influences on metabolic phenotypes is important for progress in biological and medical research.
Genotype-stratified variance of a quantitative trait could differ in the presence of gene-gene or gene-environment interactions. Genetic markers associated with phenotypic variance are thus considered promising candidates for follow-up interaction or joint location-scale analyses. However, as in studies of main effects, the X-chromosome is routinely excluded from "whole-genome" scans due to analytical challenges. Specifically, as males carry only one copy of the X-chromosome, the inherent sex-genotype dependency could bias the trait-genotype association, through sexual dimorphism in quantitative traits with sex-specific means or variances. Here we investigate phenotypic variance heterogeneity associated with X-chromosome single nucleotide polymorphisms (SNPs) and propose valid and powerful strategies. Among those, a generalized Levene's test has adequate power and remains robust to sexual dimorphism. An alternative approach is a sex-stratified analysis but at the cost of slightly reduced power and modeling flexibility. We applied both methods to an Estonian study of gene expression quantitative trait loci (eQTL; n = 841), and two complex trait studies of height, hip, and waist circumferences, and body mass index from Multi-Ethnic Study of Atherosclerosis (MESA; n = 2,073) and UK Biobank (UKB; n = 327,393). Consistent with previous eQTL findings on the mean, we found some but no conclusive evidence for cis regulators being enriched for variance association. SNP rs2681646 is associated with variance of waist circumference (p = 9.5E-07) at X-chromosome-wide significance in UKB, with a suggestive female-specific effect in MESA (p = 0.048). Collectively, an enrichment analysis using permuted UKB (p < 0.1) and MESA (p < 0.01) datasets suggests a possible polygenic structure for the variance of human height.
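As background for the variance tests discussed above, the following is a minimal sketch of the classical median-centered Levene statistic (the Brown-Forsythe variant) across genotype groups. It is not the generalized Levene's test proposed in the study: it ignores sex and the X-chromosome dosage issue entirely, and the function name and toy trait values are hypothetical.

```python
from statistics import mean, median


def levene_statistic(groups):
    """Median-centered Levene (Brown-Forsythe) W statistic.

    groups: list of lists of trait values, one list per genotype group.
    Larger W means stronger evidence that spread differs between groups.
    """
    # Absolute deviation of each observation from its own group median.
    deviations = [[abs(y - median(g)) for y in g] for g in groups]
    group_means = [mean(z) for z in deviations]
    n_total = sum(len(z) for z in deviations)
    k = len(deviations)
    grand_mean = sum(sum(z) for z in deviations) / n_total

    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(z) * (m - grand_mean) ** 2
                  for z, m in zip(deviations, group_means))
    within = sum((v - m) ** 2
                 for z, m in zip(deviations, group_means) for v in z)
    return (n_total - k) / (k - 1) * between / within


# Toy data (assumption): same mean in each genotype group, increasing spread.
trait_by_genotype = [[9, 10, 11], [8, 10, 12], [6, 10, 14]]
w = levene_statistic(trait_by_genotype)
print(f"W = {w:.4f}")
```

Under the null hypothesis of equal variances, W is usually compared against an F distribution with (k - 1, N - k) degrees of freedom; the p-value step is omitted here to keep the sketch dependency-free.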
The proteome holds great potential as an intermediate layer between the genome and phenome. Previous protein quantitative trait locus studies have focused mainly on describing the effects of common genetic variation on the proteome. Here, we assessed the impact of common and rare genetic variation as well as copy number variants (CNVs) on 326 plasma proteins measured in up to 500 individuals. We identified 184 cis and 94 trans signals for 157 protein traits, which were further fine-mapped to credible sets for 101 cis and 87 trans signals for 151 proteins. Rare genetic variation contributed to the levels of 7 proteins, with 5 cis and 14 trans associations. CNVs were associated with the levels of 11 proteins (7 cis and 5 trans); examples include a 3q12.1 deletion acting as a hub for multiple trans associations and a CNV overlapping NAIP, a sensor component of the NAIP-NLRC4 inflammasome, which affects levels of the pro-inflammatory cytokine interleukin 18. In summary, this work presents a comprehensive resource of genetic variation affecting plasma protein levels and provides an interpretation of the identified effects.
As COVID-19 vaccines' accessibility has grown, so has the role of personal choice in vaccination, and not everybody is willing to vaccinate. Exploring personality traits' associations with vaccination could highlight some person-level drivers of, and barriers to, vaccination. We used self- and informant-ratings of the Five-Factor Model domains and their subtraits (a) measured approximately at the time of vaccination with the 100 Nuances of Personality (100NP) item pool (N = 56,575) and (b) measured on average ten years before the pandemic with the NEO Personality Inventory-3 (NEO-PI-3; N = 3,168). We tested individual domains' and either items' (in the 100NP sample) or facets' (in the NEO-PI-3 sample) associations with vaccination, as well as their collective ability to predict vaccination using elastic net models trained and tested in independent sample partitions. Although the NEO-PI-3 domains and facets did not predict vaccination ten years later, the domains correlated with vaccination in the 100NP sample, with vaccinated people scoring slightly higher on neuroticism and agreeableness and lower on openness, controlling for age, sex, and education. Collectively, the five domains predicted vaccination with an accuracy of r = .08. Associations were stronger at the item level. Vaccinated people were, on average, more science-minded, politically liberal, respectful of rules and authority, and anxious but less spiritual, religious, and self-assured. The 100NP items collectively predicted vaccination with r = .31 accuracy. We conclude that unvaccinated people may be a psychologically heterogeneous group and highlight some potential areas for action in vaccination campaigns.
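One plausible reading of "accuracy of r" is the Pearson correlation between a model's predicted scores and the observed vaccination status in the held-out partition. The sketch below computes that quantity under this assumption. It does not fit an elastic net; the scores, outcomes, and function name are hypothetical and stand in for the output of an already trained model.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical held-out partition: model scores and observed status
# (1 = vaccinated, 0 = unvaccinated). Real data would be far noisier.
predicted_scores = [0.2, 0.4, 0.6, 0.8]
observed_status = [0, 0, 1, 1]

accuracy_r = pearson_r(predicted_scores, observed_status)
print(f"held-out accuracy r = {accuracy_r:.3f}")
```

Because the correlation is computed only on data the model never saw during training, it is not inflated by overfitting, which matters when many item-level predictors are entered at once.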