Over the past few years, substantial effort has been put into the functional annotation of variation in human genome sequences. Such annotations can play a critical role in identifying putatively causal variants for a disease or trait among the abundant natural variation that occurs at a locus of interest. The main challenges in using these annotations are their large number and their diversity. Here we develop an unsupervised approach to integrate these different annotations into one measure of functional importance (Eigen) that, unlike most existing methods, is not based on any labeled training data. Using disease-associated and putatively benign variants from published studies (in both coding and noncoding regions), we show that the resulting meta-score has better discriminatory ability than the recently proposed CADD score. Across varied scenarios, the Eigen score generally performs better than any single individual annotation, representing a powerful single functional score that can be incorporated in fine-mapping studies.
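As a rough illustration of how annotations can be combined without labeled training data, the sketch below standardizes each annotation and weights it by the lead eigenvector of the annotations' correlation matrix, found by power iteration. This principal-component-style weighting is an assumption chosen for simplicity; it is not presented as the Eigen model itself, which the abstract does not specify in detail. All function names are hypothetical, and each annotation is assumed to be oriented so that larger values indicate greater likely function.

```python
import math
from typing import List

Matrix = List[List[float]]


def standardize(columns: Matrix) -> Matrix:
    """Center and scale each annotation column to mean 0 and SD 1."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        out.append([(x - mean) / sd if sd > 0 else 0.0 for x in col])
    return out


def correlation_matrix(z: Matrix) -> Matrix:
    """Sample correlation matrix of already-standardized columns."""
    n, k = len(z[0]), len(z)
    return [[sum(z[a][i] * z[b][i] for i in range(n)) / (n - 1) for b in range(k)]
            for a in range(k)]


def lead_eigenvector(mat: Matrix, iters: int = 1000, tol: float = 1e-12) -> List[float]:
    """Power iteration for the eigenvector with the largest eigenvalue."""
    k = len(mat)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


def unsupervised_meta_score(annotations: Matrix) -> List[float]:
    """annotations[a][i] is annotation a for variant i (hypothetical layout).
    Returns one combined score per variant; no labels are used."""
    z = standardize(annotations)
    weights = lead_eigenvector(correlation_matrix(z))
    if sum(weights) < 0:  # eigenvector sign is arbitrary; keep weights mostly positive
        weights = [-w for w in weights]
    return [sum(w * col[i] for w, col in zip(weights, z)) for i in range(len(z[0]))]
```

Because the weights depend only on the correlation structure, annotations that agree with many others tend to receive more weight, while an isolated, weakly correlated annotation contributes less.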
Alopecia areata is a complex genetic disease that results in hair loss due to autoimmune-mediated attack on the hair follicle. In our earlier GWAS and linkage studies, we defined a role for both rare and common variants. Here, we identify rare variants contributing to Alopecia Areata using whole exome sequencing and gene-level burden analyses in 849 Alopecia Areata patients compared with 15,640 controls. KRT82 is identified as an Alopecia Areata risk gene, with rare damaging variants in 51 heterozygous Alopecia Areata individuals (6.01%), achieving genome-wide significance (p = 2.18E-07). KRT82 encodes a hair-specific type II keratin that is exclusively expressed in the hair shaft cuticle during the anagen phase, and its expression is decreased in Alopecia Areata patient skin and hair follicles. Finally, we find that cases with an identified damaging KRT82 variant and reduced KRT82 expression have elevated perifollicular CD8 infiltrates. In this work, we use whole exome sequencing to identify a significant Alopecia Areata disease-relevant gene, KRT82, and propose a mechanism for rare variant predisposition leading to disrupted hair shaft integrity.
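A gene-level burden analysis of this kind typically collapses rare damaging variants into a per-individual carrier indicator and compares carrier counts between cases and controls. The sketch below assumes a one-sided Fisher's exact test on the resulting 2×2 table; the abstract does not say which test was used, and the control carrier count for KRT82 is not given, so the function names and test tables are hypothetical. It also reproduces the reported carrier fraction, 51/849 ≈ 6.01%.

```python
from math import comb
from typing import Sequence


def is_carrier(genotypes_in_gene: Sequence[int]) -> bool:
    """Collapsing step: carrier if any qualifying variant in the gene is non-reference."""
    return any(g > 0 for g in genotypes_in_gene)


def carrier_fraction(carriers: int, n: int) -> float:
    """Fraction of individuals carrying at least one qualifying variant."""
    return carriers / n


def burden_fisher_p(case_carriers: int, n_cases: int,
                    ctrl_carriers: int, n_ctrls: int) -> float:
    """One-sided Fisher's exact test: P(>= case_carriers carriers among cases)
    under the hypergeometric null of no case/control difference."""
    total_carriers = case_carriers + ctrl_carriers
    denom = comb(n_cases + n_ctrls, total_carriers)
    top = min(total_carriers, n_cases)
    numer = sum(comb(n_cases, k) * comb(n_ctrls, total_carriers - k)
                for k in range(case_carriers, top + 1))
    return numer / denom  # exact integer arithmetic, converted to float at the end
```

For example, `round(100 * carrier_fraction(51, 849), 2)` gives 6.01, matching the percentage in the abstract.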
The analysis of whole-genome sequencing studies is challenging due to the large number of rare variants in noncoding regions and the lack of natural units for testing. We propose a statistical method to detect and localize rare and common risk variants in whole-genome sequencing studies based on a recently developed knockoff framework. It can (1) prioritize causal variants over associations due to linkage disequilibrium, thereby improving interpretability; (2) help distinguish the signal due to rare variants from shadow effects of significant common variants nearby; (3) integrate multiple knockoffs for improved power, stability, and reproducibility; and (4) flexibly incorporate state-of-the-art and future association tests to achieve the benefits proposed here. In applications to whole-genome sequencing data from the Alzheimer's Disease Sequencing Project (ADSP) and COPDGene samples from the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program, we show that, compared with conventional association tests, our method can lead to substantially more discoveries.
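In the general knockoff framework, each feature is compared with a synthetic "knockoff" copy, yielding a statistic W_j that is large and positive when the original looks more important than its knockoff. The knockoff+ rule of Barber and Candès then picks the smallest threshold t with (1 + #{W_j ≤ −t}) / max(1, #{W_j ≥ t}) ≤ q, which controls the false discovery rate at level q. The sketch shows only this selection step for a single set of knockoffs; it does not reproduce the multiple-knockoff aggregation or the association tests described above, and the W values in the test are made up.

```python
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Smallest candidate t whose estimated false discovery proportion is <= q."""
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        false_est = 1 + sum(1 for x in w if x <= -t)   # knockoffs "winning" by >= t
        selected = max(1, sum(1 for x in w if x >= t))  # originals winning by >= t
        if false_est / selected <= q:
            return t
    return float("inf")  # no threshold achieves level q: select nothing


def knockoff_select(w: Sequence[float], q: float) -> List[int]:
    """Indices of features selected at target FDR q."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]
```

The "+1" in the numerator makes the rule conservative: at q = 0.2 it can only select anything once at least five features pass the threshold, which is one motivation for aggregating multiple knockoffs.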
In the context of kidney transplantation, genomic incompatibilities between donor and recipient may lead to allosensitization against new antigens. We hypothesized that recessive inheritance of gene-disrupting variants may represent a risk factor for allograft rejection.
We performed a two-stage genetic association study of kidney allograft rejection. In the first stage, we performed a recessive association screen of 50 common gene-intersecting deletion polymorphisms in a cohort of kidney transplant recipients. In the second stage, we replicated our findings in three independent cohorts of donor-recipient pairs. We defined genomic collision as a specific donor-recipient genotype combination in which a recipient who was homozygous for a gene-intersecting deletion received a transplant from a nonhomozygous donor. Identification of alloantibodies was performed with the use of protein arrays, enzyme-linked immunosorbent assays, and Western blot analyses.
In the discovery cohort, which included 705 recipients, we found a significant association with allograft rejection at the locus represented by rs893403 (hazard ratio with the risk genotype vs. nonrisk genotypes, 1.84; 95% confidence interval [CI], 1.35 to 2.50; P = 9.8×10⁻⁵). This effect was replicated under the genomic-collision model in three independent cohorts involving a total of 2004 donor-recipient pairs (hazard ratio, 1.55; 95% CI, 1.25 to 1.93; P = 6.5×10⁻⁵). In the combined analysis (discovery cohort plus replication cohorts), the risk genotype was associated with a higher risk of rejection than the nonrisk genotype (hazard ratio, 1.63; 95% CI, 1.37 to 1.95; P = 4.7×10⁻⁸). We identified a specific antibody response against LIMS1, a kidney-expressed protein encoded within the collision locus. The response involved predominantly IgG2 and IgG3 antibody subclasses.
We found that the locus appeared to encode a minor histocompatibility antigen. Genomic collision at this locus was associated with rejection of the kidney allograft and with production of anti-LIMS1 IgG2 and IgG3. (Funded by the Columbia University Transplant Center and others.)
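The genomic-collision model defined in the methods above is a simple rule on donor and recipient genotypes. The sketch encodes it, assuming each genotype is given as the number of copies of the deletion allele (0, 1, or 2); that coding and the function name are illustrative assumptions, not taken from the study.

```python
def is_genomic_collision(recipient_deletion_copies: int,
                         donor_deletion_copies: int) -> bool:
    """True when the recipient is homozygous for the gene-intersecting deletion
    (2 copies) and the donor is not (0 or 1 copies)."""
    for copies in (recipient_deletion_copies, donor_deletion_copies):
        if copies not in (0, 1, 2):
            raise ValueError("deletion copy count must be 0, 1 or 2")
    return recipient_deletion_copies == 2 and donor_deletion_copies < 2
```

The rule captures the intuition behind the hypothesis: only a recipient who lacks both copies of the sequence can see the donor-expressed protein as foreign.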
Recent developments in sequencing technologies have made it possible to uncover both rare and common genetic variants. Genome-wide association studies (GWASs) can test for the effect of common variants, whereas sequence-based association studies can evaluate the cumulative effect of both rare and common variants on disease risk. Many groupwise association tests, including burden tests and variance-component tests, have been proposed for this purpose. Although such tests do not exclude common variants from their evaluation, they focus mostly on testing the effect of rare variants by upweighting rare-variant effects and downweighting common-variant effects and can therefore lose substantial power when both rare and common genetic variants in a region influence trait susceptibility. There is increasing evidence that the allelic spectrum of risk variants at a given locus might include novel, rare, low-frequency, and common genetic variants. Here, we introduce several sequence kernel association tests to evaluate the cumulative effect of rare and common variants. The proposed tests are computationally efficient and are applicable to both binary and continuous traits. Furthermore, they can readily combine GWAS and whole-exome-sequencing data on the same individuals, when available, and are also applicable to deep-resequencing data of GWAS loci. We evaluate these tests on data simulated under comprehensive scenarios and show that compared with the most commonly used tests, including the burden and variance-component tests, they can achieve substantial increases in power. We next show applications to sequencing studies for Crohn disease and autism spectrum disorders. The proposed tests have been incorporated into the software package SKAT.
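To make the weighting trade-off concrete, the sketch computes two score-type statistics for one region from a genotype matrix G (individuals × variants, coded 0/1/2) and a trait y: a weighted burden statistic (Σ_j w_j S_j)² and a SKAT-style variance-component statistic Σ_j w_j² S_j², where S_j = Σ_i G_ij (y_i − ȳ) under an intercept-only null model. The weights w_j are the Beta(MAF_j; 1, 25) density, the rare-variant default in the SKAT package, which sharply downweights common variants. P-value computation, covariate adjustment, and the combined rare-plus-common tests proposed in the abstract are not shown; function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def beta_density(x: float, a: float = 1.0, b: float = 25.0) -> float:
    """Beta(a, b) density at 0 < x < 1; Beta(1, 25) strongly upweights rare variants."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))


def region_statistics(G: Sequence[Sequence[int]], y: Sequence[float],
                      a: float = 1.0, b: float = 25.0) -> Tuple[float, float]:
    """Return (weighted burden statistic, variance-component statistic).

    G[i][j] is the 0/1/2 genotype of individual i at variant j; y is the trait.
    """
    n, m = len(y), len(G[0])
    y_bar = sum(y) / n
    resid = [yi - y_bar for yi in y]  # residuals under the intercept-only null
    weighted_scores: List[float] = []
    for j in range(m):
        col = [G[i][j] for i in range(n)]
        freq = sum(col) / (2 * n)
        maf = min(freq, 1 - freq)
        w = beta_density(maf, a, b) if maf > 0 else 0.0  # monomorphic: no information
        score = sum(g * r for g, r in zip(col, resid))    # S_j
        weighted_scores.append(w * score)
    burden = sum(weighted_scores) ** 2                        # (sum_j w_j S_j)^2
    variance_component = sum(s * s for s in weighted_scores)  # sum_j w_j^2 S_j^2
    return burden, variance_component
```

The test cases show the usual contrast: when two variants act in opposite directions, their scores cancel in the burden statistic but not in the variance-component statistic, whereas same-direction effects favor the burden statistic.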
To evaluate evidence for de novo etiologies in schizophrenia, we sequenced at high coverage the exomes of families recruited from two populations with distinct demographic structures and history. We sequenced a total of 795 exomes from 231 parent-proband trios enriched for sporadic schizophrenia cases, as well as 34 unaffected trios. We observed in cases an excess of de novo nonsynonymous single-nucleotide variants as well as a higher prevalence of gene-disruptive de novo mutations relative to controls. We found four genes (LAMA2, DPYD, TRRAP and VPS39) affected by recurrent de novo events within or across the two populations, which is unlikely to have occurred by chance. We show that de novo mutations affect genes with diverse functions and developmental profiles, but we also find a substantial contribution of mutations in genes with higher expression in early fetal life. Our results help define the genomic and neural architecture of schizophrenia.
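Statements such as "unlikely to have occurred by chance" are commonly assessed by comparing an observed de novo count with the count expected from a mutation-rate model, using a Poisson upper-tail probability. The sketch assumes that model, with per-gene rates expressed per haploid genome per generation (so each proband contributes two transmitted genomes); the abstract does not state the exact test used, and the function names and test values are illustrative only.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected)."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    if observed <= 0:
        return 1.0

    def pmf(k: int) -> float:
        return math.exp(-expected + k * math.log(expected) - math.lgamma(k + 1))

    if observed <= expected:
        # Near the bulk of the distribution: complement of the lower tail.
        return max(0.0, 1.0 - sum(pmf(k) for k in range(observed)))
    # Beyond the mean the terms shrink, so sum the tail directly
    # (keeps precision for very small p-values).
    total, k = 0.0, observed
    while True:
        term = pmf(k)
        total += term
        if term == 0.0 or term < total * 1e-16:
            break
        k += 1
    return total


def expected_de_novo(n_probands: int, rate_per_haploid_genome: float) -> float:
    """Expected de novo events in one gene: two transmitted genomes per proband."""
    return 2 * n_probands * rate_per_haploid_genome
```

A recurrence check for a single gene would then be `poisson_upper_tail(hits, expected_de_novo(n_probands, rate))`, with a multiple-testing correction across all genes examined.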
The analysis of whole-genome sequencing studies is challenging due to the large number of noncoding rare variants, our limited understanding of their functional effects, and the lack of natural units for testing. Here we propose a scan statistic framework, WGScan, to simultaneously detect the existence and estimate the locations of association signals at genome-wide scale. WGScan can analytically estimate the significance threshold for a whole-genome scan, utilize summary statistics for meta-analysis, incorporate functional annotations for enhanced discoveries in noncoding regions, and enable enrichment analyses using genome-wide summary statistics. Based on the analysis of whole genomes of 1,786 phenotypically discordant sibling pairs from the Simons Simplex Collection study of autism spectrum disorders, we derive genome-wide significance thresholds for whole-genome sequencing studies and detect significant enrichment of autism-associated regions in promoter regions, functional categories related to autism, and enhancers predicted to regulate expression of autism-associated genes.
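At its core, a scan statistic evaluates many overlapping windows of different sizes and keeps track of the strongest one, which is why the significance threshold must account for the whole search. The sketch below slides windows of several sizes over per-variant z-scores, indexed by variant position rather than base pair, and scores each window with Σz/√m. That window statistic, the window sizes, and the function names are assumptions for illustration; WGScan's own window tests, analytic threshold, and use of functional annotations are not reproduced.

```python
import math
from typing import List, Sequence, Tuple

Window = Tuple[int, int, float]  # (start, end exclusive, statistic)


def scan_windows(z: Sequence[float], sizes: Sequence[int], step: int) -> List[Window]:
    """Score every window of each requested size, moving by `step` variants."""
    results: List[Window] = []
    for size in sizes:
        for start in range(0, max(len(z) - size, 0) + 1, step):
            window = z[start:start + size]
            if len(window) < size:
                continue  # region shorter than this window size
            results.append((start, start + size, sum(window) / math.sqrt(size)))
    return results


def top_window(z: Sequence[float], sizes: Sequence[int], step: int) -> Window:
    """Window with the largest absolute statistic (the scan maximum)."""
    return max(scan_windows(z, sizes, step), key=lambda r: abs(r[2]))
```

Allowing several window sizes lets the scan localize a signal: a window that exactly covers a cluster of associated variants scores higher than windows that dilute it with null variants.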
Predicting the functional consequences of genetic variants in non-coding regions is a challenging problem. We propose here a semi-supervised approach, GenoNet, to jointly utilize experimentally confirmed regulatory variants (labeled variants), millions of unlabeled variants genome-wide, and more than a thousand cell/tissue type-specific epigenetic annotations to predict functional consequences of non-coding variants. Through application to several experimental datasets, we demonstrate that the proposed method significantly improves prediction accuracy compared with existing functional prediction methods at the tissue/cell type level, and especially at the organism level. Importantly, we illustrate how GenoNet scores can help in fine-mapping at GWAS loci and in the discovery of disease-associated genes in sequencing studies. As more comprehensive lists of experimentally validated variants become available over the next few years, semi-supervised methods like GenoNet can be used to provide increasingly accurate functional predictions for variants genome-wide and across a variety of cell/tissue types.
We analyze de novo synonymous mutations identified in autism spectrum disorders (ASDs) and schizophrenia (SCZ) with potential impact on regulatory elements, using data from whole-exome sequencing (WES) studies. Focusing on five types of genetic regulatory functions, we found that de novo near-splice-site synonymous mutations changing exonic splicing regulators, and those within frontal cortex-derived DNase I hypersensitivity sites, are significantly enriched in ASD and SCZ, respectively. These results remained significant, albeit less so, after incorporating two additional ASD datasets. Among the genes identified, several are hit by multiple functional de novo mutations, with RAB2A and SETD1A showing the highest statistical significance in ASD and SCZ, respectively. The estimated contribution of these synonymous mutations to disease liability is comparable to that of de novo protein-truncating mutations. These findings expand the repertoire of functional de novo mutations to include “functional” synonymous ones and strengthen the role of rare variants in neuropsychiatric disease risk.
• De novo synonymous mutations likely affecting splicing regulation are enriched in ASD
• De novo synonymous mutations within frontal cortex-derived DHS are enriched in SCZ
• “Functional” synonymous mutations significantly contribute to disease liability
• “Functional” synonymous mutations support a role for SETD1A and RAB2A in neuropsychiatry
The role of de novo non-synonymous mutations in neuropsychiatric disorders is well established. In this study, Takata et al. explore the role of a different class of genetic variants, the synonymous mutations, and discover that they too contribute to autism and schizophrenia by affecting regulatory elements.
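One common way to test whether de novo mutations in a functional category are enriched in cases relative to controls is to condition on the total number of events: if the per-individual rate is the same in both groups, each event falls in a case with probability n_cases / (n_cases + n_controls), which gives an exact one-sided binomial test. The sketch assumes that model; the abstract does not say which test the authors used, and the function names and counts in the test are hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def case_enrichment_p(case_events: int, n_cases: int,
                      ctrl_events: int, n_ctrls: int) -> float:
    """One-sided exact test that events occur at a higher per-individual rate
    in cases, conditioning on the total number of events."""
    total = case_events + ctrl_events
    p_case = n_cases / (n_cases + n_ctrls)  # null share of events expected in cases
    return binomial_upper_tail(case_events, total, p_case)
```

Conditioning on the total removes the unknown baseline mutation rate from the null hypothesis, so only the case/control sample sizes are needed.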
Genome-wide association studies (GWAS) for biomarkers important for clinical phenotypes can lead to clinically relevant discoveries. Conventional GWAS for quantitative traits are based on simplified regression models that model the conditional mean of a phenotype as a linear function of genotype. We draw attention here to an alternative, lesser-known approach, namely quantile regression, which naturally extends linear regression to the analysis of the entire conditional distribution of a phenotype of interest. Quantile regression can be applied efficiently at biobank scale, while having some unique advantages such as (1) identifying variants with heterogeneous effects across quantiles of the phenotype distribution; (2) accommodating a wide range of phenotype distributions, including non-normal distributions, with invariance of results to trait transformations; and (3) providing more detailed information about genotype-phenotype associations even for those associations identified by conventional GWAS. We show in simulations that quantile regression is powerful across both homogeneous and various heterogeneous models. Applications to 39 quantitative traits in the UK Biobank demonstrate that quantile regression can be a helpful complement to linear regression in GWAS and can identify variants with larger effects on high-risk subgroups of individuals but with lower or no contribution overall.
Here, the authors propose using quantile regression for genome-wide association studies with quantitative traits in UK Biobank, showing its advantages over linear regression in handling non-normal distributions and identifying heterogeneous genetic effects.
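Quantile regression replaces the squared-error loss of linear regression with the asymmetric check loss ρ_τ(u) = u(τ − 1[u < 0]), whose minimizer is the τ-th conditional quantile. With one genotype covariate plus an intercept, an optimal fit can always be found among lines passing through two observations with distinct genotype values (a standard property of the underlying linear program), so the toy sketch below simply searches those lines. It is for illustration on tiny data only; biobank-scale analyses rely on dedicated linear-programming or interior-point solvers, and the function names and data in the test are made up.

```python
from itertools import combinations
from typing import Sequence, Tuple


def check_loss(u: float, tau: float) -> float:
    """rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_fit(x: Sequence[float], y: Sequence[float], tau: float) -> Tuple[float, float]:
    """Return (intercept, slope) minimizing the summed check loss.
    Exhaustive O(n^3) search over lines through pairs of points: small n only."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a pair with equal genotype does not determine a line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(yk - intercept - slope * xk, tau) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct covariate values")
    return best[1], best[2]
```

In the test data the genotype changes the spread of the trait but not its median, so the fitted slope is 0 at τ = 0.5 but +1 at τ = 0.9 and −1 at τ = 0.1; this is the kind of heterogeneous effect a mean-based GWAS would miss.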