Genome-wide association studies (GWASs) have successfully uncovered thousands of robust associations between common variants and complex traits and diseases. Despite these successes, much of the heritability of these traits remains unexplained. Because low-frequency and rare variants are not tagged by conventional genome-wide genotyping arrays, they may represent an important and understudied component of complex trait genetics. In contrast to common variant GWASs, there are many different types of study designs, assays and analytic techniques that can be utilized for rare variant association studies (RVASs). In this review, we briefly present the different technologies available to identify rare genetic variants, including novel exome arrays. We also compare the different study designs for RVASs and argue that the best design will likely be phenotype-dependent. We discuss the main analytical issues relevant to RVASs, including the different statistical methods that can be used to test genetic associations with rare variants and the various bioinformatic approaches to predicting in silico biological functions for variants. Finally, we describe recent rare variant association findings, highlighting the unexpected conclusion that most rare variants have modest-to-small effect sizes on phenotypic variation. This observation has major implications for our understanding of the genetic architecture of complex traits in the context of the unexplained heritability challenge.
Polyploidy is generally not tolerated in animals, but is widespread in plant genomes and may result in extensive genetic redundancy. The fate of duplicated genes is poorly understood, both functionally and evolutionarily. Soybean (Glycine max L.) has undergone two separate polyploidy events (13 and 59 million years ago) that have resulted in 75% of its genes being present in multiple copies. It therefore constitutes a good model to study the impact of whole‐genome duplication on gene expression. Using RNA‐seq, we tested the functional fate of a set of approximately 18 000 duplicated genes. Across seven tissues tested, approximately 50% of paralogs were differentially expressed and thus had undergone expression sub‐functionalization. Based on gene ontology and expression data, our analysis also revealed that only a small proportion of the duplicated genes have been neo‐functionalized or non‐functionalized. In addition, duplicated genes were often found in collinear blocks, and several blocks of duplicated genes were co‐regulated, suggesting some type of epigenetic or positional regulation. We also found that transcription factors and ribosomal protein genes were differentially expressed in many tissues, suggesting that the main consequence of polyploidy in soybean may be at the regulatory level.
Analysis of imputed genotypes is an important and routine component of genome-wide association studies and the increasing size of imputation reference panels has facilitated the ability to impute and test low-frequency variants for associations. In the context of genotype imputation, the true genotype is unknown and genotypes are inferred with uncertainty using statistical models. Here, we present a novel method for integrating imputation uncertainty into statistical association tests using a fully conditional multiple imputation (MI) approach which is implemented using the Substantive Model Compatible Fully Conditional Specification (SMCFCS). We compared the performance of this method to an unconditional MI and two additional approaches that have been shown to demonstrate excellent performance: regression with dosages and a mixture of regression models (MRM).
Our simulations considered a range of allele frequencies and imputation qualities based on data from the UK Biobank. We found that the unconditional MI was computationally costly and overly conservative across a wide range of settings. Analyzing data with Dosage, MRM, or MI SMCFCS resulted in greater power, including for low-frequency variants, compared to unconditional MI while effectively controlling type I error rates. MRM and MI SMCFCS are both more computationally intensive than using Dosage.
The unconditional MI approach for association testing is overly conservative and we do not recommend its use in the context of imputed genotypes. Given its performance, speed, and ease of implementation, we recommend using Dosage for imputed genotypes with MAF ≥ 0.001 and Rsq ≥ 0.3.
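To make the "regression with dosages" approach concrete, here is a minimal sketch, not the authors' implementation. It assumes each imputed genotype is available as three posterior probabilities (for 0, 1 and 2 copies of the alternate allele), that the trait is quantitative, and that no covariates are needed. The function names `expected_dosage` and `dosage_association` are hypothetical.

```python
import numpy as np


def expected_dosage(probs):
    """Collapse imputed genotype probabilities into an expected allele count.

    probs: array of shape (n_samples, 3) holding P(0), P(1), P(2) copies of
    the alternate allele for each sample (assumed input layout).
    """
    probs = np.asarray(probs, dtype=float)
    return probs @ np.array([0.0, 1.0, 2.0])


def dosage_association(dosage, phenotype):
    """Ordinary least squares of a quantitative phenotype on dosage.

    Returns (beta, standard_error, z_statistic) for the dosage term.
    Covariates are omitted to keep the sketch short.
    """
    x = np.asarray(dosage, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    design = np.column_stack([np.ones_like(x), x])  # intercept + dosage
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (len(y) - 2)  # residual variance, 2 fitted terms
    cov = sigma2 * np.linalg.inv(design.T @ design)
    se = float(np.sqrt(cov[1, 1]))
    beta = float(coef[1])
    return beta, se, beta / se
```

The dosage is a single number per sample, so the association test costs no more than a test on directly genotyped data, which is consistent with the speed advantage the abstract reports for Dosage over MRM and MI SMCFCS.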
C-reactive protein (CRP) is a systemic inflammation marker that predicts future cardiovascular risk. CRP levels are higher in African Americans and Hispanic Americans than in European Americans, but the genetic determinants of CRP in these admixed United States minority populations are largely unknown. We performed genome-wide association studies (GWASs) of 8,280 African American (AA) and 3,548 Hispanic American (HA) postmenopausal women from the Women's Health Initiative SNP Health Association Resource. We discovered and validated a CRP-associated variant of triggering receptors expressed by myeloid cells 2 (TREM2) in chromosomal region 6p21 (p = 10⁻¹⁰). The TREM2 variant associated with higher CRP is common in Africa but rare in other ancestral populations. In AA women, the CRP region in 1q23 contained a strong admixture association signal (p = 10⁻¹⁷), which appears to be related to several independent CRP-associated alleles; the strongest of these is present only in African ancestral populations and is associated with higher CRP. Of the other genomic loci previously associated with CRP through GWASs of European populations, most loci (LEPR, IL1RN, IL6R, GCKR, NLRP3, HNF1A, HNF4A, and APOC1) showed consistent patterns of association with CRP in AA and HA women. In summary, we have identified a common TREM2 variant associated with CRP in United States minority populations. The genetic architecture underlying the CRP phenotype in AA women is complex and involves genetic variants shared across populations, as well as variants specific to populations of African descent.
Massively parallel whole-genome sequencing (WGS) data have ushered in a new era in human genetics. These data are now being used to understand the role of rare variants in complex traits and to advance the goals of precision medicine. The technological and computing advances that have enabled us to generate WGS data on thousands of individuals have also outpaced our ability to perform analyses in scientifically and statistically rigorous and thoughtful ways. The past several years have witnessed the application of whole-exome sequencing (WES) to complex traits and diseases. From our analysis of NHLBI Exome Sequencing Project (ESP) data, not only have a number of important disease and complex trait association findings emerged, but our collective experience offers some valuable lessons for WGS initiatives. These include caveats associated with generating automated pipelines for quality control and analysis of rare variants; the importance of studying minority populations; sample size requirements and efficient study designs for identifying rare-variant associations; and the significance of incidental findings in population-based genetic research. With the ESP as an example, we offer guidance and a framework on how to conduct a large-scale association study in the era of WGS.
Exome sequencing (ES) is rapidly being deployed for use in clinical settings despite limited empirical data about the number and types of incidental results (with potential clinical utility) that could be offered for return to an individual. We analyzed deidentified ES data from 6,517 participants (2,204 African Americans and 4,313 European Americans) from the National Heart, Lung, and Blood Institute Exome Sequencing Project. We characterized the frequencies of pathogenic alleles in genes underlying Mendelian conditions commonly assessed by newborn-screening (NBS, n = 39) programs, genes associated with age-related macular degeneration (ARMD, n = 17), and genes known to influence drug response (PGx, n = 14). From these 70 genes, we identified 10,789 variants and curated them by manual review of OMIM, HGMD, locus-specific databases, or primary literature to a total of 399 validated pathogenic variants. The mean number of risk alleles per individual was 15.3. Every individual had at least five known PGx alleles, 99% of individuals had at least one ARMD risk allele, and 45% of individuals were carriers for at least one pathogenic NBS allele. The carrier burden for severe recessive childhood disorders was 0.57. Our results demonstrate that risk alleles of potential clinical utility for both Mendelian and complex traits are detectable in every individual. These findings highlight the necessity of developing guidelines and policies that consider the return of results to all individuals and underscore the need to develop innovative approaches and tools that enable individuals to exercise their choice about the return of incidental results.
Polygenic risk scores (PRS) have shown successes in clinics, but most PRS methods focus only on participants with distinct primary continental ancestry without accommodating recently-admixed individuals with mosaic continental ancestry backgrounds for different segments of their genomes. Here, we develop GAUDI, a novel penalized-regression-based method specifically designed for admixed individuals. GAUDI explicitly models ancestry-differential effects while borrowing information across segments with shared ancestry in admixed genomes. We demonstrate marked advantages of GAUDI over other methods through comprehensive simulation and real data analyses for traits with associated variants exhibiting ancestral-differential effects. Leveraging data from the Women's Health Initiative study, we show that GAUDI improves PRS prediction of white blood cell count and C-reactive protein in African Americans by > 64% compared to alternative methods, and even outperforms PRS-CSx with large European GWAS for some scenarios. We believe GAUDI will be a valuable tool to mitigate disparities in PRS performance in admixed individuals.
The identification of rare coding or splice site variants remains the most straightforward strategy to link genes with human phenotypes. Here, we analyzed the association between 137,086 rare (minor allele frequency (MAF) < 1%) coding or splice site variants and 15 hematological traits in up to 308,572 participants. We found 56 such rare coding or splice site variants at P < 5 × 10⁻⁸, including 31 that are associated with a blood-cell phenotype for the first time. All but one of these 31 new independent variants map to loci previously implicated in hematopoiesis by genome-wide association studies (GWAS). This includes a rare splice acceptor variant (rs146597587, MAF = 0.5%) in interleukin 33 (IL33) associated with reduced eosinophil count (P = 2.4 × 10⁻²³), and lower risk of asthma (P = 2.6 × 10⁻⁷, odds ratio [95% confidence interval] = 0.56 [0.45-0.70]) and allergic rhinitis (P = 4.2 × 10⁻⁴, odds ratio = 0.55 [0.39-0.76]). The single new locus identified in our study is defined by a rare p.Arg172Gly missense variant (rs145535174, MAF = 0.05%) in plasminogen (PLG) associated with increased platelet count (P = 6.8 × 10⁻⁹), and decreased D-dimer concentration (P = 0.018) and platelet reactivity (P < 0.03). Finally, our results indicate that searching for rare coding or splice site variants in very large sample sizes can help prioritize causal genes at many GWAS loci associated with complex human diseases and traits.
Genetic association studies have a long history of delivering insightful results for cardiovascular disease (CVD) research. From early candidate gene studies, through genome-wide association studies, to newer whole-genome sequencing studies, research in human genetics has enriched our understanding of the pathobiology of CVD. As these studies continue to expand, the issue of statistical power plays an important role in study design as well as the interpretation of results. We provide an overview of the component parts that determine statistical power and preview the future of CVD genetic association studies through this lens.
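As a rough illustration of how those component parts interact, the sketch below approximates the power of a single-variant test for a quantitative trait. It is a textbook-style approximation, not a method taken from the abstract. The assumptions are an additive model, a standardized trait, Hardy-Weinberg proportions, a two-sided normal test, and a default threshold of 5 × 10⁻⁸. The function names `variance_explained` and `association_power` are hypothetical.

```python
from statistics import NormalDist


def variance_explained(maf, beta):
    """Fraction of trait variance explained by one additive variant.

    Assumes Hardy-Weinberg proportions and a trait scaled to unit variance,
    so beta is the per-allele effect in standard deviation units.
    """
    return 2.0 * maf * (1.0 - maf) * beta ** 2


def association_power(n, maf, beta, alpha=5e-8):
    """Approximate power of a two-sided single-variant association test.

    n: sample size; maf: minor allele frequency; beta: per-allele effect;
    alpha: significance threshold (the default is an assumed
    genome-wide level, not a value from the abstract).
    """
    q2 = variance_explained(maf, beta)
    if not 0.0 <= q2 < 1.0:
        raise ValueError("variance explained must lie in [0, 1)")
    ncp = n * q2 / (1.0 - q2)           # non-centrality parameter
    std = NormalDist()                  # standard normal
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)
    shift = ncp ** 0.5                  # expected |z| under the alternative
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Same effect size and frequency, increasing sample size.
    for n in (10_000, 100_000, 1_000_000):
        print(n, round(association_power(n, maf=0.01, beta=0.1), 4))
```

Under these assumptions power depends on sample size, allele frequency and effect size only through the non-centrality parameter, so a rarer variant or a smaller effect has to be offset by a proportionally larger sample.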