Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of the commercial genotyping chip to be used and the numbers of case and control samples to be genotyped. The most common method of comparing different chips is using a measure of coverage, but this fails to properly account for the effects of sample size, the genetic model of the disease, and linkage disequilibrium between SNPs. In this paper, we argue that the statistical power to detect a causative variant should be the major criterion in study design. Because of the complicated pattern of linkage disequilibrium (LD) in the human genome, power cannot be calculated analytically and must instead be assessed by simulation. We describe in detail a method of simulating case-control samples at a set of linked SNPs that replicates the patterns of LD in human populations, and we used it to assess power for a comprehensive set of available genotyping chips. Our results allow us to compare the performance of the chips to detect variants with different effect sizes and allele frequencies, look at how power changes with sample size in different populations or when using multi-marker tags and genotype imputation approaches, and how performance compares to a hypothetical chip that contains every SNP in HapMap. A main conclusion of this study is that marked differences in genome coverage may not translate into appreciable differences in power and that, when taking budgetary considerations into account, the most powerful design may not always correspond to the chip with the highest coverage. We also show that genotype imputation can be used to boost the power of many chips up to the level obtained from a hypothetical "complete" chip containing all the SNPs in HapMap.
Our results have been encapsulated into an R software package that allows users to design future association studies and our methods provide a framework with which new chip sets can be evaluated.
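To illustrate what assessing power by simulation means, here is a minimal sketch for a single directly genotyped SNP. It is not the authors' method: it ignores LD entirely (the very complication their simulation handles), assumes a multiplicative per-allele relative risk with a rare disease so that controls carry the population allele frequency, and uses a simple allelic chi-square test. All function names and parameter values are hypothetical.

```python
import math
import random


def case_allele_freq(p, rr):
    """Risk-allele frequency in cases, assuming a multiplicative model
    and a rare disease (controls then keep the population frequency p)."""
    return p * rr / (p * rr + (1.0 - p))


def allelic_chi2_pvalue(a_case, n_case, a_ctrl, n_ctrl):
    """P-value of the 1-df chi-square test on a 2x2 table of allele counts.
    a_* are risk-allele counts, n_* are total allele counts per group."""
    total = n_case + n_ctrl
    a_total = a_case + a_ctrl
    if a_total == 0 or a_total == total:
        return 1.0  # monomorphic sample: no evidence either way
    diff = a_case * (n_ctrl - a_ctrl) - a_ctrl * (n_case - a_case)
    chi2 = total * diff ** 2 / (n_case * n_ctrl * a_total * (total - a_total))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def simulate_power(p, rr, n_cases, n_controls, alpha=5e-8, n_sims=500, seed=1):
    """Fraction of simulated studies in which the test rejects at level alpha."""
    rng = random.Random(seed)
    p_case = case_allele_freq(p, rr)
    hits = 0
    for _ in range(n_sims):
        # Each individual contributes two alleles.
        a_case = sum(rng.random() < p_case for _ in range(2 * n_cases))
        a_ctrl = sum(rng.random() < p for _ in range(2 * n_controls))
        pval = allelic_chi2_pvalue(a_case, 2 * n_cases, a_ctrl, 2 * n_controls)
        if pval < alpha:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for n in (500, 1000, 2000):
        print(n, simulate_power(p=0.3, rr=1.3, n_cases=n, n_controls=n, n_sims=200))
```

A realistic comparison of chips would replace the independent allele draws with haplotypes that reproduce observed LD, and would test the tag SNPs on each chip rather than the causal variant itself.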
The goal of fine-mapping in genomic regions associated with complex diseases and traits is to identify causal variants that point to molecular mechanisms behind the associations. Recent fine-mapping methods using summary data from genome-wide association studies rely on exhaustive search through all possible causal configurations, which is computationally expensive.
We introduce FINEMAP, a software package to efficiently explore a set of the most important causal configurations of the region via a shotgun stochastic search algorithm. We show that FINEMAP produces accurate results in a fraction of processing time of existing approaches and is therefore a promising tool for analyzing growing amounts of data produced in genome-wide association studies and emerging sequencing projects.
FINEMAP v1.0 is freely available for Mac OS X and Linux at http://www.christianbenner.com
Contact: christian.benner@helsinki.fi or matti.pirinen@helsinki.fi.
Susceptibility to common human diseases is influenced by both genetic and environmental factors. The explosive growth of genetic data, and the knowledge that it is generating, are transforming our biological understanding of these diseases. In this review, we describe the technological and analytical advances that have enabled genome-wide association studies to be successful in identifying a large number of genetic variants robustly associated with common disease. We examine the biological insights that these genetic associations are beginning to produce, from functional mechanisms involving individual genes to biological pathways linking associated genes, and the identification of functional annotations, some of which are cell-type-specific, enriched in disease associations. Although most efforts have focused on identifying and interpreting genetic variants that are irrefutably associated with disease, it is increasingly clear that, even at large sample sizes, these represent only the tip of the iceberg of genetic signal, motivating polygenic analyses that consider the effects of genetic variants throughout the genome, including modest effects that are not individually statistically significant. As data from an increasingly large number of diseases and traits are analysed, pleiotropic effects (defined as genetic loci affecting multiple phenotypes) can help integrate our biological understanding. Looking forward, the next generation of population-scale data resources, linking genomic information with health outcomes, will lead to another step-change in our ability to understand, and treat, common diseases.
In humans, the rate of recombination, as measured on the megabase scale, is positively associated with the level of genetic variation, as measured at the genic scale. Despite considerable debate, it is not clear whether these factors are causally linked or, if they are, whether this is driven by the repeated action of adaptive evolution or molecular processes such as double-strand break formation and mismatch repair. We introduce three innovations to the analysis of recombination and diversity: fine-scale genetic maps estimated from genotype experiments that identify recombination hotspots at the kilobase scale, analysis of an entire human chromosome, and the use of wavelet techniques to identify correlations acting at different scales. We show that recombination influences genetic diversity only at the level of recombination hotspots. Hotspots are also associated with local increases in GC content and the relative frequency of GC-increasing mutations but have no effect on substitution rates. Broad-scale association between recombination and diversity is explained through covariance of both factors with base composition. To our knowledge, these results are the first evidence of a direct and local influence of recombination hotspots on genetic variation and the fate of individual mutations. However, that hotspots have no influence on substitution rates suggests that they are too ephemeral on an evolutionary time scale to have a strong influence on broader scale patterns of base composition and long-term molecular evolution.
Genome-wide association studies (GWAS) search for associations between genetic variants and disease status, typically via logistic regression. Often there are covariates, such as sex or well-established major genetic factors, that are known to affect disease susceptibility and are independent of tested genotypes at the population level. We show theoretically and with data from recent GWAS on multiple sclerosis, psoriasis and ankylosing spondylitis that inclusion of known covariates can substantially reduce power for the identification of associated variants when the disease prevalence is lower than a few percent. Whether the inclusion of such covariates reduces or increases power to detect genetic effects depends on various factors, including the prevalence of the disease studied. When the disease is common (prevalence of >20%), the inclusion of covariates typically increases power, whereas, for rarer diseases, it can often decrease power to detect new genetic associations.
UK Biobank is among the world's largest repositories for phenotypic and genotypic information in individuals of European ancestry. We performed a genome-wide association study in UK Biobank testing ∼9 million DNA sequence variants for association with coronary artery disease (4,831 cases and 115,455 controls) and carried out meta-analysis with previously published results. We identified 15 new loci, bringing the total number of loci associated with coronary artery disease to 95 at the time of analysis. Phenome-wide association scanning showed that CCDC92 likely affects coronary artery disease through insulin resistance pathways, whereas experimental analysis suggests that ARHGEF26 influences the transendothelial migration of leukocytes.
High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections.
Availability: The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer
Contact: chris.spencer@well.ox.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
Outcomes of hepatitis C virus (HCV) infection and treatment depend on viral and host genetic factors. Here we use human genome-wide genotyping arrays and new whole-genome HCV viral sequencing technologies to perform a systematic genome-to-genome study of 542 individuals who were chronically infected with HCV, predominantly genotype 3. We show that both alleles of genes encoding human leukocyte antigen molecules and genes encoding components of the interferon lambda innate immune system drive viral polymorphism. Additionally, we show that IFNL4 genotypes determine HCV viral load through a mechanism dependent on a specific amino acid residue in the HCV NS5A protein. These findings highlight the interplay between the innate immune system and the viral genome in HCV control.
New directly acting antivirals (DAAs) provide very high cure rates in most patients infected by hepatitis C virus (HCV). However, some patient groups have been relatively harder to treat, including those with cirrhosis or infected with HCV genotype 3. In the recent BOSON trial, genotype 3 patients with cirrhosis receiving a 16‐week course of sofosbuvir and ribavirin had a sustained virological response (SVR) rate of around 50%. In patients with cirrhosis, interferon lambda 4 (IFNL4) CC genotype was significantly associated with SVR. This genotype was also associated with a lower interferon‐stimulated gene (ISG) signature in peripheral blood and in liver at baseline. Unexpectedly, patients with the CC genotype showed a dynamic increase in ISG expression between weeks 4 and 16 of DAA therapy, whereas the reverse was true for non‐CC patients. Conclusion: These data provide an important dynamic link between host genotype and phenotype in HCV therapy that is also potentially relevant to naturally acquired infection. (Hepatology 2018; 00:000‐000).
Visceral leishmaniasis (VL) is characterised by a high degree of spatial clustering at all scales, and this feature remains even with successful control measures. VL is targeted for elimination as a public health problem in the Indian subcontinent by 2020, and incidence has been falling rapidly since 2011. Current control is based on early diagnosis and treatment of clinical cases, and blanket indoor residual spraying of insecticide (IRS) in endemic villages to kill the sandfly vectors. Spatially targeting active case detection and/or IRS to higher risk areas would greatly reduce costs of control, but its effectiveness as a control strategy is unknown. The effectiveness depends on two key unknowns: how quickly transmission risk decreases with distance from a VL case and how much asymptomatically infected individuals contribute to transmission.
To estimate these key parameters, a spatiotemporal transmission model for VL was developed and fitted to geo-located epidemiological data on 2494 individuals from a highly endemic village in Mymensingh, Bangladesh. A Bayesian inference framework that could account for the unknown infection times of the VL cases, and missing symptom onset and recovery times, was developed to perform the parameter estimation. The parameter estimates obtained suggest that, in a highly endemic setting, VL risk decreases relatively quickly with distance from a case (halving within 90 m) and that VL cases contribute significantly more to transmission than asymptomatic individuals.
These results suggest that spatially-targeted interventions may be effective for limiting transmission. However, the extent to which spatial transmission patterns and the asymptomatic contribution vary with VL endemicity and over time is uncertain. In any event, interventions would need to be performed promptly and in a large radius (≥300m) around a new case to reduce transmission risk.
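One way to read the two distances together is sketched below. It assumes, purely for illustration, that risk decays exponentially with distance and halves every 90 m; the abstract does not state the functional form of the fitted kernel, so both the form and the function name `relative_risk` are hypothetical.

```python
def relative_risk(distance_m, halving_distance_m=90.0):
    """Transmission risk at a given distance relative to risk at the case
    itself, under an ASSUMED exponential decay that halves every
    halving_distance_m metres (not necessarily the kernel fitted in the study)."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return 0.5 ** (distance_m / halving_distance_m)


if __name__ == "__main__":
    for d in (0, 90, 180, 300):
        print(f"{d:>4} m: {relative_risk(d):.3f}")
```

Under this assumed decay, the relative risk at 300 m is roughly a tenth of that at the case location, which is consistent with the need for a large intervention radius.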