Assessing linkage disequilibrium (LD) across ancestral populations is a powerful approach for investigating population-specific genetic structure as well as functionally mapping regions of disease susceptibility. Here, we present LDlink, a web-based collection of bioinformatic modules that query single nucleotide polymorphisms (SNPs) in population groups of interest to generate haplotype tables and interactive plots. Modules are designed with an emphasis on ease of use, query flexibility, and interactive visualization of results. Phase 3 haplotype data from the 1000 Genomes Project are referenced for calculating pairwise metrics of LD, searching for proxies in high LD, and enumerating all observed haplotypes. LDlink is tailored for investigators interested in mapping common and uncommon disease susceptibility loci by focusing on output linking correlated alleles and highlighting putative functional variants.
LDlink is a free and publicly available web tool that can be accessed at http://analysistools.nci.nih.gov/LDlink/.
mitchell.machiela@nih.gov.
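The pairwise LD metrics mentioned in the LDlink abstract above are conventionally D, D' and r². The following is a minimal sketch of how they can be computed from counts of the four two-locus haplotypes, using the standard textbook definitions. It is not LDlink's own code; the function name `pairwise_ld` and its argument names are hypothetical and chosen only for illustration.

```python
def pairwise_ld(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r^2) from counts of the four two-locus haplotypes.

    Alleles A/a sit at the first variant and B/b at the second.
    D' is reported as an absolute value in [0, 1].
    D' and r^2 are None when either variant is monomorphic.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("at least one haplotype is required")

    p_AB = n_AB / total            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A
    p_B = (n_AB + n_aB) / total    # frequency of allele B

    # D measures the departure from independence of the two alleles.
    d = p_AB - p_A * p_B

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        return d, None, None

    # D' rescales D by its largest attainable magnitude given the
    # allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max

    # r^2 is the squared correlation between the two alleles.
    r2 = d * d / denom
    return d, d_prime, r2
```

For example, `pairwise_ld(40, 10, 10, 40)` describes two variants whose alleles co-occur more often than expected by chance; the counts would in practice come from phased reference haplotypes for the chosen population.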
Abstract
Motivation
Existing approaches to plot association results from genome-wide association studies (GWAS) are in the form of static Manhattan plots and often lack data integration with rich databases on variant regulatory potential as well as population-specific linkage disequilibrium patterns.
Summary
We created an intuitive web module for uploading and efficiently exploring GWAS association results. Interactive plots and sortable tables allow researchers to query genomic regions of interest, facilitating the integration of data on linkage disequilibrium, variant regulatory potential and potential target genes. External links allow for visualization of association results in the UCSC genome browser as well as easy access to publicly available databases (e.g. dbSNP and RegulomeDB). Through improved visualization and data integration, LDassoc offers genomic researchers a specialized environment to examine association signals and suggests variants for functional investigation.
Availability and implementation
LDassoc is a free and publicly available web tool that can be accessed online at https://analysistools.nci.nih.gov/LDlink/?tab=ldassoc.
Genomic research involving human genetics and evolutionary biology relies heavily on linkage disequilibrium (LD) to investigate population-specific genetic structure, functionally map regions of disease susceptibility and uncover evolutionary history. Interactive and powerful tools are needed to calculate population-specific LD estimates for integrative genomics research. LDlink is an interactive suite of web-based tools developed to query germline variants in 1000 Genomes Project population groups of interest and generate interactive tables and plots of LD estimates. As an expansion to this resource, we have developed an R package, LDlinkR, designed to rapidly calculate statistics for large lists of variants and LD attributes that eliminates the time needed to perform repetitive requests from the web-based LDlink tool.
LDlinkR accelerates genomic research by providing efficient and user-friendly functions to programmatically interrogate and download pairwise LD estimates from expansive lists of genetic variants.
LDlinkR is a free and publicly available R package that can be installed from the Comprehensive R Archive Network (CRAN) or downloaded from https://github.com/CBIIT/LDlinkR.
The paradox of mutations and cancer
Chanock, Stephen J
Science (American Association for the Advancement of Science), 11/2018, Volume 362, Issue 6417
Journal Article, Peer reviewed
Esophageal tissue often contains driver mutations in people without cancer
The past decade has witnessed the cataloging of genetic mutations in cancer genomes, providing new insights into how and in what ways cancer can develop and spread (1, 2). The focus has been on defining specific “driver” mutations, genetic errors in cancer cells that reveal basic biological processes gone awry that are required for cancer initiation and progression. These drivers are the target of new therapies; this concept is central to precision oncology efforts to treat patients according to the genetic changes that are present in their tumors (3). Along the way, it has also become apparent that cancer genomes harbor many additional “passenger” mutations (4). Patterns of driver and passenger DNA mutations derived from cancer genomes have provided clues about the different ways that cancer can manifest as a disease of genetic mutations (5, 6). In some circumstances, they can be linked to strong environmental carcinogens (for example, mutation patterns caused by tobacco smoke, ultraviolet radiation, or the fungal toxin aflatoxin) (7). Moreover, these forensic mutational patterns can be used to estimate how long it has taken for a tumor to develop (5). On page 911 of this issue, Martincorena et al. (8) turned their attention toward the detection of mutations in normal tissue, addressing a long-standing paradox that mutations arise in normal tissues but do not necessarily lead to cancer.
We report a new method to estimate the predictive performance of polygenic models for risk prediction and assess predictive performance for ten complex traits or common diseases. Using estimates of effect-size distribution and heritability derived from current studies, we project that although 45% of the variance of height has been attributed to SNPs, a model trained on one million people may only explain 33.4% of variance of the trait. Models based on current studies allow for identification of 3.0%, 1.1% and 7.0% of the populations at twofold or higher than average risk for type 2 diabetes, coronary artery disease and prostate cancer, respectively. Tripling of sample sizes could elevate these percentages to 18.8%, 6.1% and 12.2%, respectively. The utility of polygenic models for risk prediction will depend on achievable sample sizes for the training data set, the underlying genetic architecture and the inclusion of information on other risk factors, including family history.
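The idea of identifying the share of a population at twofold or higher than average risk can be illustrated with a small calculation. The sketch below assumes that the polygenic score is normally distributed on the log-risk scale with a known variance; this log-normal model is an assumption made here for illustration and is not stated in the abstract. Under that model the mean risk is exp(mu + variance / 2), so the fraction of people whose risk is at least `fold` times the mean is 1 - Phi((ln(fold) + variance / 2) / sigma). The function name `fraction_above_relative_risk` is hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def fraction_above_relative_risk(log_risk_variance, fold=2.0):
    """Fraction of a population whose risk is >= fold times the mean risk.

    Assumes log-risk ~ Normal(mu, log_risk_variance); mu cancels out
    because the threshold is expressed relative to the population mean.
    """
    if log_risk_variance <= 0:
        raise ValueError("log_risk_variance must be positive")
    if fold <= 0:
        raise ValueError("fold must be positive")

    sigma = sqrt(log_risk_variance)
    # Standardized distance from the mean log-risk to the threshold
    # log(fold * mean risk) = log(fold) + mu + variance / 2.
    z = (log(fold) + log_risk_variance / 2) / sigma
    return 1 - NormalDist().cdf(z)
```

A model that explains more of the variance in log-risk corresponds to a larger `log_risk_variance`, which over the range of modest variances places a larger fraction of the population above the twofold threshold.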
Pooling genome-wide association studies (GWASs) increases power but also poses methodological challenges because studies are often heterogeneous. For example, combining GWASs of related but distinct traits can provide promising directions for the discovery of loci with small but common pleiotropic effects. Classical approaches for meta-analysis or pooled analysis, however, might not be suitable for such analysis because individual variants are likely to be associated with only a subset of the traits or might demonstrate effects in different directions. We propose a method that exhaustively explores subsets of studies for the presence of true association signals that are in either the same direction or possibly opposite directions. An efficient approximation is used for rapid evaluation of p values. We present two illustrative applications, one for a meta-analysis of separate case-control studies of six distinct cancers and another for pooled analysis of a case-control study of glioma, a class of brain tumors that contains heterogeneous subtypes. Both the applications and additional simulation studies demonstrate that the proposed methods offer improved power and more interpretable results when compared to traditional methods for the analysis of heterogeneous traits. The proposed framework has applications beyond genetic association studies.
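The exhaustive subset search described in the abstract above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes each study contributes a z-score for the variant, combines the studies in a subset with square-root sample-size weights, and keeps the subset with the largest one-directional statistic. The function name `best_subset` is hypothetical. The p value for the maximized statistic must account for the search over subsets; the abstract mentions an efficient approximation for this, which is not reproduced here.

```python
from itertools import combinations
from math import sqrt


def best_subset(z_scores, sample_sizes):
    """Return (max_z, subset_indices) over all non-empty subsets of studies.

    For a subset S the combined statistic is
        Z(S) = sum_{k in S} sqrt(n_k) * z_k / sqrt(sum_{k in S} n_k),
    a sample-size weighted meta-analysis z-score restricted to S.
    """
    if len(z_scores) != len(sample_sizes) or not z_scores:
        raise ValueError("need one sample size per z-score, and at least one study")

    best_z, best_idx = float("-inf"), ()
    n_studies = len(z_scores)
    # Enumerate every non-empty subset; feasible for a handful of studies
    # because there are 2**n_studies - 1 subsets.
    for size in range(1, n_studies + 1):
        for idx in combinations(range(n_studies), size):
            numerator = sum(sqrt(sample_sizes[i]) * z_scores[i] for i in idx)
            denominator = sqrt(sum(sample_sizes[i] for i in idx))
            z = numerator / denominator
            if z > best_z:
                best_z, best_idx = z, idx
    return best_z, best_idx
```

To look for effects in the opposite direction, the same search can be run on the negated z-scores; combining the two directions into a single test is a separate step that this sketch leaves out.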
GWAS have emerged as popular tools for identifying genetic variants that are associated with disease risk. Standard analysis of a case-control GWAS involves assessing the association between each individual genotyped SNP and disease risk. However, this approach suffers from limited reproducibility and difficulties in detecting multi-SNP and epistatic effects. As an alternative analytical strategy, we propose grouping SNPs together into SNP sets on the basis of proximity to genomic features such as genes or haplotype blocks, then testing the joint effect of each SNP set. Testing of each SNP set proceeds via the logistic kernel-machine-based test, which is based on a statistical framework that allows for flexible modeling of epistatic and nonlinear SNP effects. This flexibility and the ability to naturally adjust for covariate effects are important features of our test that make it appealing in comparison to individual SNP tests and existing multimarker tests. Using simulated data based on the International HapMap Project, we show that SNP-set testing can have improved power over standard individual-SNP analysis under a wide range of settings. In particular, we find that our approach has higher power than individual-SNP analysis when the median correlation between the disease-susceptibility variant and the genotyped SNPs is moderate to high. When the correlation is low, both individual-SNP analysis and the SNP-set analysis tend to have low power. We apply SNP-set analysis to analyze the Cancer Genetic Markers of Susceptibility (CGEMS) breast cancer GWAS discovery-phase data.
We report a set of tools to estimate the number of susceptibility loci and the distribution of their effect sizes for a trait on the basis of discoveries from existing genome-wide association studies (GWASs). We propose statistical power calculations for future GWASs using estimated distributions of effect sizes. Using reported GWAS findings for height, Crohn's disease and breast, prostate and colorectal (BPC) cancers, we determine that each of these traits is likely to harbor additional loci within the spectrum of low-penetrance common variants. These loci, which can be identified from sufficiently powerful GWASs, together could explain at least 15-20% of the known heritability of these traits. However, for BPC cancers, which have modest familial aggregation, our analysis suggests that risk models based on common variants alone will have modest discriminatory power (63.5% area under curve), even with new discoveries.
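A power calculation of the kind mentioned in the abstract above can be sketched for a single variant and then summed over a set of loci. The sketch assumes a quantitative trait standardized to unit variance, an additive per-allele effect `beta`, Hardy-Weinberg genotype frequencies and a two-sided normal test; these are simplifying assumptions made here and are not taken from the paper. The function names `snp_power` and `expected_discoveries` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def snp_power(n, maf, beta, alpha=5e-8):
    """Approximate power to detect one SNP at significance level alpha.

    The test statistic is treated as Normal(shift, 1), where
    shift = sqrt(n * 2 * maf * (1 - maf)) * |beta| and
    2 * maf * (1 - maf) is the genotype variance under Hardy-Weinberg.
    """
    critical = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = sqrt(n * 2 * maf * (1 - maf)) * abs(beta)
    # Two-sided rejection region: statistic above +critical or below -critical.
    upper_tail = 1 - _STD_NORMAL.cdf(critical - shift)
    lower_tail = _STD_NORMAL.cdf(-critical - shift)
    return upper_tail + lower_tail


def expected_discoveries(n, loci, alpha=5e-8):
    """Expected number of loci detected, given (maf, beta) pairs."""
    return sum(snp_power(n, maf, beta, alpha) for maf, beta in loci)
```

Summing per-locus power over an estimated effect-size distribution gives the expected number of discoveries for a planned sample size, which is one way to compare candidate study designs.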