Systems Genetics for Evolutionary Studies Prins, Pjotr; Smant, Geert; Arends, Danny ...
Methods in molecular biology (Clifton, N.J.),
2019, Volume:
1910
Journal Article
Open access
Systems genetics combines high-throughput genomic data with genetic analysis. In this chapter, we review and discuss the application of systems genetics in the context of evolutionary studies, in which high-throughput molecular technologies are being combined with quantitative trait locus (QTL) analysis in segregating populations. The recent explosion of high-throughput data (measuring thousands of RNAs, proteins, and metabolites, using deep sequencing, mass spectrometry, chromatin, methyl-DNA immunoprecipitation, etc.) allows the dissection of causes of genetic variation underlying quantitative phenotypes of all types. To deal with the sheer amount of data, powerful statistical tools are needed to analyze multidimensional relationships and to extract valuable information and new modes and mechanisms of changes both within and between species. In the context of evolutionary computational biology, a well-designed experiment and the right population can help dissect complex traits likely to be under selection using proven statistical methods for associating phenotypic variation with chromosomal locations. Recent evolutionary expression QTL (eQTL) studies focus on gene expression adaptations, mapping the gene expression landscape, and, tentatively, defining networks of transcripts and proteins that are jointly modulated sets of eQTL networks. Here, we discuss the possibility of introducing an evolutionary "prior" in the form of gene families displaying evidence of positive selection, and using that prior in the context of an eQTL experiment for elucidating host-pathogen protein-protein interactions. Here we review one exemplar evolutionary eQTL experiment and discuss experimental design, choice of platforms, analysis methods, scope, and interpretation of results.
In brief we highlight how eQTL are defined; how they are used to assemble interacting and causally connected networks of RNAs, proteins, and metabolites; and how some QTLs can be efficiently converted to reasonably well-defined sequence variants.
Genome-wide linkage and association studies of tens of thousands of clinical and molecular traits are currently underway, offering rich data for inferring causality between traits and genetic variation. However, the inference process is based on discovering subtle patterns in the correlation between traits and is therefore challenging and could create a flood of untrustworthy causal inferences. Here we introduce the concerns and show that they are already valid in simple scenarios of two traits linked to or associated with the same genomic region. We argue that more comprehensive analysis and Bayesian reasoning are needed and that these can overcome some of the pitfalls, although not in every conceivable case. We conclude that causal inference methods can still be of use in the iterative process of mathematical modeling and biological validation.
Genome-wide association studies have been hugely successful in identifying disease risk variants, yet most variants do not lead to coding changes and how variants influence biological function is usually unknown.
We correlated gene expression and genetic variation in untouched primary leucocytes (n = 110) from individuals with celiac disease - a common condition with multiple risk variants identified. We compared our observations with an EBV-transformed HapMap B cell line dataset (n = 90), and performed a meta-analysis to increase power to detect non-tissue specific effects.
In celiac peripheral blood, 2,315 SNP variants influenced gene expression at 765 different transcripts (< 250 kb from SNP, at FDR = 0.05, cis expression quantitative trait loci, eQTLs). 135 of the detected SNP-probe effects (reflecting 51 unique probes) were also detected in a HapMap B cell line published dataset, all with effects in the same allelic direction. Overall gene expression differences within the two datasets predominantly explain the limited overlap in observed cis-eQTLs. Celiac associated risk variants from two regions, containing genes IL18RAP and CCR3, showed significant cis genotype-expression correlations in the peripheral blood but not in the B cell line datasets. We identified 14 genes where a SNP affected the expression of different probes within the same gene, but in opposite allelic directions. By incorporating genetic variation in co-expression analyses, functional relationships between genes can be more significantly detected.
In conclusion, the complex nature of genotypic effects in human populations makes the use of a relevant tissue, large datasets, and analysis of different exons essential to enable the identification of the function for many genetic risk variants in common diseases.
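The cis-eQTL criteria quoted above (SNP less than 250 kb from the transcript, FDR = 0.05) can be illustrated with a minimal Python sketch. The abstract does not say how the FDR was controlled, so the Benjamini-Hochberg procedure below is an assumption chosen for illustration, and the function names `is_cis` and `benjamini_hochberg` are hypothetical.

```python
def is_cis(snp_pos_bp, probe_pos_bp, window_bp=250_000):
    """True if SNP and probe (assumed to be on the same chromosome)
    lie less than window_bp apart."""
    return abs(snp_pos_bp - probe_pos_bp) < window_bp


def benjamini_hochberg(pvalues, fdr=0.05):
    """Return one boolean per test: True if the test is significant
    under the Benjamini-Hochberg step-up procedure (an assumed choice,
    not necessarily the method used in the study)."""
    m = len(pvalues)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= fdr * k / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= fdr * rank / m:
            max_rank = rank
    # Every test ranked at or below max_rank is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant
```

In a real analysis the p-values would come from testing each SNP-probe pair inside the cis window, for example by correlating genotype with expression level.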
eQTL Analysis in Humans Franke, Lude; Jansen, Ritsert C.
Cardiovascular Genomics,
01/2009, Volume:
573
Book Chapter, Journal Article
Improving human health is a major aim of medical research, but it requires that variation between individuals be taken into account since each person carries a different combination of gene variants and is exposed to different environmental conditions, which can cause differences in susceptibility to diseases. With the advent of molecular markers in the 1980s, it became possible to genotype individuals (i.e., to detect the presence or absence of local DNA sequence variants at each of hundreds of genome positions). This DNA sequence variation could then be related to disease susceptibility by using pedigree data. Such linkage analyses proved to be difficult for more complex diseases. Recently, with the decreasing costs of genotyping, analyses of large natural populations of unrelated individuals became possible and resulted in the association of many genes (and genetic variants in these genes) with complex diseases. Unfortunately, for a considerable proportion of these genes and their proteins, it is not yet clear what their downstream effects are. Studying the expression of these genes and proteins can help to uncover the effects of these variants on the expression of these and other genes, proteins, metabolites, and phenotypes. In this chapter, we focus on the high-throughput and genome-wide measurement of gene expression in a natural population of unrelated humans, and on the subsequent association of variation in expression to “expression quantitative trait loci” (eQTLs) on DNA using oligonucleotide arrays with hundreds of thousands of single-nucleotide polymorphism (SNP) markers that capture most of the human genetic variation well. This strategy has been successfully applied to several diseases such as celiac disease (Hunt et al. 2008, Nat Genet 40, 395–402) and asthma (Moffatt et al. 2007, Nature 448, 470–473): associated genetic variants have been identified that affect levels of gene expression in cis or in trans, providing insight into the biological pathways affected by these diseases.
R is the preferred tool for statistical analysis of many bioinformaticians due in part to the increasing number of freely available analytical methods. Such methods can be quickly reused and adapted to each particular experiment. However, in experiments where large amounts of data are generated, for example using high-throughput screening devices, the processing time required to analyze data is often quite long. A solution to reduce the processing time is the use of parallel computing technologies. Because R does not support parallel computations, several tools have been developed to enable such technologies. However, these tools require multiple modifications to the way R programs are usually written or run. Although these tools can ultimately speed up the calculations, the time, skills and additional resources required to use them are an obstacle for most bioinformaticians.
We have designed and implemented an R add-on package, R/parallel, that extends R by adding user-friendly parallel computing capabilities. With R/parallel any bioinformatician can now easily automate the parallel execution of loops and benefit from the multicore processor power of today's desktop computers. Using a single and simple function, R/parallel can be integrated directly with other existing R packages. With no need to change the implemented algorithms, the processing time can be approximately reduced N-fold, N being the number of available processor cores.
R/parallel saves bioinformaticians time in their daily tasks of analyzing experimental data. It achieves this objective on two fronts: first, by reducing development time of parallel programs by avoiding reimplementation of existing methods and second, by reducing processing time by speeding up computations on current desktop computers. Future work is focused on extending the envelope of R/parallel by interconnecting and aggregating the power of several computers, both existing office computers and computing clusters.
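The N-fold reduction mentioned above is the ideal case, in which essentially all of the run time sits inside the parallelized loop. As a general rule of thumb (Amdahl's law, which is not a result from this abstract), if a fraction $s$ of the run time remains serial, the speedup on $N$ cores under this idealized model is:

```latex
S(N) = \frac{T_1}{T_N} = \frac{1}{s + \frac{1 - s}{N}}, \qquad \lim_{s \to 0} S(N) = N
```

For example, with $s = 0.1$ and $N = 4$ the speedup is $1 / (0.1 + 0.225) \approx 3.1$ rather than 4.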
Full text
Available to:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
Motivation: Sample mix-ups can arise during sample collection, handling, genotyping or data management. It is unclear how often sample mix-ups occur in genome-wide studies, as there currently are no post hoc methods that can identify these mix-ups in unrelated samples. We have therefore developed an algorithm (MixupMapper) that can both detect and correct sample mix-ups in genome-wide studies that study gene expression levels.
Results: We applied MixupMapper to five publicly available human genetical genomics datasets. On average, 3% of all analyzed samples had been assigned incorrect expression phenotypes: in one of the datasets 23% of the samples had incorrect expression phenotypes. The consequences of sample mix-ups are substantial: when we corrected these sample mix-ups, we identified on average 15% more significant cis-expression quantitative trait loci (cis-eQTLs). In one dataset, we identified three times as many significant cis-eQTLs after correction. Furthermore, we show through simulations that sample mix-ups can lead to an underestimation of the explained heritability of complex traits in genome-wide association datasets.
Availability and implementation:
MixupMapper is freely available at http://www.genenetwork.nl/mixupmapper/
Contact:
lude@ludesign.nl
Supplementary Information:
Supplementary data are available at Bioinformatics online.
Amyotrophic lateral sclerosis (ALS) is a progressive, neurodegenerative disease characterized by loss of upper and lower motor neurons. ALS is considered to be a complex trait and genome-wide association studies (GWAS) have implicated a few susceptibility loci. However, many more causal loci remain to be discovered. Since it has been shown that genetic variants associated with complex traits are more likely to be eQTLs than frequency-matched variants from GWAS platforms, we conducted a two-stage genome-wide screening for eQTLs associated with ALS. In addition, we applied an eQTL analysis to fine-map association loci. Expression profiles using peripheral blood of 323 sporadic ALS patients and 413 controls were mapped to genome-wide genotyping data. Subsequently, data from a two-stage GWAS (3,568 patients and 10,163 controls) were used to prioritize eQTLs identified in the first stage (162 ALS, 207 controls). These prioritized eQTLs were carried forward to the second sample with both gene-expression and genotyping data (161 ALS, 206 controls). Replicated eQTL SNPs were then tested for association in the second-stage GWAS data to find SNPs associated with disease that survived correction for multiple testing. We thus identified twelve cis eQTLs with nominally significant associations in the second-stage GWAS data. Eight SNP-transcript pairs of highest significance (lowest p = 1.27 × 10^-51) withstood multiple-testing correction in the second stage and modulated CYP27A1 gene expression. Additionally, we show that C9orf72 appears to be the only gene in the 9p21.2 locus that is regulated in cis, showing the potential of this approach in identifying causative genes in association loci in ALS. This study has identified candidate genes for sporadic ALS, most notably CYP27A1.
Mutations in CYP27A1 are causal to cerebrotendinous xanthomatosis which can present as a clinical mimic of ALS with progressive upper motor neuron loss, making it a plausible susceptibility gene for ALS.
We combined large-scale mRNA expression analysis and gene mapping to identify genes and loci that control hematopoietic stem cell (HSC) function. We measured mRNA expression levels in purified HSCs isolated from a panel of densely genotyped recombinant inbred mouse strains. We mapped quantitative trait loci (QTLs) associated with variation in expression of thousands of transcripts. By comparing the physical transcript position with the location of the controlling QTL, we identified polymorphic cis-acting stem cell genes. We also identified multiple trans-acting control loci that modify expression of large numbers of genes. These groups of coregulated transcripts identify pathways that specify variation in stem cells. We illustrate this concept with the identification of candidate genes involved with HSC turnover. We compared expression QTLs in HSCs and brain from the same mice and identified both shared and tissue-specific QTLs. Our data are accessible through WebQTL, a web-based interface that allows custom genetic linkage analysis and identification of coregulated transcripts.
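Comparing the physical transcript position with the location of the controlling QTL amounts to a simple classification rule, sketched below in Python. The abstract gives no distance threshold, so the 5 Mb default window is an arbitrary assumption, and the names `Locus` and `classify_eqtl` are hypothetical.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    """A genomic location: chromosome name plus position in base pairs."""
    chromosome: str
    position_bp: int


def classify_eqtl(transcript: Locus, qtl: Locus,
                  window_bp: int = 5_000_000) -> str:
    """Label an expression QTL as 'cis' when the QTL maps close to the
    transcript it controls, and 'trans' otherwise.

    window_bp is an assumed threshold; choose it to match the mapping
    resolution of the population being studied.
    """
    same_chromosome = transcript.chromosome == qtl.chromosome
    distance = abs(transcript.position_bp - qtl.position_bp)
    if same_chromosome and distance <= window_bp:
        return "cis"
    return "trans"
```

A trans-acting control locus of the kind described above would show up as many different transcripts classified as "trans" against the same QTL position.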
A complex phenotype such as seed germination is the result of several genetic and environmental cues and requires the concerted action of many genes. The use of well-structured recombinant inbred lines in combination with "omics" analysis can help to disentangle the genetic basis of such quantitative traits. This so-called genetical genomics approach can effectively capture both genetic and epistatic interactions. However, to understand how the environment interacts with genomic-encoded information, a better understanding of the perception and processing of environmental signals is needed. In a classical genetical genomics setup, this requires replication of the whole experiment in different environmental conditions. A novel generalized setup overcomes this limitation and includes environmental perturbation within a single experimental design. We developed a dedicated quantitative trait loci mapping procedure to implement this approach and used existing phenotypical data to demonstrate its power. In addition, we studied the genetic regulation of primary metabolism in dry and imbibed Arabidopsis (Arabidopsis thaliana) seeds. In the metabolome, many changes were observed that were under both environmental and genetic controls and their interaction. This concept offers unique reduction of experimental load with minimal compromise of statistical power and is of great potential in the field of systems genetics, which requires a broad understanding of both plasticity and dynamic regulation.
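One generic way to express genetic control, environmental control, and their interaction in a single design is a linear model with a genotype-by-environment term at each marker. This is a textbook formulation offered as a sketch; the abstract does not give the authors' actual mapping procedure, and all symbols below are assumptions.

```latex
y_i = \mu + \beta_G\, g_i + \beta_E\, e_i + \beta_{G \times E}\, g_i e_i + \varepsilon_i
```

Here $y_i$ is the trait value (for example a metabolite level) for line $i$, $g_i$ is the marker genotype (coded, say, 0 or 1 for the two parental alleles), $e_i$ is the environment (say 0 for dry and 1 for imbibed seed), and $\varepsilon_i$ is the residual. A non-zero $\beta_{G \times E}$ indicates a QTL whose effect depends on the environment.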
Want to make the most of your talent for science? This practical guide for students, postdoctorates and professors offers a unique stepwise approach to help you develop your expertise and become a more productive scientist. Covering topics from giving presentations and writing effectively to prioritising your workload, it provides guidance to enhance your skills and combine them with those of others to your mutual benefit. Learn how to maintain your passion for science, inspire others to develop their abilities and motivate yourself to plan effectively, focus on your goals and even optimise funding opportunities. With numerous valuable tips, real-life stories, novel questionnaires and exercises for self-reflection, this must-read guide provides everything you need to take responsibility for your own personal and professional development.