This chapter provides an overview of statistical methods for genome-wide association studies (GWAS) in animals, plants, and humans. The simplest form of GWAS, a marker-by-marker analysis, is illustrated with a simple example. The problem of selecting a significance threshold that accounts for the large amount of multiple testing that occurs in GWAS is discussed. Population structure causes false positive associations in GWAS if not accounted for, and methods to deal with this are presented. Methodology for more complex models for GWAS, including haplotype-based approaches, accounting for identical by descent versus identical by state, and fitting all markers simultaneously, is described and illustrated with examples.
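The marker-by-marker analysis and the multiple-testing threshold mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the chapter's own example: it assumes genotypes coded as allele counts (0/1/2), a simple linear regression per marker, a Bonferroni correction as the threshold, and a large-sample normal approximation for the p-value. All function names and the toy data are hypothetical.

```python
import math


def single_marker_test(genotypes, phenotypes):
    """Regress phenotype on allele count for one marker.

    Returns (slope, t_statistic). Assumes genotypes are coded 0/1/2.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("marker is monomorphic; no association test possible")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    # Residual sum of squares and standard error of the slope.
    rss = sum((y - intercept - slope * g) ** 2
              for g, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


def normal_approx_p(t_stat):
    """Two-sided p-value using a normal approximation (assumes large n)."""
    return math.erfc(abs(t_stat) / math.sqrt(2))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error near alpha."""
    return alpha / n_tests


# Toy data (made up for illustration): six individuals, one marker.
genotypes = [0, 0, 1, 1, 2, 2]
phenotypes = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
slope, t_stat = single_marker_test(genotypes, phenotypes)
threshold = bonferroni_threshold(0.05, 50_000)  # 50,000 markers is an assumption
significant = normal_approx_p(t_stat) < threshold
```

With only six individuals the normal approximation is crude; a real analysis would use the t distribution and would also account for population structure, as the abstract notes.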
Gene discovery, estimation of heritability captured by SNP arrays, inference on genetic architecture and prediction analyses of complex traits are usually performed using different statistical models and methods, leading to inefficiency and loss of power. Here we use a Bayesian mixture model that simultaneously allows variant discovery, estimation of genetic variance explained by all variants and prediction of unobserved phenotypes in new samples. We apply the method to simulated data of quantitative traits and Wellcome Trust Case Control Consortium (WTCCC) data on disease and show that it provides accurate estimates of SNP-based heritability, produces unbiased estimators of risk in new samples, and that it can estimate genetic architecture by partitioning variation across hundreds to thousands of SNPs. We estimated that, depending on the trait, 2,633 to 9,411 SNPs explain all of the SNP-based heritability in the WTCCC diseases. The majority of those SNPs (>96%) had small effects, confirming a substantial polygenic component to common diseases. The proportion of the SNP-based variance explained by large effects (each SNP explaining 1% of the variance) varied markedly between diseases, ranging from almost zero for bipolar disorder to 72% for type 1 diabetes. Prediction analyses demonstrate that for diseases with major loci, such as type 1 diabetes and rheumatoid arthritis, Bayesian methods outperform profile scoring or mixed model approaches.
The world demand for animal-based food products is anticipated to increase by 70% by 2050. Meeting this demand in a way that has a minimal impact on the environment will require the implementation of advanced technologies, and methods to improve the genetic quality of livestock are expected to play a large part. Over the past 10 years, genomic selection has been introduced in several major livestock species and has more than doubled genetic progress in some. However, additional improvements are required. Genomic information of increasing complexity (including genomic, epigenomic, transcriptomic and microbiome data), combined with technological advances for its cost-effective collection and use, will make a major contribution.
Polymorphisms that affect complex traits or quantitative trait loci (QTL) often affect multiple traits. We describe two novel methods (1) for finding single nucleotide polymorphisms (SNPs) significantly associated with one or more traits using a multi-trait meta-analysis, and (2) for distinguishing between a single pleiotropic QTL and multiple linked QTL. The meta-analysis uses the effect of each SNP on each of n traits, estimated in single-trait genome-wide association studies (GWAS). These effects are expressed as a vector of signed t-values (t), and the error covariance matrix of these t-values is approximated by the correlation matrix of t-values among the traits calculated across SNPs (V). Consequently, t'V^{-1}t is approximately distributed as a chi-squared with n degrees of freedom. An attractive feature of the meta-analysis is that it uses estimated effects of SNPs from single-trait GWAS, so it can be applied to published data where individual records are not available. We demonstrate that the multi-trait method can be used to increase the power (numbers of SNPs validated in an independent population) of GWAS in a beef cattle data set including 10,191 animals genotyped for 729,068 SNPs with 32 traits recorded, including growth and reproduction traits. We can distinguish between a single pleiotropic QTL and multiple linked QTL because multiple SNPs tagging the same QTL show the same pattern of effects across traits. We confirm this finding by demonstrating that when one SNP is included in the statistical model the other SNPs have a non-significant effect. In the beef cattle data set, cluster analysis yielded four groups of QTL with similar patterns of effects across traits within a group. A linear index was used to validate SNPs having effects on multiple traits and to identify additional SNPs belonging to these four groups.
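The multi-trait statistic described in this abstract (the quadratic form of the signed t-values with the inverse of their among-trait correlation matrix, referred to a chi-squared with n degrees of freedom) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the function names and toy t-values are hypothetical, the linear solve is plain Gaussian elimination, and the p-value helper uses a closed form that holds only for an even number of traits.

```python
import math


def correlation_matrix(t_by_snp):
    """Correlation of t-values among traits, computed across SNPs.

    t_by_snp: list with one row per SNP; each row holds n signed t-values.
    """
    m, n = len(t_by_snp), len(t_by_snp[0])
    means = [sum(row[j] for row in t_by_snp) / m for j in range(n)]
    cov = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in t_by_snp)
            / (m - 1) for j in range(n)] for i in range(n)]
    sd = [math.sqrt(cov[j][j]) for j in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def multi_trait_statistic(t, v):
    """Quadratic form t' V^{-1} t for one SNP."""
    return sum(ti * xi for ti, xi in zip(t, solve(v, t)))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-squared probability; closed form valid only for even df."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of traits")
    half, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Toy t-values (made up): four SNPs, two traits.
t_values = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
v = correlation_matrix(t_values)
stat = multi_trait_statistic(t_values[2], v)
p_value = chi2_sf_even_df(stat, df=2)
```

Because only per-SNP t-values enter the calculation, the same sketch applies to published summary statistics, which is the feature the abstract highlights. For an odd number of traits, a general chi-squared tail function from a statistics library would be needed instead of the helper above.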
The 1000 Bull Genomes Project is a collection of whole-genome sequences from 2,703 individuals capturing a significant proportion of the world's cattle diversity. So far, 84 million single-nucleotide polymorphisms (SNPs) and 2.5 million small insertion deletions have been identified in the collection, a very high level of genetic diversity. The project has greatly accelerated the identification of deleterious mutations for a range of genetic diseases, as well as for embryonic lethals. The rate of identification of causal mutations for complex traits has been slower, reflecting the typically small effect size of these mutations and the fact that many are likely in as-yet-unannotated regulatory regions. Both the deleterious mutations that have been identified and the mutations associated with complex trait variation have been included in low-cost SNP array designs, and these arrays are being genotyped in tens of thousands of dairy and beef cattle, enabling management of deleterious mutations in these populations as well as genomic selection.
We prove that if \Gamma is a sofic group and A is a finitely generated \mathbb{Z}(\Gamma)-module, then the metric mean dimension of \Gamma \curvearrowright \widehat{A}, in the sense of Hanfeng Li, is equal to the von Neumann-Lück rank of A. This partially extends the results of Hanfeng Li and Bingbing Liang from the case of amenable groups to the case of sofic groups. Additionally we show that the mean dimension of \Gamma \curvearrowright \widehat{A} is the von Neumann-Lück rank of A if A is finitely presented and \Gamma is residually finite. It turns out that our approach naturally leads to a notion of p-metric mean dimension, which is in between mean dimension and the usual metric mean dimension. This can be seen as an obstruction to the equality of mean dimension and metric mean dimension. While we cannot decide if mean dimension is the same as metric mean dimension for algebraic actions, we show in the metric case that for all p the p-metric mean dimension coincides with the von Neumann-Lück rank of the dual module.
As the global population and global wealth both continue to increase, so will the demand for livestock products, especially those that are highly nutritious. However, competition with other uses for land and water resources will also intensify, necessitating more efficient livestock production. In addition, as climate change escalates, reduced methane emissions from cattle and sheep will be a critical goal. Application of new technologies, including genomic selection and advanced reproductive technologies, will play an important role in meeting these challenges. Genomic selection, which enables prediction of the genetic merit of animals from genome-wide SNP markers, has already been adopted by dairy industries worldwide and is expected to double genetic gains for milk production and other traits. Here, we review these gains. We also discuss how the use of whole-genome sequence data should both accelerate the rate of gain and enable rapid discovery and elimination of genetic defects from livestock populations.
Genome-wide panels of SNPs have recently been used in domestic animal species to map and identify genes for many traits and to select genetically desirable livestock. This has led to the discovery of the causal genes and mutations for several single-gene traits but not for complex traits. However, the genetic merit of animals can still be estimated by genomic selection, which uses genome-wide SNP panels as markers and statistical methods that capture the effects of large numbers of SNPs simultaneously. This approach is expected to double the rate of genetic improvement per year in many livestock systems.
Zero hunger and good health could be realized by 2030 through effective conservation, characterization and utilization of germplasm resources. So far, few chickpea (Cicer arietinum) germplasm accessions have been characterized at the genome sequence level. Here we present a detailed map of variation in 3,171 cultivated and 195 wild accessions to provide publicly available resources for chickpea genomics research and breeding. We constructed a chickpea pan-genome to describe genomic diversity across cultivated chickpea and its wild progenitor accessions. A divergence tree using genes present in around 80% of individuals in one species allowed us to estimate the divergence of Cicer over the last 21 million years. Our analysis found chromosomal segments and genes that show signatures of selection during domestication, migration and improvement. The chromosomal locations of deleterious mutations responsible for limited genetic diversity and decreased fitness were identified in elite germplasm. We identified superior haplotypes for improvement-related traits in landraces that can be introgressed into elite breeding lines through haplotype-based breeding, and found targets for purging deleterious alleles through genomics-assisted breeding and/or gene editing. Finally, we propose three crop breeding strategies based on genomic prediction to enhance crop productivity for 16 traits while avoiding the erosion of genetic diversity through optimal contribution selection (OCS)-based pre-breeding. The predicted performance for 100-seed weight, an important yield-related trait, increased by up to 23% and 12% with OCS- and haplotype-based genomic approaches, respectively.
The success of genome-wide association studies (GWASs) has led to increasing interest in making predictions of complex trait phenotypes, including disease, from genotype data. Rigorous assessment of the value of predictors is crucial before implementation. Here we discuss some of the limitations and pitfalls of prediction analysis and show how naive implementations can lead to severe bias and misinterpretation of results.