Common single-nucleotide polymorphisms (SNPs) are predicted to collectively explain 40-50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes. Here, using data from a genome-wide association study of 5.4 million individuals of diverse ancestries, we show that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a mean size of around 90 kb, covering about 21% of the genome. The density of independent associations varies across the genome and the regions of increased density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs (or all SNPs in the HapMap 3 panel) account for 40% (45%) of phenotypic variance in populations of European ancestry but only around 10-20% (14-24%) in populations of other ancestries. Effect sizes, associated regions and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely to be explained by linkage disequilibrium and differences in allele frequency within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than are needed to implicate causal genes and variants. Overall, this study provides a comprehensive map of specific genomic regions that contain the vast majority of common height-associated variants. Although this map is saturated for populations of European ancestry, further research is needed to achieve equivalent saturation in other ancestries.
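"Proportion of phenotypic variance explained" in out-of-sample prediction is commonly reported as the squared correlation between a predictor (for example, a polygenic score) and the observed phenotype in a cohort that was not used to estimate the weights. The sketch below shows that plainest form. It is an illustration only: the abstract does not state the exact estimator, and published analyses often adjust for covariates such as age, sex and principal components before computing it.

```python
def variance_explained(score: list[float], phenotype: list[float]) -> float:
    """Squared Pearson correlation between predicted and observed values.

    Intended for a held-out (out-of-sample) cohort, so that the weights used
    to build `score` were not estimated from these same individuals.
    """
    n = len(score)
    if n != len(phenotype) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_s = sum(score) / n
    mean_y = sum(phenotype) / n
    cov = sum((s - mean_s) * (y - mean_y) for s, y in zip(score, phenotype))
    var_s = sum((s - mean_s) ** 2 for s in score)
    var_y = sum((y - mean_y) ** 2 for y in phenotype)
    if var_s == 0 or var_y == 0:
        raise ValueError("score and phenotype must both vary")
    return cov * cov / (var_s * var_y)
```

Under this definition, a value of 0.40 would correspond to the "40% of phenotypic variance" reported for European-ancestry cohorts.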
Polygenic risk scores (PRS) have shown promise in predicting human complex traits and diseases. Here, we present PRS-CS, a polygenic prediction method that infers posterior effect sizes of single nucleotide polymorphisms (SNPs) using genome-wide association summary statistics and an external linkage disequilibrium (LD) reference panel. PRS-CS uses a high-dimensional Bayesian regression framework and differs from previous work in placing a continuous shrinkage (CS) prior on SNP effect sizes; this prior is robust to varying genetic architectures, provides substantial computational advantages, and enables multivariate modeling of local LD patterns. Simulation studies using data from the UK Biobank show that PRS-CS outperforms existing methods across a wide range of genetic architectures, especially when the training sample size is large. We apply PRS-CS to predict six common complex diseases and six quantitative traits in the Partners HealthCare Biobank, and further demonstrate the improvement of PRS-CS in prediction accuracy over alternative methods.
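Once per-SNP effect sizes have been inferred, whether by PRS-CS or any other method, an individual's score is the sum of each SNP's effect size multiplied by that individual's dosage of the corresponding effect allele. The sketch below shows only this final scoring step. It does not implement the continuous shrinkage prior, and the function names and SNP IDs are hypothetical.

```python
from collections.abc import Mapping


def polygenic_score(dosages: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of effect-allele dosages for one individual.

    dosages: SNP ID -> count (0, 1 or 2) of the allele each weight refers to.
    weights: SNP ID -> per-SNP effect size, e.g. posterior means from a
             shrinkage method such as PRS-CS (not computed here).
    SNPs with no genotype are skipped. This is a simplification: scoring
    tools differ in how they treat missing data.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


def score_cohort(
    cohort: Mapping[str, Mapping[str, float]], weights: Mapping[str, float]
) -> dict[str, float]:
    """Score every individual in a cohort (individual ID -> dosages)."""
    return {person: polygenic_score(geno, weights) for person, geno in cohort.items()}
```

For example, with hypothetical weights `{"rs1": 0.2, "rs2": -0.1}`, an individual carrying one copy of each effect allele scores 0.2 - 0.1 = 0.1. The dosages and weights must refer to the same allele at every SNP, so allele alignment with the summary statistics is the step that most often needs care in practice.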
The margins of an expanding range are predicted to be challenging environments for adaptation. Marginal populations should often have low effective population sizes (Ne), and hence strong genetic drift, owing to demographic expansion and/or low census population sizes under unfavourable environmental conditions. Nevertheless, there is increasing evidence that invasive species undergo rapid evolution and potential adaptation to novel environments encountered during colonization, calling into question whether significant reductions in Ne are realized during range expansions in nature. Here we report one of the first empirical tests of the joint effects of expansion dynamics and environment on effective population size variation during invasive range expansion. We estimate contemporary values of Ne from levels of linkage disequilibrium among genome‐wide markers within introduced populations of the highly invasive plant Centaurea solstitialis (yellow starthistle) in North America (California, USA), and within native Eurasian populations. As predicted, we find that Ne within the invaded range is positively correlated with both expansion history (time since founding) and habitat quality (abiotic climate). History and climate had independent additive effects with similar effect sizes, indicating an important role for both factors in this invasion. These results support theoretical expectations for the population genetics of range expansion, though whether these processes can ultimately arrest the spread of an invasive species remains an unanswered question.
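LD-based estimates of contemporary Ne rest on the fact that drift in a small population generates LD even between unlinked loci. A classic approximation (Hill 1981) for unlinked loci is E[r²] ≈ 1/(3Ne) + 1/S, where S is the number of sampled individuals and 1/S is the LD expected from sampling alone. Solving for Ne gives the sketch below. It is the uncorrected approximation, shown for illustration: the abstract does not specify the estimator used, and published estimators add empirical bias corrections for small samples.

```python
import math


def ld_ne_estimate(mean_r2: float, sample_size: int) -> float:
    """Approximate Ne from mean r^2 between unlinked loci (Hill 1981 form).

    Uses E[r^2] ~= 1/(3*Ne) + 1/S, where S is the number of sampled
    individuals and 1/S is the r^2 expected from sampling alone.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    excess = mean_r2 - 1.0 / sample_size  # LD beyond sampling noise
    if excess <= 0:
        return math.inf  # no detectable drift signal: Ne is unbounded
    return 1.0 / (3.0 * excess)
```

For example, a mean r² of 0.05 in a sample of 50 individuals leaves an excess of 0.03, which implies Ne ≈ 11. Returning infinity when the observed r² does not exceed the sampling expectation mirrors how LD-based Ne estimates are often reported as unbounded in large populations.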
Transcriptome-wide association studies using predicted expression have identified thousands of genes whose locally regulated expression is associated with complex traits and diseases. In this work, we show that linkage disequilibrium induces significant gene-trait associations at non-causal genes as a function of the expression quantitative trait loci weights used in expression prediction. We introduce a probabilistic framework that models correlation among transcriptome-wide association study signals to assign a probability for every gene in the risk region to explain the observed association signal. Importantly, our approach remains accurate when expression data for causal genes are not available in the causal tissue by leveraging expression prediction from other tissues. Our approach yields credible sets of genes containing the causal gene at a nominal confidence level (for example, 90%) that can be used to prioritize genes for functional assays. We illustrate our approach by using an integrative analysis of lipid traits, where our approach prioritizes genes with strong evidence for causality.
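Two pieces of this reasoning can be made concrete with standard summary-statistic TWAS formulas. A gene's TWAS statistic combines its eQTL weights w with GWAS SNP z-scores z as wᵀz / √(wᵀΣw), where Σ is the SNP LD matrix. Two genes' statistics are therefore correlated by w₁ᵀΣw₂ / √(w₁ᵀΣw₁ · w₂ᵀΣw₂), which is how LD can produce a signal at a non-causal gene. The credible-set step then keeps the highest-probability genes until their summed probability reaches the chosen level. This sketch does not implement the probabilistic model that produces the per-gene probabilities, and the gene names and values below are hypothetical.

```python
import math

Vector = list[float]
Matrix = list[list[float]]


def quad(a: Vector, sigma: Matrix, b: Vector) -> float:
    """Compute a^T Sigma b."""
    return sum(a[i] * sigma[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def twas_z(weights: Vector, snp_z: Vector, ld: Matrix) -> float:
    """Gene-level TWAS statistic from eQTL weights and GWAS SNP z-scores."""
    return sum(w * z for w, z in zip(weights, snp_z)) / math.sqrt(quad(weights, ld, weights))


def twas_correlation(w1: Vector, w2: Vector, ld: Matrix) -> float:
    """Correlation between two genes' TWAS statistics induced by LD alone."""
    return quad(w1, ld, w2) / math.sqrt(quad(w1, ld, w1) * quad(w2, ld, w2))


def credible_set(posteriors: dict[str, float], level: float = 0.9) -> list[str]:
    """Smallest set of genes, by descending posterior, whose mass reaches `level`."""
    chosen, total = [], 0.0
    for gene, p in sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(gene)
        total += p
        if total >= level:
            break
    return chosen
```

For example, if gene A's weights load only on SNP 1, gene B's load only on SNP 2, and the two SNPs have LD correlation 0.5, `twas_correlation([1.0, 0.0], [0.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])` gives 0.5. A strong causal signal at A therefore also inflates B's statistic.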
Haplotype-based breeding (HBB) is one of the cutting-edge technologies in crop improvement, owing to the increasing availability of single nucleotide polymorphisms (SNPs) identified by next-generation sequencing technologies. Combining thousands of SNPs into a few hundred haplotype blocks reduces the complexity of the data, requiring fewer statistical tests and lowering the probability of spurious associations. The presence of strong genomic regions in breeding lines of most crop species facilitates the use of haplotypes to improve the efficiency of genomic and marker-assisted selection. As a genomics-assisted breeding (GAB) approach, haplotype-based breeding harnesses genome sequence data to pinpoint allelic variation that can be used to hasten the breeding cycle and circumvent the challenges associated with linkage drag. This review describes approaches for candidate gene identification, superior haplotype identification, haplo-pheno analysis, and haplotype-based marker-assisted selection. Crop improvement strategies that utilize superior haplotypes will hasten breeding progress and help safeguard global food security.
• Haplotype-based breeding unlocks the genetic variation in genetic resources for crop improvement.
• Haplotype-based breeding in rice builds on the available genome sequence data, with lines resequenced further to identify haplotypes.
• It is expected to integrate with biotechnological tools such as precise genome editing and functional genomics to accelerate genetic gain and address global food security.
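The step of combining many SNPs into a few haplotype blocks can be illustrated with a deliberately simple rule: walk along the SNPs in physical order and keep extending the current block while each SNP is in strong LD (r² at or above a threshold) with its neighbour. This rule and the 0.8 threshold are assumptions for illustration. Established block definitions, such as the confidence-interval method of Gabriel et al. implemented in Haploview, are more principled.

```python
def r_squared(hap_a: list[int], hap_b: list[int]) -> float:
    """LD r^2 between two biallelic loci from phased haplotypes coded 0/1."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n  # haplotypes carrying both '1' alleles
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic locus carries no LD information
    d = p_ab - p_a * p_b  # the classic LD coefficient D
    return d * d / denom


def adjacent_r2_blocks(haplotypes: list[list[int]], threshold: float = 0.8) -> list[list[int]]:
    """Group consecutive SNPs into blocks while neighbouring r^2 >= threshold.

    haplotypes[k] holds the alleles of SNP k across all phased haplotypes,
    with SNPs listed in physical order. Returns lists of SNP indices.
    """
    if not haplotypes:
        return []
    blocks = [[0]]
    for k in range(1, len(haplotypes)):
        if r_squared(haplotypes[k - 1], haplotypes[k]) >= threshold:
            blocks[-1].append(k)  # strong LD with the previous SNP: same block
        else:
            blocks.append([k])  # LD breaks down: start a new block
    return blocks
```

Each block can then be tested as a single multi-allelic marker (one test per block rather than per SNP), which is the source of the reduction in statistical tests described in the abstract.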
Leveraging linkage disequilibrium (LD) patterns as representative of population substructure enables the discovery of additive association signals in genome-wide association studies (GWASs). Standard GWASs are well-powered to interrogate additive models; however, new approaches are required for investigating other modes of inheritance such as dominance and epistasis. Epistasis, or non-additive interaction between genes, exists across the genome but often goes undetected because of a lack of statistical power. Furthermore, the customary use of LD pruning in standard GWASs prevents detection of sites that are in LD but might underlie the genetic architecture of complex traits. We hypothesize that uncovering long-range interactions between loci with strong LD due to epistatic selection can elucidate genetic mechanisms underlying common diseases. To investigate this hypothesis, we tested for associations between 23 common diseases and 5,625,845 epistatic SNP-SNP pairs (determined by Ohta’s D statistics) in long-range LD (>0.25 cM). Across five disease phenotypes, we identified one significant and four near-significant associations that replicated in two large genotype-phenotype datasets (UK Biobank and eMERGE). The genes that were most likely involved in the replicated associations were (1) members of highly conserved gene families with complex roles in multiple pathways, (2) essential genes, and/or (3) genes that were associated in the literature with complex traits that display variable expressivity. These results support the highly pleiotropic and conserved nature of variants in long-range LD under epistatic selection. Our work supports the hypothesis that epistatic interactions regulate diverse clinical mechanisms and might especially be driving factors in conditions with a wide range of phenotypic outcomes.
This study investigates epistasis in the genetic architecture of common diseases by using long-range linkage disequilibrium patterns. One significant and four near-significant associations across five disease phenotypes were identified, highlighting the pleiotropic and conserved nature of variants under epistatic selection. These findings provide insights into the genetic mechanisms underlying complex diseases.
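The abstract does not say which association test was applied to each SNP-SNP pair. One common way to test a pair for epistasis with a binary disease outcome is a likelihood-ratio test comparing two logistic regressions. The first model contains the two additive dosages; the second also contains their product, so the test asks whether the joint effect departs from additivity on the log-odds scale. The sketch below is that generic test, written with the standard library only. It is not the authors' pipeline and does not compute Ohta's D statistics used to select the pairs.

```python
import math


def _solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X: list[list[float]], y: list[int], iters: int = 25) -> float:
    """Newton-Raphson logistic regression; returns the maximized log-likelihood."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X^T W X
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += w * xi[a] * xi[c]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        ll += yi * eta - math.log1p(math.exp(eta))
    return ll


def interaction_lrt(g1: list[int], g2: list[int], y: list[int]) -> tuple[float, float]:
    """1-df likelihood-ratio test of a SNP x SNP product term; returns (stat, p)."""
    base = [[1.0, a, b] for a, b in zip(g1, g2)]  # intercept + additive dosages
    full = [[1.0, a, b, a * b] for a, b in zip(g1, g2)]  # + interaction term
    stat = max(2.0 * (fit_logistic(full, y) - fit_logistic(base, y)), 0.0)
    p = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p
```

The product of additive dosages is only one way to parameterize epistasis; dominance-coded or genotype-by-genotype models are common alternatives. A genome-scale scan over millions of pairs would also use an optimized regression library rather than this pure-Python loop.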
Summary
The adaptation of weeds to herbicide is both a significant problem in agriculture and a model of rapid adaptation. However, significant gaps remain in our knowledge of resistance controlled by many loci and the evolutionary factors that influence the maintenance of resistance.
Here, using herbicide‐resistant populations of the common morning glory (Ipomoea purpurea), we perform a multilevel analysis of the genome and transcriptome to uncover putative loci involved in nontarget‐site herbicide resistance (NTSR) and to examine evolutionary forces underlying the maintenance of resistance in natural populations.
We found loci involved in herbicide detoxification and stress sensing to be under selection and confirmed that detoxification is responsible for glyphosate (Roundup) resistance using a functional assay. We identified interchromosomal linkage disequilibrium (ILD) among loci under selection, reflecting either historical processes or additive effects leading to the resistance phenotype. We further identified potential fitness cost loci that were strongly linked to resistance alleles, indicating the role of genetic hitchhiking in maintaining the cost.
Overall, our work suggests that NTSR glyphosate resistance in I. purpurea is conferred by multiple genes which are potentially maintained through generations via ILD, and that the fitness cost associated with resistance in this species is likely a by‐product of genetic hitchhiking.
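Interchromosomal LD (ILD) cannot arise from physical linkage, so a scan for it only needs to compare loci on different chromosomes. A simple sketch, assuming unphased diploid genotypes coded as dosages (0/1/2), uses the squared correlation of dosages across individuals as a genotype-based LD measure and reports every cross-chromosome pair above a threshold. The data layout and threshold are hypothetical, and the abstract does not describe the authors' ILD method. A real scan would also test each r² against its null expectation (roughly 1/n for n individuals) and correct for the many pairs tested.

```python
from itertools import combinations


def dosage_r2(g1: list[int], g2: list[int]) -> float:
    """Squared correlation of genotype dosages (0/1/2) across individuals."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return cov * cov / (v1 * v2)


def interchromosomal_ld(
    loci: dict[str, tuple[str, list[int]]], threshold: float
) -> list[tuple[str, str, float]]:
    """Return pairs of loci on different chromosomes with dosage r^2 >= threshold.

    loci maps a locus ID to (chromosome, dosages in a fixed individual order).
    """
    hits = []
    for (id1, (chr1, g1)), (id2, (chr2, g2)) in combinations(loci.items(), 2):
        if chr1 == chr2:
            continue  # same-chromosome LD may simply reflect physical linkage
        r2 = dosage_r2(g1, g2)
        if r2 >= threshold:
            hits.append((id1, id2, r2))
    return hits
```

Applied to loci already flagged as under selection, a pair such as a detoxification locus and a putative fitness-cost locus on another chromosome would appear in the output only if their genotypes co-vary across individuals more strongly than the threshold.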