Abstract
Summary
AlphaMate is a flexible program that optimizes selection, maintenance of genetic diversity and mate allocation in breeding programs. It can be used in animal and cross- and self-pollinating plant populations. These populations can be subject to selective breeding or conservation management. The problem is formulated as a multi-objective optimization of a valid mating plan that is solved with an evolutionary algorithm. A valid mating plan is defined by a combination of mating constraints (the number of matings, the maximal number of parents, the minimal/equal/maximal number of contributions per parent, or allowance for selfing) that are gender specific or generic. The optimization can maximize genetic gain, minimize group coancestry, minimize inbreeding of individual matings, or maximize genetic gain for a given increase in group coancestry or inbreeding. Users provide a list of candidate individuals with associated gender and selection criteria information (if applicable) and a coancestry matrix. The selection criteria and coancestry matrix can be based on pedigree or genome-wide markers. Additional individual or mating specific information can be included to enrich optimization objectives. An example of rapid recurrent genomic selection in wheat demonstrates how AlphaMate can double the efficiency of converting genetic diversity into genetic gain compared to truncation selection. Another example demonstrates the use of genome editing to expand the gain-diversity frontier.
Availability and implementation
Executable versions of AlphaMate for Windows, Mac and Linux platforms are available at http://www.AlphaGenes.roslin.ed.ac.uk/AlphaMate.
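The two central objectives named in the summary above, genetic gain and group coancestry, can be illustrated with a minimal Python sketch. It is not AlphaMate's implementation: it only evaluates the two objectives for fixed parental contributions and does not search for a mating plan. The function names, the selection criteria and the coancestry values are hypothetical, and the contributions are assumed to sum to one.

```python
# Minimal sketch (not AlphaMate code): evaluate genetic gain and group
# coancestry for a given vector of parental contributions.
# All names and numbers below are hypothetical illustrations.

def genetic_gain(contributions, criterion):
    """Contribution-weighted mean of the selection criterion."""
    return sum(x * a for x, a in zip(contributions, criterion))

def group_coancestry(contributions, coancestry):
    """Quadratic form x'Cx for contributions x and coancestry matrix C."""
    n = len(contributions)
    return sum(
        contributions[i] * coancestry[i][j] * contributions[j]
        for i in range(n)
        for j in range(n)
    )

# Four hypothetical candidates; the first two are related to each other.
criterion = [2.0, 1.8, 1.0, 0.5]
coancestry = [
    [0.50, 0.25, 0.00, 0.00],
    [0.25, 0.50, 0.00, 0.00],
    [0.00, 0.00, 0.50, 0.00],
    [0.00, 0.00, 0.00, 0.50],
]

# Truncation selection uses only the two best candidates equally.
truncation = [0.5, 0.5, 0.0, 0.0]
# A more balanced plan spreads contributions over all candidates.
balanced = [0.4, 0.2, 0.3, 0.1]

for name, plan in (("truncation", truncation), ("balanced", balanced)):
    print(name, genetic_gain(plan, criterion), group_coancestry(plan, coancestry))
```

In this toy example the balanced plan gives up some gain in exchange for lower group coancestry, which is the trade-off that a multi-objective optimization has to navigate.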
The limited genetic diversity of elite maize germplasms raises concerns about the potential to breed for new challenges. Initiatives have been formed over the years to identify and utilize useful diversity from landraces to overcome this issue. The aim of this study was to evaluate the proposed designs to initiate a pre-breeding program within the Seeds of Discovery (SeeD) initiative with emphasis on harnessing polygenic variation from landraces using genomic selection. We evaluated these designs with stochastic simulation to provide decision support about the effect of several design factors on the quality of resulting (pre-bridging) germplasm. The evaluated design factors were: i) the approach to initiate a pre-breeding program from the selected landraces, doubled haploids of the selected landraces, or testcrosses of the elite hybrid and selected landraces, ii) the genetic parameters of landraces and phenotypes, and iii) logistical factors related to the size and management of a pre-breeding program.
The results suggest a pre-breeding program should be initiated directly from landraces. Initiating from testcrosses leads to a rapid reconstruction of the elite donor genome during further improvement of the pre-bridging germplasm. The analysis of accuracy of genomic predictions across the various design factors indicates the power of genomic selection for pre-breeding programs with large genetic diversity and constrained resources for data recording. The joint effect of design factors was summarized with decision trees with easy-to-follow guidelines to optimize pre-breeding efforts of SeeD and similar initiatives.
Results of this study provide guidelines for SeeD and similar initiatives on how to initiate pre-breeding programs that aim to harness polygenic variation from landraces.
Invasive species are among the major driving forces behind biodiversity loss. Gene drive technology may offer a humane, efficient and cost-effective method of control. For safe and effective deployment, it is vital that a gene drive is both self-limiting and can overcome evolutionary resistance. We present HD-ClvR in this modelling study, a novel combination of CRISPR-based gene drives that eliminates resistance and localises spread. As a case study, we model HD-ClvR in the grey squirrel (Sciurus carolinensis), which is an invasive pest in the UK and responsible for both biodiversity and economic losses. HD-ClvR combats resistance allele formation by combining a homing gene drive with a cleave-and-rescue gene drive. The inclusion of a self-limiting daisyfield gene drive allows for controllable localisation based on animal supplementation. We use both randomly mating and spatial models to simulate this strategy. Our findings show that HD-ClvR could effectively control a targeted grey squirrel population, with little risk to other populations. HD-ClvR offers an efficient, self-limiting and controllable gene drive for managing invasive pests.
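As a rough illustration of what a randomly mating gene drive model tracks, the Python sketch below follows the frequency of an idealized homing drive allele over generations. It is not the HD-ClvR model: it assumes an infinite population, no fitness cost, no resistance alleles, and no cleave-and-rescue or daisyfield components. The function names and the `conversion` parameter are hypothetical.

```python
# Minimal sketch (not the HD-ClvR model): allele frequency of an idealized
# homing gene drive under random mating. Assumes an infinite population,
# no fitness cost and no resistance alleles. All names are hypothetical.

def next_drive_frequency(q, conversion):
    """Drive allele frequency after one generation of random mating.

    q          -- current drive allele frequency (0..1)
    conversion -- probability that the wild-type allele in a heterozygote
                  is converted to the drive allele in the germline
    """
    homozygotes = q * q
    heterozygotes = 2.0 * q * (1.0 - q)
    # Homozygotes always transmit the drive allele; heterozygotes transmit
    # it with probability (1 + conversion) / 2 instead of the Mendelian 1/2.
    return homozygotes + heterozygotes * (1.0 + conversion) / 2.0

def simulate(q0, conversion, generations):
    """Return the drive allele frequency trajectory, starting at q0."""
    trajectory = [q0]
    for _ in range(generations):
        trajectory.append(next_drive_frequency(trajectory[-1], conversion))
    return trajectory

if __name__ == "__main__":
    for generation, q in enumerate(simulate(q0=0.01, conversion=0.9, generations=15)):
        print(generation, round(q, 4))
```

With `conversion` set to zero the recursion reduces to Mendelian inheritance and the frequency stays constant; any positive value makes the allele spread, which is why resistance and localisation mechanisms matter in practice.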
This paper evaluates the potential of maximizing genetic gain in dairy cattle breeding by optimizing investment into phenotyping and genotyping. Conventional breeding focuses on phenotyping selection candidates or their close relatives to maximize selection accuracy for breeders and quality assurance for producers. Genomic selection decoupled phenotyping and selection and through this increased genetic gain per year compared to conventional selection. Although genomic selection is established in well-resourced breeding programs, small populations and developing countries still struggle with the implementation. The main issues include the lack of training animals and lack of financial resources. To address this, we simulated a case study of a small dairy population with a number of scenarios with equal available resources yet varied use of resources for phenotyping and genotyping. The conventional progeny testing scenario collected 11 phenotypic records per lactation. In genomic selection scenarios, we reduced phenotyping to between 10 and 1 phenotypic records per lactation and invested the saved resources into genotyping. We tested these scenarios at different relative prices of phenotyping to genotyping and with or without an initial training population for genomic selection. Reallocating a part of phenotyping resources for repeated milk records to genotyping increased genetic gain compared to the conventional selection scenario regardless of the amount and relative cost of phenotyping, and the availability of an initial training population. Genetic gain increased by increasing genotyping, despite reduced phenotyping. High-genotyping scenarios even saved resources. As expected, genomic selection scenarios increased accuracy for young non-phenotyped candidate males and females, and also for proven females. This study shows that breeding programs should optimize investment into phenotyping and genotyping to maximize return on investment.
Our results suggest that any dairy breeding program using conventional progeny testing with repeated milk records can implement genomic selection without increasing the level of investment.
Abstract
Genomic selection offers great potential for increasing the rate of genetic improvement in plant breeding programs. This research used simulation to evaluate the effectiveness of different strategies for genotyping and phenotyping to enable genomic selection in early generation individuals (e.g., F2) in breeding programs involving biparental or similar (e.g., backcross or top cross) populations. By using phenotypes that were previously collected in other biparental populations, selection decisions could be made without waiting for phenotypes that pertain directly to the selection candidate to be collected, a process that would take at least three growing seasons. If these phenotypes were collected in biparental populations that were closely related to the selection candidates, only a small number of markers (e.g., 200–500) and a small number of phenotypes (e.g., 1000) were needed to achieve effective accuracy of estimated breeding values. If these phenotypes were collected in biparental populations that were not closely related to the selection candidates, as many as 10,000 markers and 5000 to 20,000 phenotypes were needed. Increasing marker density beyond 10,000 markers showed no benefit and in some scenarios reduced the accuracy of prediction. This study provides a guide, enabling resource allocation to be optimized between genotyping and phenotyping investment dependent on the population under development.
Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations.
These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.
As we enter the era of mega-scale genomics, we face many computational issues with standard genomic evaluation models due to their dense data structure and cubic computational complexity. Several scalable approaches have been proposed to address this challenge, such as the Algorithm for Proven and Young (APY). In APY, genotyped animals are partitioned into core and non-core subsets, which induces a sparser inverse of the genomic relationship matrix. This partitioning is often done at random. While APY is a good approximation of the full model, random partitioning can make results unstable, possibly affecting accuracy or even reranking animals. Here we present a stable optimisation of the core subset by choosing animals with the most informative genotype data.
We derived a novel algorithm for optimising the core subset based on a conditional genomic relationship matrix or a conditional single nucleotide polymorphism (SNP) genotype matrix. We compared the accuracy of genomic predictions with different core subsets for simulated and real pig data sets. The core subsets were constructed (1) at random, (2) based on the diagonal of the genomic relationship matrix, (3) at random with weights from (2), or (4) based on the novel conditional algorithm. To understand the different core subset constructions, we visualise the population structure of the genotyped animals with linear Principal Component Analysis and non-linear Uniform Manifold Approximation and Projection.
All core subset constructions performed equally well when the number of core animals captured most of the variation in the genomic relationships, both in simulated and real data sets. When the number of core animals was not sufficiently large, there was substantial variability in the results with the random construction but no variability with the conditional construction. Visualisation of the population structure and chosen core animals showed that the conditional construction spreads core animals across the whole domain of genotyped animals in a repeatable manner.
Our results confirm that the size of the core subset in APY is critical. Furthermore, the results show that the core subset can be optimised with the conditional algorithm that achieves an optimal and repeatable spread of core animals across the domain of genotyped animals.
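The abstract does not spell out the conditional algorithm, so the Python sketch below shows only one plausible reading of core selection based on a conditional genomic relationship matrix: a greedy scheme that repeatedly picks the animal with the largest remaining (conditional) variance and then conditions the matrix on that animal. This is an illustrative assumption, not the authors' published algorithm; the function name `select_core` and the toy matrix are hypothetical.

```python
# Minimal sketch (an assumed reading, not the published algorithm): greedy
# core selection on a genomic relationship matrix G. At each step the animal
# with the largest conditional variance is added to the core, and G is then
# conditioned on that animal (a Schur complement update).

def select_core(G, n_core, tolerance=1e-12):
    """Return indices of up to n_core core animals chosen greedily."""
    n = len(G)
    conditional = [row[:] for row in G]  # working copy of G
    core = []
    for _ in range(min(n_core, n)):
        candidates = [i for i in range(n) if i not in core]
        pivot = max(candidates, key=lambda i: conditional[i][i])
        variance = conditional[pivot][pivot]
        if variance <= tolerance:
            break  # remaining animals are fully explained by the core
        core.append(pivot)
        column = [conditional[i][pivot] for i in range(n)]
        # Remove the part of every relationship explained by the pivot.
        for i in range(n):
            for j in range(n):
                conditional[i][j] -= column[i] * column[j] / variance
    return core

# Toy relationship matrix with two groups of related animals: (0, 1) and (2, 3).
G = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.2, 0.8],
    [0.0, 0.0, 0.8, 1.0],
]

print(select_core(G, 2))
```

Because the procedure is deterministic, repeated runs return the same core, and in the toy example the two chosen animals come from different groups rather than from the same one.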
Social insects are very successful invasive species, and the continued increase of global trade and transportation has exacerbated this problem. The yellow-legged hornet, Vespa velutina nigrithorax (henceforth Asian hornet), is drastically expanding its range in Western Europe. As an apex insect predator, this hornet poses a serious threat to the honey bee industry and endemic pollinators. Current suppression methods have proven too inefficient and expensive to limit its spread. Gene drives might be an effective tool to control this species, but their use has not yet been thoroughly investigated in social insects. Here, we built a model that matches the hornet's life history and modelled the effect of different gene drive scenarios on an established invasive population. To test the broader applicability and sensitivity of the model, we also incorporated the invasive European paper wasp Polistes dominula. We find that, due to the haplodiploidy of social hymenopterans, only a gene drive targeting female fertility is promising for population control. Our results show that although a gene drive can suppress a social wasp population, it can only do so under fairly stringent gene drive-specific conditions. This is due to a combination of two factors. First, the large number of surviving offspring that social wasp colonies produce makes it possible that, even with very limited formation of resistance alleles, such alleles can quickly spread and rescue the population. Second, due to social wasp life history, infertile individuals do not compete with fertile ones, allowing fertile individuals to maintain a large population size even when drive alleles are widespread. Nevertheless, continued improvements in gene drive technology may make it a promising method for the control of invasive social insects in the future.
Abstract
This paper introduces AlphaSimR, an R package for stochastic simulations of plant and animal breeding programs. AlphaSimR is a highly flexible software package able to simulate a wide range of plant and animal breeding programs for diploid and autopolyploid species. AlphaSimR is ideal for testing the overall strategy and detailed design of breeding programs. AlphaSimR utilizes a scripting approach to building simulations that is particularly well suited for modeling highly complex breeding programs, such as commercial breeding programs. The primary benefit of this scripting approach is that it frees users from preset breeding program designs and allows them to model nearly any breeding program design. This paper lists the main features of AlphaSimR and provides a brief example simulation to show how to use the software.
Background
Early simulations indicated that whole-genome sequence data (WGS) could improve the accuracy of genomic predictions within and across breeds. However, empirical results have been ambiguous so far. Large datasets that capture most of the genomic diversity in a population must be assembled so that allele substitution effects are estimated with high accuracy. The objectives of this study were to use a large pig dataset from seven intensely selected lines to assess the benefits of using WGS for genomic prediction compared to using commercial marker arrays and to identify scenarios in which WGS provides the largest advantage.
Methods
We sequenced 6931 individuals from seven commercial pig lines with different numerical sizes. Genotypes of 32.8 million variants were imputed for 396,100 individuals (17,224 to 104,661 per line). We used BayesR to perform genomic prediction for eight complex traits. Genomic predictions were performed using either data from a standard marker array or variants preselected from WGS based on association tests.
Results
The accuracies of genomic predictions based on preselected WGS variants were not robust across traits and lines and the improvements in prediction accuracy that we achieved so far with WGS compared to standard marker arrays were generally small. The most favourable results for WGS were obtained when the largest training sets were available and standard marker arrays were augmented with preselected variants with statistically significant associations to the trait. With this method and training sets of around 80k individuals, the accuracy of within-line genomic predictions was on average improved by 0.025. With multi-line training sets, improvements of 0.04 compared to marker arrays could be expected.
Conclusions
Our results showed that WGS has limited potential to improve the accuracy of genomic predictions compared to marker arrays in intensely selected pig lines.
Thus, although we expect that larger improvements in accuracy from the use of WGS are possible with a combination of larger training sets and optimised pipelines for generating and analysing such datasets, the use of WGS in the current implementations of genomic prediction should be carefully evaluated against the cost of large-scale WGS data on a case-by-case basis.