Advancements in next-generation sequencing technology have enabled whole genome re-sequencing in many species, providing unprecedented discovery and characterization of molecular polymorphisms. There are limitations, however, to next-generation sequencing approaches for species with large complex genomes such as barley and wheat. Genotyping-by-sequencing (GBS) has been developed as a tool for association studies and genomics-assisted breeding in a range of species including those with complex genomes. GBS uses restriction enzymes for targeted complexity reduction followed by multiplex sequencing to produce high-quality polymorphism data at a relatively low per-sample cost. Here we present a GBS approach for species that currently lack a reference genome sequence. We developed a novel two-enzyme GBS protocol and genotyped bi-parental barley and wheat populations to develop a genetically anchored reference map of identified SNPs and tags. We were able to map over 34,000 SNPs and 240,000 tags onto the Oregon Wolfe Barley reference map, and 20,000 SNPs and 367,000 tags on the Synthetic W9784×Opata85 (SynOpDH) wheat reference map. To further evaluate GBS in wheat, we also constructed a de novo genetic map using only SNP markers from the GBS data. The GBS approach presented here provides a powerful method of developing high-density markers in species without a sequenced genome while providing valuable tools for anchoring and ordering physical maps and whole-genome shotgun sequence. Development of the sequenced reference genome(s) will in turn increase the utility of GBS data, enabling physical mapping of genes and haplotype imputation of missing data. Finally, as a result of low per-sample costs, GBS will have broad application in genomics-assisted plant breeding programs.
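The complexity-reduction step of a two-enzyme protocol can be illustrated with a small in-silico digest. The sketch below is a simplification under stated assumptions: the abstract does not name the enzymes, so PstI (CTGCA^G) and MspI (C^CGG) are assumed here as a rare-cutter/common-cutter pair, only top-strand cut positions are tracked, and the function names are hypothetical. It keeps only fragments flanked by one cut from each enzyme, which are the fragments a two-adapter library would be expected to amplify.

```python
import re

# Recognition site and top-strand cut offset for each enzyme.
# The enzyme pair is an assumption made for illustration.
ENZYMES = {"PstI": ("CTGCAG", 5), "MspI": ("CCGG", 1)}


def cut_positions(seq):
    """Return sorted (position, enzyme) pairs for every cut site in seq."""
    cuts = []
    for name, (site, offset) in ENZYMES.items():
        # A lookahead pattern also finds overlapping occurrences of the site.
        for match in re.finditer(f"(?={site})", seq):
            cuts.append((match.start() + offset, name))
    return sorted(cuts)


def two_enzyme_fragments(seq, min_len=1, max_len=10**9):
    """Keep fragments whose two ends were cut by different enzymes."""
    cuts = cut_positions(seq)
    fragments = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        if left != right and min_len <= end - start <= max_len:
            fragments.append((start, end, seq[start:end]))
    return fragments


if __name__ == "__main__":
    toy = "AAACTGCAGTTTTCCGGAAACCGGTT"
    print(two_enzyme_fragments(toy))
```

On the toy sequence, the fragment between the PstI cut and the first MspI cut is retained, while the MspI-to-MspI fragment is discarded. The `min_len` and `max_len` arguments stand in for size selection, and their defaults are arbitrary.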
Over the past decade, genomics-assisted breeding (GAB) has been instrumental in harnessing the potential of modern genome resources and characterizing and exploiting allelic variation for germplasm enhancement and cultivar development. Sustaining GAB in the future (GAB 2.0) will rely upon a suite of new approaches that fast-track targeted manipulation of allelic variation for creating novel diversity and facilitate their rapid and efficient incorporation in crop improvement programs. Genomic breeding strategies that optimize crop genomes with accumulation of beneficial alleles and purging of deleterious alleles will be indispensable for designing future crops. In coming decades, GAB 2.0 is expected to play a crucial role in breeding more climate-smart crop cultivars with higher nutritional value in a cost-effective and timely manner.
Availability of reference genomes and genome-wide surveys on comprehensive diversity panels pave the way to associate the allelic variation with phenotypes. Methods are now available to evaluate the genetic worth of the vast genetic resources archived in gene banks and streamline application of these resources in crop improvement programs. Precise genome editing technologies in concert with enhanced trait architectures enable innovative solutions to engineer complex trait variation. High-throughput phenotyping methods are beginning to alleviate the challenge of accurate, precise, and large-scale measurements of plant performance. Optimized speed breeding protocols remain crucial to accelerating breeding advance when applied with genomic breeding approaches. Sustaining gains from genomic breeding seeks fast-tracking exploitation of the minor effect alleles, accumulation of favorable alleles, and purging of deleterious alleles.
Genome-wide molecular markers are often used to evaluate genetic diversity in germplasm collections and for making genomic selections in breeding programs. To accurately predict phenotypes and assay genetic diversity, molecular markers should assay a representative sample of the polymorphisms in the population under study. Ascertainment bias arises when marker data is not obtained from a random sample of the polymorphisms in the population of interest. Genotyping-by-sequencing (GBS) is rapidly emerging as a low-cost genotyping platform, even for the large, complex, and polyploid wheat (Triticum aestivum L.) genome. With GBS, marker discovery and genotyping occur simultaneously, resulting in minimal ascertainment bias. The previous platform of choice for whole-genome genotyping in many species such as wheat was DArT (Diversity Array Technology), which has formed the basis of most of our knowledge about cereal genetic diversity. This study compared GBS and DArT marker platforms for measuring genetic diversity and genomic selection (GS) accuracy in elite U.S. soft winter wheat. From a set of 365 breeding lines, 38,412 single nucleotide polymorphism GBS markers were discovered and genotyped. The GBS SNPs gave a higher GS accuracy than 1,544 DArT markers on the same lines, despite 43.9% missing data. Using a bootstrap approach, we observed significantly more clustering of markers and ascertainment bias with DArT relative to GBS. The minor allele frequency distribution of GBS markers had a deficit of rare variants compared to DArT markers. Despite the ascertainment bias of the DArT markers, GS accuracy for three traits out of four was not significantly different when an equal number of markers were used for each platform. This suggests that the gain in accuracy observed using GBS compared to DArT markers was mainly due to a large increase in the number of markers available for the analysis.
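The two summary statistics this abstract leans on, the missing-data rate and the minor allele frequency (MAF) distribution, can be computed directly from a genotype matrix. The following is a minimal sketch, assuming biallelic markers coded as allele dosages 0/1/2 with `None` for missing calls. The coding, the function names, and the bin count are illustrative assumptions, not the study's pipeline.

```python
def minor_allele_frequency(genotypes):
    """MAF of one biallelic marker from dosage codes 0/1/2 (None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # no information for this marker
    p = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(p, 1 - p)


def missing_rate(matrix):
    """Fraction of missing calls across a markers-by-lines matrix."""
    total = sum(len(row) for row in matrix)
    missing = sum(g is None for row in matrix for g in row)
    return missing / total


def maf_histogram(mafs, n_bins=5):
    """Count markers per equal-width MAF bin on the interval [0, 0.5]."""
    counts = [0] * n_bins
    for maf in mafs:
        # MAF == 0.5 falls into the last bin rather than past it.
        index = min(int(maf / 0.5 * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    markers = [[0, 0, 2, None], [2, 2, 2, 0], [0, None, None, 2]]
    mafs = [minor_allele_frequency(row) for row in markers]
    print(mafs, missing_rate(markers), maf_histogram(mafs))
```

Comparing such histograms between platforms is one simple way to see a deficit or excess of rare variants. A marker set whose discovery panel differs from the study population would be expected to show a distorted low-MAF bin.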
To introduce new genetic diversity into the bread wheat gene pool from its progenitor, Aegilops tauschii (Coss.) Schmalh, 33 primary synthetic hexaploid wheat genotypes (SYN) were crossed to 20 spring bread wheat (BW) cultivars at the International Wheat and Maize Improvement Center. Modified single seed descent was used to develop 97 populations with 50 individuals per population using first back-cross, biparental, and three-way crosses. Individuals from each cross were selected for short stature, early heading, flowering and maturity, minimal lodging, and free threshing. Yield trials were conducted under irrigated, drought, and heat-stress conditions from 2011 to 2014 in Ciudad Obregon, Mexico. Genomic estimated breeding values (GEBVs) of parents and synthetic derived lines (SDLs) were estimated using a genomic best linear unbiased prediction (GBLUP) model with markers in each trial. In each environment, there were SDLs that had higher GEBVs than their recurrent BW parent for yield. The GEBVs of BW parents for yield ranged from -0.32 in heat to 1.40 in irrigated trials. The SYN parent GEBVs for yield ranged from -2.69 in the irrigated to 0.26 in the heat trials and were mostly negative across environments. The contribution of the SYN parents to improved grain yield of the SDLs was highest under heat stress, with an average GEBV for the top 10% of the SDLs of 0.55 while the weighted average GEBV of their corresponding recurrent BW parents was 0.26. Using the pedigree-based model, the accuracy of genomic prediction for yield was 0.42, 0.43, and 0.49 in the drought, heat and irrigated trials, respectively, while for the marker-based model these values were 0.43, 0.44, and 0.55. The SYN parents introduced novel diversity into the wheat gene pool. Higher GEBVs of progenies were due to introgression and retention of some positive alleles from SYN parents.
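A GBLUP model replaces the pedigree relationship matrix with a marker-derived one. The abstract does not say how the genomic relationship matrix was built, so the sketch below assumes one widely used construction (VanRaden's first method): centre each marker by twice its allele frequency and scale the cross-product by 2Σp(1−p). The function name and the 0/1/2 dosage coding without missing values are assumptions for illustration.

```python
def genomic_relationship_matrix(dosages):
    """Build G = Z Z' / (2 * sum(p * (1 - p))) from a lines-by-markers matrix.

    dosages: list of lines, each a list of allele dosages (0, 1 or 2),
    with no missing values (impute beforehand).
    """
    n_lines, n_markers = len(dosages), len(dosages[0])
    # Allele frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in dosages) / (2 * n_lines) for j in range(n_markers)]
    # Centre every marker column by its expected dosage 2p.
    centred = [[row[j] - 2 * freqs[j] for j in range(n_markers)] for row in dosages]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    return [
        [sum(centred[i][k] * centred[j][k] for k in range(n_markers)) / scale
         for j in range(n_lines)]
        for i in range(n_lines)
    ]


if __name__ == "__main__":
    toy = [[0, 2], [2, 0], [1, 1]]
    for row in genomic_relationship_matrix(toy):
        print(row)
```

In the toy example the first two lines carry opposite alleles at both markers and receive a negative relationship, while the third line sits at the population mean and has zero relationship to everything. The resulting matrix would then enter the mixed model as the covariance structure of the breeding values; that solving step is omitted here.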
KEY MESSAGE: Development of models to predict genotype by environment interactions, in unobserved environments, using environmental covariates, a crop model and genomic selection. Application to a large winter wheat dataset. Genotype by environment interaction (G*E) is one of the key issues when analyzing phenotypes. The use of environment data to model G*E has long been a subject of interest but is limited by the same problems as those addressed by genomic selection methods: a large number of correlated predictors each explaining a small amount of the total variance. In addition, non-linear responses of genotypes to stresses are expected to further complicate the analysis. Using a crop model to derive stress covariates from daily weather data for predicted crop development stages, we propose an extension of the factorial regression model to genomic selection. This model is further extended to the marker level, enabling the modeling of quantitative trait loci (QTL) by environment interaction (Q*E), on a genome-wide scale. A newly developed ensemble method, soft rule fit, was used to improve this model and capture non-linear responses of QTL to stresses. The method is tested using a large winter wheat dataset, representative of the type of data available in a large-scale commercial breeding program. Accuracy in predicting genotype performance in unobserved environments for which weather data were available increased by 11.1% on average and the variability in prediction accuracy decreased by 10.8%. By leveraging agronomic knowledge and the large historical datasets generated by breeding programs, this new model provides insight into the genetic architecture of genotype by environment interactions and could predict genotype performance based on past and future weather scenarios.
Association mapping is a method for detection of gene effects based on linkage disequilibrium (LD) that complements QTL analysis in the development of tools for molecular plant breeding. In this study, association mapping was performed on a selected sample of 95 cultivars of soft winter wheat. Population structure was estimated on the basis of 36 unlinked simple-sequence repeat (SSR) markers. The extent of LD was estimated on chromosomes 2D and part of 5A, relative to the LD observed among unlinked markers. Consistent LD on chromosome 2D was <1 cM, whereas in the centromeric region of 5A, LD extended for approximately 5 cM. Association of 62 SSR loci on chromosomes 2D, 5A, and 5B with kernel morphology and milling quality was analyzed through a mixed-effects model, where subpopulation was considered as a random factor and the marker tested was considered as a fixed factor. Permutations were used to adjust the threshold of significance for multiple testing within chromosomes. In agreement with previous QTL analysis, significant markers for kernel size were detected on the three chromosomes tested, and alleles potentially useful for selection were identified. Our results demonstrated that association mapping could complement and enhance previous QTL information for marker-assisted selection.
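Pairwise LD of the kind discussed here is commonly summarized as r². The sketch below is a simplified illustration, not the study's procedure: it assumes inbred lines scored at biallelic markers coded 0/1 (SSR markers are typically multi-allelic and would need allele-wise or pooled coding first), and the function name is hypothetical. It computes r² = D² / (pA(1−pA)pB(1−pB)) with D = pAB − pA·pB.

```python
def ld_r2(marker_a, marker_b):
    """r^2 between two biallelic markers coded 0/1 on inbred lines.

    None marks a missing call; lines missing at either marker are skipped.
    Returns None when either marker is monomorphic in the shared lines.
    """
    pairs = [(a, b) for a, b in zip(marker_a, marker_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return None
    p_a = sum(a for a, _ in pairs) / n       # frequency of allele 1 at marker A
    p_b = sum(b for _, b in pairs) / n       # frequency of allele 1 at marker B
    p_ab = sum(a * b for a, b in pairs) / n  # frequency of the 1-1 haplotype
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        return None
    d = p_ab - p_a * p_b
    return d * d / denominator


if __name__ == "__main__":
    print(ld_r2([1, 1, 0, 0], [1, 1, 0, 0]))  # identical markers
    print(ld_r2([1, 1, 0, 0], [1, 0, 1, 0]))  # independent markers
```

Computing this statistic for unlinked marker pairs gives a background distribution against which LD between linked markers can be judged, which is the comparison the abstract describes.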
ABSTRACT
Genomic selection (GS) has created a lot of excitement and expectations in the animal‐ and plant‐breeding research communities. In this review, we briefly describe how genomic prediction can be integrated into breeding efforts and point out achievements and areas where more research is needed. Genomic selection provides many opportunities to increase genetic gain in plant breeding per unit time and cost. Early empirical and simulation results are promising, but for GS to deliver genetic gains, careful consideration of the problem of optimal resource allocation is needed. Consideration of the cost‐benefit balance of using markers for each trait and stage of the breeding cycle is needed, moving beyond only focusing on recurrent selection with GS on a few complex traits, using prediction on unphenotyped individuals. With decreasing marker cost, phenotype data is quickly becoming the most valuable asset and marker‐assisted selection strategies should focus on making the most of scarce and expensive phenotypes. It is important to realize that markers can also improve accuracy of selection for phenotyped individuals. Use of markers as an aid to phenotype analysis suggests a number of new strategies in terms of experimental design and multi‐trait models. GS also provides new ways to analyze and deal with genotype by environment interactions. Lastly, we point to some recent results showing that new models are needed to improve predictions particularly with respect to the use of distantly related individuals in the training population.
Crossovers (COs), which drive genetic exchange between homologous chromosomes, are strongly biased toward subtelomeric regions in plant species. Manipulating the rate and positions of COs to increase the genetic variation accessible to breeders is a longstanding goal. Use of genome editing reagents that induce double-stranded breaks (DSBs) or modify the epigenome at desired sites of recombination, and manipulation of CO factors, are increasingly applicable approaches for achieving this goal. These strategies for ‘controlled recombination’ have potential to reduce the time and expense associated with traditional breeding, reveal currently inaccessible genetic diversity, and increase control over the inheritance of preferred haplotypes. Considerable challenges to address include translating knowledge from models to crop species and determining the best stages of the breeding cycle at which to control recombination.
The genetic diversity accessible to plant breeders has traditionally been limited by chromosomal COs, but recent advances in targeted DNA cleavage and epigenetic modification are increasing access. Overcoming the low frequency and uneven distribution of COs in plants can reveal allelic diversity and may increase control over the inheritance of preferred haplotypes. The frequency and location of COs can be altered with manipulation of pro- and anti-CO factors, site-directed nucleases, or epigenetic modifiers; we refer to such alteration as ‘controlled recombination’. Epigenetic modifiers can induce COs near centromeres, which are otherwise very low-frequency CO regions. Controlled recombination may enable breeders and geneticists to unlock otherwise inaccessible genetic diversity.
KEY MESSAGE: Population structure must be evaluated before optimization of the training set population. Maximizing the phenotypic variance captured by the training set is important for optimal performance. The optimization of the training set (TRS) in genomic selection has received much interest in both animal and plant breeding, because it is critical to the accuracy of the prediction models. In this study, five different TRS sampling algorithms (stratified sampling, mean of the coefficient of determination (CDmean), mean of prediction error variance (PEVmean), stratified CDmean (StratCDmean), and random sampling) were evaluated for prediction accuracy in the presence of different levels of population structure. In the presence of population structure, the sampling method that captures the most phenotypic variation in the TRS is desirable. The wheat dataset showed mild population structure, and CDmean and stratified CDmean methods showed the highest accuracies for all the traits except for test weight and heading date. The rice dataset had strong population structure and the approach based on stratified sampling showed the highest accuracies for all traits. In general, CDmean minimized the relationship between genotypes in the TRS, maximizing the relationship between TRS and the test set. This makes it suitable as an optimization criterion for long-term selection. Our results indicated that the best selection criterion used to optimize the TRS seems to depend on the interaction of trait architecture and population structure.
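Of the five sampling schemes listed, stratified sampling is the simplest to illustrate. The sketch below assumes that subpopulation labels have already been assigned (for example from a clustering step) and draws lines from each subpopulation in proportion to its size. The function name, the proportional allocation rule, and the fixed seed are illustrative assumptions; the study's exact allocation may differ. CDmean and PEVmean require mixed-model quantities and are not shown.

```python
import random
from collections import defaultdict


def stratified_training_set(subpopulations, size, seed=0):
    """Sample a training set with proportional allocation per subpopulation.

    subpopulations: dict mapping line id -> subpopulation label (strings).
    size: target training set size; per-group rounding means the returned
    list can be slightly shorter or longer than this target.
    """
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    members_by_group = defaultdict(list)
    for line, label in subpopulations.items():
        members_by_group[label].append(line)
    n_total = len(subpopulations)
    chosen = []
    for label in sorted(members_by_group):
        members = members_by_group[label]
        quota = min(len(members), round(size * len(members) / n_total))
        chosen.extend(rng.sample(members, quota))
    return chosen


if __name__ == "__main__":
    labels = {f"A{i}": "A" for i in range(6)}
    labels.update({f"B{i}": "B" for i in range(4)})
    print(stratified_training_set(labels, size=5))
```

With six lines in subpopulation A and four in B, a training set of five contains three A lines and two B lines, so each group is represented in the training set in proportion to its size.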
ABSTRACT
Simulation and empirical studies of genomic selection (GS) show accuracies sufficient to generate rapid genetic gains. However, with the increased popularity of GS approaches, numerous models have been proposed and no comparative analysis is available to identify the most promising ones. Using eight wheat (Triticum aestivum L.), barley (Hordeum vulgare L.), Arabidopsis thaliana (L.) Heynh., and maize (Zea mays L.) datasets, the predictive ability of currently available GS models along with several machine learning methods was evaluated by comparing accuracies, the genomic estimated breeding values (GEBVs), and the marker effects for each model. While a similar level of accuracy was observed for many models, the level of overfitting varied widely as did the computation time and the distribution of marker effect estimates. Our comparisons suggested that GS in plant breeding programs could be based on a reduced set of models such as the Bayesian Lasso, weighted Bayesian shrinkage regression (wBSR, a fast version of BayesB), and random forest (RF) (a machine learning method that could capture nonadditive effects). Linear combinations of different models were tested as well as bagging and boosting methods, but they did not improve accuracy. This study also showed large differences in accuracy between subpopulations within a dataset that could not always be explained by differences in phenotypic variance and size. The broad diversity of empirical datasets tested here adds evidence that GS could increase genetic gain per unit of time and cost.
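Model comparisons of this kind usually report accuracy as the correlation between predicted values (GEBVs) and observed phenotypes in held-out lines. The abstract does not spell out its validation scheme, so the sketch below assumes Pearson correlation with a simple deterministic k-fold split. The function names are hypothetical, and the prediction model itself is left out.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold f tests indices i with i % k == f."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print(pearson(predicted, observed))
    print(list(k_fold_indices(5, 2)))
```

In a full comparison, each model would be fitted on the training indices of every fold and scored with `pearson` on the test indices. Comparing that held-out score with the correlation obtained on the training data gives a rough indication of overfitting.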