Modern whole-genome prediction (WGP) frameworks that focus on multi-environment trials (MET) integrate large-scale genomics, phenomics, and envirotyping data. However, the more complex the statistical model, the longer the computational processing times, which do not always result in accuracy gains. We investigated the use of new kernel methods and modeling structures involving genomics and nongenomic sources of variation in two MET maize data sets. Five WGP models were considered, advancing in complexity from a main-effect additive model (A) to more complex structures, including dominance deviations (D), genotype × environment interaction (AE and DE), and the reaction-norm model using environmental covariables (W) and their interaction with A and D (AW + DW). A combination of those models built with three different kernel methods, Gaussian kernel (GK), Deep kernel (DK), and the benchmark genomic best linear-unbiased predictor (GBLUP/GB), was tested under three prediction scenarios: newly developed hybrids (CV1), sparse MET conditions (CV2), and new environments (CV0). GK and DK outperformed GB in prediction accuracy and reduced computation time (by up to about 20%) under all model-kernel scenarios. GK was more efficient in capturing the variation due to A + AE and D + DE effects and translated it into accuracy gains (up to about 85% compared with GB). DK provided more consistent predictions, even for more complex structures such as W + AW + DW. Our results suggest that DK and GK are more efficient in translating model complexity into accuracy, and more suitable for including dominance and reaction-norm effects in a biologically accurate and faster way.
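As an illustration of the difference between the linear GBLUP kernel and the Gaussian kernel mentioned above, the following is a minimal sketch and not the authors' implementation. The scaling of the linear kernel by the number of markers, the median-distance bandwidth, the parameter `h`, and the simulated 0/1/2 genotype matrix are all assumptions made for the example; the deep kernel is not covered.

```python
import numpy as np

def gblup_kernel(X):
    """Linear (GBLUP-style) kernel: cross-product of column-centred markers.

    Dividing by the number of markers is an assumed scaling choice.
    """
    Z = X - X.mean(axis=0)           # centre each marker column
    return Z @ Z.T / X.shape[1]

def gaussian_kernel(X, h=1.0):
    """Gaussian kernel exp(-h * d_ij^2 / median(d^2)) on marker profiles.

    The median-based bandwidth and the parameter h are assumptions.
    """
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # squared Euclidean distances
    d2 = np.maximum(d2, 0.0)                           # guard against round-off
    scale = np.median(d2[np.triu_indices_from(d2, k=1)])
    return np.exp(-h * d2 / scale)

# Hypothetical data: 6 genotypes scored at 50 biallelic markers (coded 0/1/2).
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(6, 50)).astype(float)

GB = gblup_kernel(X)      # 6 x 6 linear relationship matrix
GK = gaussian_kernel(X)   # 6 x 6 nonlinear similarity matrix
```

Either matrix could then serve as the covariance structure of a random genetic effect in a mixed model; the Gaussian kernel differs in that similarity decays nonlinearly with marker distance.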
This book is open access under a CC BY 4.0 license. This open access book brings together the latest genome-based prediction models currently being used by statisticians, breeders and data scientists. It provides an accessible way to understand the theory behind each statistical learning tool, the required pre-processing, the basics of model building, how to train statistical learning methods, the basic R scripts needed to implement each statistical learning tool, and the output of each tool. To do so, for each tool the book provides background theory, some elements of the R statistical software for its implementation, the conceptual underpinnings, and at least two illustrative examples with data from real-world genomic selection experiments. Lastly, worked-out examples help readers check their own comprehension. The book will greatly appeal to readers in plant (and animal) breeding, geneticists and statisticians, as it provides in a very accessible way the necessary theory, the appropriate R code, and illustrative examples for a complete understanding of each statistical learning tool. In addition, it weighs the advantages and disadvantages of each tool.
The availability of dense molecular markers has made possible the use of genomic selection (GS) for plant breeding. However, the evaluation of models for GS in real plant populations is very limited. This article evaluates the performance of parametric and semiparametric models for GS using wheat (Triticum aestivum L.) and maize (Zea mays) data in which different traits were measured in several environmental conditions. The findings, based on extensive cross-validations, indicate that models including marker information had higher predictive ability than pedigree-based models. In the wheat data set, and relative to a pedigree model, gains in predictive ability due to inclusion of markers ranged from 7.7 to 35.7%. Correlation between observed and predicted values in the maize data set achieved values up to 0.79. Estimates of marker effects were different across environmental conditions, indicating that genotype × environment interaction is an important component of genetic variability. These results indicate that GS in plant breeding can be an effective strategy for selecting among lines whose phenotypes have yet to be observed.
In most crops, genetic and environmental factors interact in complex ways giving rise to substantial genotype-by-environment interactions (G×E). We propose that computer simulations leveraging field trial data, DNA sequences, and historical weather records can be used to tackle the longstanding problem of predicting cultivars' future performances under largely uncertain weather conditions. We present a computer simulation platform that uses Monte Carlo methods to integrate uncertainty about future weather conditions and model parameters. We use extensive experimental wheat yield data (n = 25,841) to learn G×E patterns and validate, using left-trial-out cross-validation, the predictive performance of the model. Subsequently, we use the fitted model to generate circa 143 million grain yield data points for 28 wheat genotypes in 16 locations in France, over 16 years of historical weather records. The phenotypes generated by the simulation platform have multiple downstream uses; we illustrate this by predicting the distribution of expected yield at 448 cultivar-location combinations and performing means-stability analyses.
This open access book focuses on the linear selection index (LSI) theory and its statistical properties. It addresses the single-stage LSI theory by assuming that economic weights are fixed and known (or fixed but unknown) to predict the net genetic merit in the phenotypic, marker and genomic context. Further, it shows how to combine the LSI theory with the independent culling method to develop the multistage selection index theory. The final two chapters present simulation results and SAS and R codes, respectively, to estimate the parameters and make selections using some of the LSIs described. It is essential reading for plant quantitative geneticists, but is also a valuable resource for animal breeders.
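To make the idea of a linear selection index with known economic weights concrete, here is a minimal sketch of the classical phenotypic index, in which the coefficient vector is b = P⁻¹Gw for a phenotypic covariance matrix P, a genotypic covariance matrix G, and economic weights w. It is not taken from the book; every numeric value below is hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical two-trait example (all numbers are made up for illustration).
P = np.array([[4.0, 1.0],
              [1.0, 3.0]])      # phenotypic covariance matrix
G = np.array([[2.0, 0.5],
              [0.5, 1.0]])      # genotypic covariance matrix
w = np.array([1.0, 2.0])        # economic weights, assumed fixed and known

# Index coefficients b = P^{-1} G w, obtained by solving P b = G w.
b = np.linalg.solve(P, G @ w)

# Index value for one candidate with hypothetical trait phenotypes y.
y = np.array([10.0, 5.0])
index_value = b @ y
```

Candidates would then be ranked by `index_value`. Solving the linear system instead of inverting P explicitly is a numerical convenience and does not change the index.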
The availability of genomewide dense markers brings opportunities and challenges to breeding programs. An important question concerns the ways in which dense markers and pedigrees, together with phenotypic records, should be used to arrive at predictions of genetic values for complex traits. If a large number of markers are included in a regression model, marker-specific shrinkage of regression coefficients may be needed. For this reason, the Bayesian least absolute shrinkage and selection operator (LASSO) (BL) appears to be an interesting approach for fitting marker effects in a regression model. This article adapts the BL to arrive at a regression model where markers, pedigrees, and covariates other than markers are considered jointly. Connections between BL and other marker-based regression models are discussed, and the sensitivity of BL with respect to the choice of prior distributions assigned to key parameters is evaluated using simulation. The proposed model was fitted to two data sets from wheat and mouse populations, and evaluated using cross-validation methods. Results indicate that inclusion of markers in the regression further improved the predictive ability of models. An R program that implements the proposed model is freely available.
Linkage disequilibrium can be used for identifying associations between traits of interest and genetic markers. This study used mapped diversity array technology (DArT) markers to find associations with resistance to stem rust, leaf rust, yellow rust, and powdery mildew, plus grain yield in five historical wheat international multienvironment trials from the International Maize and Wheat Improvement Center (CIMMYT). Two linear mixed models were used to assess marker-trait associations incorporating information on population structure and covariance between relatives. An integrated map containing 813 DArT markers and 831 other markers was constructed. Several linkage disequilibrium clusters bearing multiple host plant resistance genes were found. Most of the associated markers were found in genomic regions where previous reports had found genes or quantitative trait loci (QTL) influencing the same traits, providing an independent validation of this approach. In addition, many new chromosome regions for disease resistance and grain yield were identified in the wheat genome. Phenotyping across up to 60 environments and years allowed modeling of genotype × environment interaction, thereby making possible the identification of markers contributing to both additive and additive × additive interaction effects of traits.
Realistic experimental protocols to screen for drought adaptation in controlled conditions are crucial if high throughput phenotyping is to be used for the identification of high performance lines, and are especially important in the evaluation of transgenes where stringent biosecurity measures restrict the frequency of open field trials. Transgenic DREB1A-wheat events were selected under greenhouse conditions by evaluating survival and recovery under severe drought (SURV) as well as for water use efficiency (WUE). Greenhouse experiments confirmed the advantages of transgenic events in recovery after severe water stress. Under field conditions, the group of transgenic lines did not generally outperform the controls in terms of grain yield under water deficit. However, the events selected for WUE were identified as lines that combine an acceptable yield, or even higher yield (WUE-11) under well irrigated conditions, and stable performance across the different environments generated by the experimental treatments.
Key message
Using phenotype data of three spring wheat populations evaluated at 6–15 environments under two management systems, we found moderate to very high prediction accuracies across seven traits. The phenotype data collected under an organic management system effectively predicted the performance of lines in the conventional management and vice versa.
There is growing interest in developing wheat cultivars specifically for organic agriculture, but we are not aware of the effect of organic management on the predictive ability of genomic selection (GS). Here, we evaluated within-population prediction accuracies of four GS models, four combinations of training and testing sets, three reaction norm models, and three random cross-validation (CV) schemes in three populations phenotyped under organic and conventional management systems. Our study was based on a total of 578 recombinant inbred lines and varieties from three spring wheat populations, which were evaluated for seven traits at 3–9 conventionally and 3–6 organically managed field environments and genotyped either with the wheat 90 K SNP array or DArTseq. We predicted the management systems (CV0M) or environments (CV0), a subset of lines that have been evaluated in either management (CV2M) or some environments (CV2), and the performance of newly developed lines in either management (CV1M) or environments (CV1). The average prediction accuracies of the model that incorporated genotype × environment interactions with CV0 and CV2 schemes varied from 0.69 to 0.97. In the CV1 and CV1M schemes, prediction accuracies ranged from −0.12 to 0.77 depending on the reaction norm models, the traits, and populations. In most cases, grain protein showed the highest prediction accuracies. The phenotype data collected under the organic management effectively predicted the performance of lines under conventional management and vice versa. This is the first comprehensive GS study that investigated the effect of the organic management system in wheat.
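The three cross-validation schemes named above differ only in which records are hidden from the training set. The following sketch builds the corresponding test masks for a small, hypothetical balanced design of 10 lines in 4 environments. The function names, hold-out fractions, and design are assumptions for illustration and do not reproduce the authors' actual partitions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_lines, n_envs = 10, 4                          # hypothetical balanced design
line = np.repeat(np.arange(n_lines), n_envs)     # line id of each record
env = np.tile(np.arange(n_envs), n_lines)        # environment id of each record

def cv1_mask(line, test_frac, rng):
    """CV1: hide every record of a random subset of lines (new lines)."""
    ids = np.unique(line)
    n_test = max(1, int(round(test_frac * ids.size)))
    test_ids = rng.choice(ids, size=n_test, replace=False)
    return np.isin(line, test_ids)

def cv2_mask(line, n_hidden, rng):
    """CV2: hide some records of each line, keeping at least one for training."""
    mask = np.zeros(line.size, dtype=bool)
    for l in np.unique(line):
        idx = np.flatnonzero(line == l)
        k = min(n_hidden, idx.size - 1)          # never hide all records of a line
        mask[rng.choice(idx, size=k, replace=False)] = True
    return mask

def cv0_mask(env, test_env):
    """CV0: hide every record from one environment (new environment)."""
    return env == test_env

m1 = cv1_mask(line, test_frac=0.2, rng=rng)      # True marks a test record
m2 = cv2_mask(line, n_hidden=1, rng=rng)
m0 = cv0_mask(env, test_env=0)
```

In each case the model would be fitted on records where the mask is `False`, and accuracy computed as the correlation between predicted and observed values where it is `True`. The management-level variants (CV0M, CV1M, CV2M) could be obtained in the same way by replacing the environment label with a management-system label.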