Genomic selection (GS) facilitates the rapid selection of superior genotypes and accelerates the breeding cycle. In this review, we discuss the history, principles, and basis of GS and genomic-enabled prediction (GP) as well as the genetic and statistical complexities of GP models, including genomic genotype×environment (G×E) interactions. We also examine the accuracy of GP models and methods for two cereal crops and two legume crops based on random cross-validation. GS applied to maize breeding has shown tangible genetic gains. Based on GP results, we speculate how GS in germplasm enhancement (i.e., prebreeding) programs could accelerate the flow of genes from gene bank accessions to elite lines. Recent advances in hyperspectral image technology could be combined with GS and pedigree-assisted breeding.
In recent years, the global climate has changed, resulting in drastic fluctuations in rainfall patterns and increasing temperature. Sudden climate changes can cause significant economic losses to countries worldwide.
Genetic improvement of several economically important crops during the 20th century using phenotypic, pedigree, and performance data was very successful. However, signs of grain yield stagnation in some crops, especially in drought-stressed and semi-arid regions, are evident.
Genomic selection offers the opportunity to increase grain production in less time. International Maize and Wheat Improvement Center (CIMMYT) maize breeding research in Sub-Saharan Africa, India, and Mexico has shown that genomic selection can reduce the breeding interval cycle to at least half the conventional time and produces lines that, in hybrid combinations, significantly increase grain yield performance over that of commercial checks.
Public and private investment in crop genomic selection research should increase so that germplasm adapted to sudden climate change can be developed in less time.
In genomic selection (GS), genotype × environment interaction (G × E) can be modeled by a marker × environment interaction (M × E). The G × E may be modeled through a linear kernel or a nonlinear (Gaussian) kernel. In this study, we propose using two nonlinear Gaussian kernels: the reproducing kernel Hilbert space with kernel averaging (RKHS KA) and the Gaussian kernel with the bandwidth estimated through an empirical Bayesian method (RKHS EB). We performed single‐environment analyses and extended them to account for G × E interaction (GBLUP‐G × E, RKHS KA‐G × E and RKHS EB‐G × E) in wheat (Triticum aestivum L.) and maize (Zea mays L.) data sets. For single‐environment analyses of wheat and maize data sets, RKHS EB and RKHS KA had higher prediction accuracy than GBLUP for all environments. For the wheat data, the RKHS KA‐G × E and RKHS EB‐G × E models showed up to 60 to 68% superiority over the corresponding single‐environment models for pairs of environments with positive correlations. For the wheat data set, the models with Gaussian kernels had accuracies up to 17% higher than that of GBLUP‐G × E. For the maize data set, the prediction accuracy of RKHS EB‐G × E and RKHS KA‐G × E was, on average, 5 to 6% higher than that of GBLUP‐G × E. The superiority of the Gaussian kernel models over the linear kernel is due to more flexible kernels that account for small, more complex marker main effects and marker‐specific interaction effects.
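The contrast between the linear kernel used by GBLUP and a Gaussian kernel can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the simulated marker matrix, and the choice to scale squared distances by their median are assumptions made for illustration, and the bandwidth `h` is fixed here rather than estimated by kernel averaging or empirical Bayes.

```python
import numpy as np

def linear_kernel(X):
    """Linear (GBLUP-style) kernel: cross-products of column-centered markers."""
    Xc = X - X.mean(axis=0)
    return Xc @ Xc.T / X.shape[1]

def gaussian_kernel(X, h=1.0):
    """Gaussian kernel exp(-h * d^2 / scale) on squared Euclidean distances.

    Assumption: distances are scaled by the median off-diagonal squared
    distance; h is a fixed, user-supplied bandwidth.
    """
    sq_norms = (X ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)          # guard against tiny negative round-off
    np.fill_diagonal(d2, 0.0)         # distance of a line to itself is zero
    off_diag = d2[~np.eye(len(X), dtype=bool)]
    scale = np.median(off_diag)
    return np.exp(-h * d2 / scale)

# Hypothetical data: 5 lines genotyped at 20 markers coded 0/1/2.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(5, 20)).astype(float)

G = linear_kernel(X)            # linear kernel
K = gaussian_kernel(X, h=1.0)   # nonlinear kernel, entries in (0, 1]
```

Larger values of `h` make the Gaussian kernel decay faster with genetic distance, so the bandwidth controls how locally the model borrows information among lines.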
Selecting and mating parents in conventional phenotypic and genomic selection are crucial. Plant breeding programs aim to improve the economic value of crops, considering multiple traits simultaneously. When traits are negatively correlated and/or when there are missing records in some traits, selection becomes more complex. To address this problem, we propose a multitrait selection approach using the Multitrait Parental Selection (MPS) R package, an efficient tool for genetic improvement, precision breeding, and conservation genetics. The package employs Bayesian optimization algorithms and three loss functions (Kullback–Leibler, Energy Score, and Multivariate Asymmetric Loss) to identify parental candidates with desirable traits. The software's functionality includes three main functions (EvalMPS, FastMPS, and ApproxMPS) catering to different data availability scenarios. Through the presented application examples, the MPS R package proves effective in multitrait genomic selection, enabling breeders to make informed decisions and achieve strong performance across multiple traits.
Core Ideas
The Multitrait Parental Selection (MPS) R package aids plant and animal breeders in addressing multitrait parental selection in both phenotypic and genomic selection.
MPS incorporates Bayesian optimization algorithms into genomic selection by employing various loss functions.
The results generated by MPS can be utilized to identify the most promising parental candidates based on genetic gain and diversity.
The MPS R package seamlessly integrates with the widely used Bayesian Generalized Linear Regression (BGLR) software.
Plain Language Summary
Selecting and mating parents in genomic selection are crucial in plant and animal breeding programs. During selection, researchers need to identify superior individuals to be parents considering multiple traits simultaneously, some of which act antagonistically; therefore, selection becomes complex. In this paper, we present an R package named MPS (Multitrait Parental Selection) to facilitate the selection process. MPS uses Bayesian optimization to identify superior individuals. Through the presented application examples, the MPS R package proves effective in multitrait genomic selection, enabling breeders to make informed decisions and achieve strong performance across multiple traits.
In agriculture and plant breeding, multienvironment trials over multiple years are conducted to evaluate and predict genotypic performance under different environmental conditions and to analyze, study, and interpret genotype × environment interaction (G × E). In this study, we propose a hierarchical Bayesian formulation of a linear–bilinear model, where the conditional conjugate prior for the bilinear (multiplicative) G × E term is the matrix von Mises–Fisher (mVMF) distribution (with environments and sites defined as synonymous). A hierarchical normal structure is assumed for linear effects of sites, and priors for precision parameters are assumed to follow gamma distributions. Bivariate highest posterior density (HPD) regions for the posterior multiplicative components of the interaction are shown within the usual biplots. Simulated and real maize (Zea mays L.) breeding multisite data sets were analyzed. Results showed that the proposed model facilitates identifying groups of genotypes and sites that cause G × E across years and within years, since the hierarchical Bayesian structure allows using plant breeding data from different years by borrowing information among them. This model offers the researcher valuable information about G × E patterns not only for each 1‐yr period of the breeding trials but also for the general process that originates the response across these periods.
Key message: A new genomic model that incorporates genotype x environment interaction gave increased prediction accuracy of untested hybrid response for traits such as percent starch content, percent dry matter content and silage yield of maize hybrids. The prediction of hybrid performance (HP) is very important in agricultural breeding programs. In plant breeding, multi-environment trials play an important role in the selection of important traits, such as stability across environments, grain yield and pest resistance. Environmental conditions modulate gene expression causing genotype x environment interaction (G x E), such that the estimated genetic correlations of the performance of individual lines across environments summarize the joint action of genes and environmental conditions. This article proposes a genomic statistical model that incorporates G x E for general and specific combining ability for predicting the performance of hybrids in environments. The proposed model can also be applied to any other hybrid species with distinct parental pools. In this study, we evaluated the predictive ability of two HP prediction models using a cross-validation approach applied in extensive maize hybrid data, comprising 2724 hybrids derived from 507 dent lines and 24 flint lines, which were evaluated for three traits in 58 environments over 12 years; analyses were performed for each year. On average, genomic models that include the interaction of general and specific combining ability with environments have greater predictive ability than genomic models without interaction with environments (ranging from 12 to 22%, depending on the trait). We concluded that including G x E in the prediction of untested maize hybrids increases the accuracy of genomic models.
One of the most widely used kernel functions in genomic-enabled prediction is the Gaussian kernel. Selection of the bandwidth parameter for kernel regression has generally been based on cross-validation. We propose a Bayesian method for estimating the bandwidth parameter h of a Gaussian kernel as the modal component of the joint posterior distribution of h and the form parameter. We present a theory for the Bayesian selection of h in a Transformed Gaussian Kernel (TGK) model and its application in two plant breeding datasets (maize and wheat) that were already predicted using the kernel averaging (KA) model in the context of Reproducing Kernel Hilbert Spaces (RKHS KA). We also compared the prediction accuracy of the proposed method with a model that also uses a Gaussian kernel and estimates the bandwidth parameter using a restricted maximum likelihood method (GK REML). Results for the wheat dataset show that the predictive ability of TGK was at least as good as the predictive ability of model RKHS KA, with TGK showing a significantly smaller Predictive Mean Squared Error (PMSE) than the other two approaches. The TGK model was statistically a better predictor than methods GK REML and RKHS KA in terms of mean PMSE and mean correlations in seven (out of 17) trait-environment combinations in the wheat dataset. Fewer differences were found between models for the maize data; the TGK model generally had prediction accuracy similar or inferior to that of GK REML and RKHS KA in various analyses. The superiority of GK REML over TGK based on mean PMSE was clear in seven maize traits.
We evaluated the behavior and skin temperature of dual-purpose cattle that grazed pastures having high (HC), low (LC), and no (NC) tree cover during the rainy and dry seasons in the hot and humid tropics of Mexico. We observed twenty-four adult cows (eight per treatment) over 24 days during each season, recording skin temperature and the time related to different daily activities. Across treatments, cows spent the same amount of time foraging during the rainy season (P > 0.05), but cows under HC spent less time during the dry season (P < 0.0001). During the rainy season, cows under HC showed more motivation to continue grazing than becoming restless or beginning rumination (P < 0.001) or roaming more than in other treatments (P < 0.001). During the dry season, cows under HC and LC also had less probability of initiating rest than NC (P < 0.001). Cows under HC had greater motivation to transition from grazing to roaming and less incentive to pass from rumination to rest than cows under LC and NC (P < 0.001). The frequency of water consumption was greater during the dry season (P < 0.001) and consistently high under NC (P < 0.0001). Skin temperature did not differ among treatments during the rainy season (P = 0.261), but during the dry season, it was greater under NC (P < 0.001). Tree cover improves cow behaviors by increasing the impetus to graze and perform daily activities, which contributes to reduced skin temperature during hotter seasons.
Genomic selection (GS) is a technology used for genetic improvement, and it has many advantages over phenotype-based selection. Several statistical models, such as linear mixed models (LMMs), adequately address the statistical challenges in GS. An active area of research is the development of software for fitting LMMs, mainly used to make genome-based predictions. The lme4 package is the standard R package for fitting linear and generalized LMMs, but its use for genetic analysis is limited because it does not allow the correlation between individuals or groups of individuals to be defined. This article describes the new lme4GS package for R, which is focused on fitting LMMs with covariance structures defined by the user, bandwidth selection, and genomic prediction. The new package is focused on genomic prediction of the models used in GS and can fit LMMs using different variance–covariance matrices. Several examples of GS models are presented using this package as well as the analysis using real data.
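To make concrete what an LMM with a user-defined covariance structure predicts, the following sketch computes genomic predictions for untested lines from a relationship matrix. It is written in Python for illustration and does not reproduce the lme4GS interface: the function name `gblup_predict` is hypothetical, the variance ratio `lam` is assumed known rather than estimated, and the overall mean is taken as the simple training average.

```python
import numpy as np

def gblup_predict(K, y, train, test, lam):
    """Predict test lines from training phenotypes and covariance matrix K.

    Assumptions: lam = sigma_e^2 / sigma_u^2 is known, and the intercept is
    estimated by the plain mean of the training phenotypes.
    """
    mu = y[train].mean()
    K_tt = K[np.ix_(train, train)]     # covariance among training lines
    K_st = K[np.ix_(test, train)]      # covariance between test and training lines
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + K_st @ alpha

# Hypothetical example: four unrelated lines (K = identity), one untested.
K = np.eye(4)
y = np.array([1.0, 2.0, 3.0, np.nan])
pred = gblup_predict(K, y, train=[0, 1, 2], test=[3], lam=1.0)
```

With unrelated lines the test line borrows no information from relatives, so its prediction falls back to the training mean; a non-diagonal `K` (for example, a genomic relationship matrix) shifts the prediction toward the phenotypes of related lines.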
As a result of technological progress, the use of sensors for crop survey has substantially increased, generating valuable information for modeling agricultural data. Plant spectroscopy together with statistical modeling can potentially help to assess certain chemical components of interest present in plants, which may be laborious and expensive to obtain by direct measurements. In this research, the phosphorus content in wheat grain is modeled using reflectance information measured by a hyperspectral sensor at different wavelengths. A Bayesian procedure for selecting variables was used to identify the set of the most important spectral bands. Additionally, three different models were evaluated: the first model assumes that the observations are independent, whereas the other two assume that the observations are spatially correlated, one through a Conditionally Autoregressive Model (CAR) and the other through an exponential correlogram. The goodness of fit of the models was evaluated by means of the Deviance Information Criterion, and the predictive power was evaluated using cross-validation.
We found that CAR was the model that best fits and predicts the data. Additionally, the variable selection procedure in the CAR model reveals which wavelengths in the range of 500-690 nm are the most important. Comparing the vegetative indices with the CAR model, the average correlation of the CAR model exceeded that of the vegetative indices by 23.26%, -1.2% and 22.78% for the years 2010, 2011 and 2012, respectively; therefore, the proposed methodology outperformed the vegetative indices in prediction in two of the three years.
The results indicate that predicting the phosphorus content in wheat grain using a Bayesian approach is a good alternative.
Ridge regression deals with collinearity in the homoscedastic linear regression model. When the number of predictors (p) is much larger than the number of observations (n), it gives unique least-squares estimators. In both the classical and Bayesian approaches, parameter estimation is a highly demanding computational task: in the former it is an optimization problem, and in the latter a high-dimensional integration problem usually addressed through Markov chain Monte Carlo (MCMC). The main drawback of MCMC is the practical impossibility of checking convergence to the posterior distribution, which is commonly very slow due to the large number of regression parameters. Here, a computational algorithm is proposed to obtain posterior estimates of regression parameters, variance components and predictions for the conventional ridge regression model. The algorithm is based on a reparametrization of the model which allows us to obtain the marginal posterior means and variances by integrating out a nuisance parameter whose marginal posterior is defined on the open interval
.
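The statement that ridge regression yields a unique estimator when p is much larger than n can be checked numerically. The sketch below is a plain penalized least-squares illustration with a fixed penalty `lam`, not the reparametrized Bayesian algorithm proposed in the abstract; the simulated data and function names are assumptions. It also shows the equivalent n × n (dual) form, which is cheaper to solve when p >> n.

```python
import numpy as np

def ridge_primal(X, y, lam):
    """Ridge estimator from the p x p system (X'X + lam I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ridge_dual(X, y, lam):
    """Same estimator from the n x n system: beta = X'(XX' + lam I)^{-1} y."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

# Hypothetical data with more predictors than observations.
rng = np.random.default_rng(1)
n, p = 10, 50
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# X'X is rank deficient, so ordinary least squares has no unique solution.
rank_xtx = np.linalg.matrix_rank(X.T @ X)

# Adding lam * I makes the system nonsingular; both forms give the same beta.
beta_primal = ridge_primal(X, y, lam=1.0)
beta_dual = ridge_dual(X, y, lam=1.0)
```

The two routes agree because (X'X + λI)⁻¹X' = X'(XX' + λI)⁻¹ for any λ > 0, so the choice between them is purely computational.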