Current status of genomic evaluation
Misztal, Ignacy; Lourenco, Daniela; Legarra, Andres
Journal of Animal Science, 04/2020, Volume 98, Issue 4
Journal Article
Peer reviewed
Open access
Abstract
Early application of genomic selection relied on estimating SNP effects from phenotypes or de-regressed proofs (DRP). Chips of 50k SNP seemed sufficient for accurate estimation of SNP effects. Genomic estimated breeding values (GEBV) were computed as an index combining parent average and direct genomic value, with a parental index deducted to eliminate double counting. SNP selection or weighting increased accuracy with small data sets but had minimal to no impact with large data sets. Efforts to include potentially causative SNP derived from sequence data or high-density chips showed limited or no gain in accuracy. After genomic selection was implemented, EBV from BLUP became biased because of genomic preselection, DRP computed from those EBV required adjustments, and creating DRP for females is hard and subject to double counting. Genomic selection was greatly simplified by single-step genomic BLUP (ssGBLUP). This method, based on combining genomic and pedigree relationships, automatically creates an index with all sources of information, can use any combination of male and female genotypes, and accounts for preselection. To avoid biases, especially under strong selection, ssGBLUP requires that pedigree and genomic relationships be compatible. Because inverting the genomic relationship matrix (G) becomes costly with more than 100k genotyped animals, large-scale ssGBLUP computations were made feasible by exploiting the limited dimensionality of genomic data, which results from limited effective population size. With this dimensionality ranging from 4k in chickens to about 15k in cattle, the inverse of G can be created directly (e.g., by the algorithm for proven and young) at linear cost. Because of its simplicity and accuracy, ssGBLUP is routinely used for genomic selection by the major chicken, pig, and beef industries. Single step can also be used to derive SNP effects for indirect prediction and for genome-wide association studies, including computation of P-values.
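The central object in ssGBLUP is the inverse of the combined relationship matrix H, in which pedigree relationships (A) are augmented with genomic relationships (G) for genotyped animals. The sketch below is a minimal illustration, not the authors' implementation: it builds A by the tabular method for a hypothetical five-animal pedigree and applies the standard formula H^-1 = A^-1 + [[0, 0], [0, G^-1 - A22^-1]], where A22 is the pedigree block for the genotyped animals. It uses dense inverses for clarity; `tabular_A` and `h_inverse` are illustrative names, and the scaling or blending of G that makes it compatible with A22 is deliberately omitted.

```python
import numpy as np


def tabular_A(sire, dam):
    """Pedigree (numerator) relationship matrix A by the tabular method.

    sire, dam: 0-based parent indices, or -1 if unknown.
    Parents must be listed before their offspring.
    """
    n = len(sire)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sire[i], dam[i]
        for j in range(i):
            # a_ij is the average of j's relationships with i's known parents
            a = 0.0
            if s >= 0:
                a += 0.5 * A[j, s]
            if d >= 0:
                a += 0.5 * A[j, d]
            A[i, j] = A[j, i] = a
        # diagonal is 1 + F_i, with F_i equal to half the parents' relationship
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
    return A


def h_inverse(A, G, genotyped):
    """H^-1 = A^-1 + [[0, 0], [0, G^-1 - A22^-1]] for single-step GBLUP."""
    idx = np.ix_(genotyped, genotyped)
    H_inv = np.linalg.inv(A)
    # the genomic correction only touches the block of genotyped animals
    H_inv[idx] += np.linalg.inv(G) - np.linalg.inv(A[idx])
    return H_inv


if __name__ == "__main__":
    # hypothetical pedigree: 0 and 1 are founders, 2 and 3 are full sibs from 0 x 1, 4 = 2 x 3
    A = tabular_A([-1, -1, 0, 0, 2], [-1, -1, 1, 1, 3])
    genotyped = [2, 3, 4]
    # if G equals A22, the correction cancels and H^-1 equals A^-1
    H_inv = h_inverse(A, A[np.ix_(genotyped, genotyped)], genotyped)
    print(np.allclose(H_inv, np.linalg.inv(A)))
```

The G = A22 case is a convenient sanity check; large evaluations instead build A^-1 sparsely and replace the dense G^-1 with a cheaper inverse such as APY.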
Alternative single-step formulations exist that use SNP effects for genotyped or for all animals. Although genomics is the new standard in breeding and genetics, some problems still need to be solved. These include new validation procedures that are unaffected by selection, parameter estimation that accounts for all the genomic data used in selection, and strategies to address the reduction in genetic variances after genomic selection was implemented.
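The algorithm for proven and young (APY) mentioned in the abstract obtains G^-1 by inverting only the block for a set of core animals and treating the remaining (noncore) animals as conditionally independent given the core, so the noncore part is diagonal and its cost grows linearly with the number of noncore animals. Below is a minimal dense numpy sketch of that inverse, assuming core animals are supplied as row indices of G; `apy_inverse` is a hypothetical name, and a production version would keep the noncore terms implicit rather than forming a full matrix.

```python
import numpy as np


def apy_inverse(G, core):
    """Inverse of G under APY: noncore animals are conditioned on core animals only."""
    G = np.asarray(G, dtype=float)
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(G.shape[0]), core)
    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])   # the only dense inverse needed
    Gcn = G[np.ix_(core, noncore)]
    P = Gcc_inv @ Gcn                                # regressions of noncore on core animals
    # m_i = g_ii - g_ic Gcc^-1 g_ci: conditional variance of each noncore animal
    m_inv = 1.0 / (np.diag(G)[noncore] - np.einsum("ij,ij->j", Gcn, P))
    Ginv = np.zeros_like(G)
    Ginv[np.ix_(core, core)] = Gcc_inv + (P * m_inv) @ P.T
    Ginv[np.ix_(core, noncore)] = -P * m_inv
    Ginv[np.ix_(noncore, core)] = (-P * m_inv).T
    Ginv[np.ix_(noncore, noncore)] = np.diag(m_inv)
    return Ginv
```

With a single noncore animal the formula reduces to the exact block inverse, which makes a simple correctness check. In practice the core is chosen to be at least about as large as the dimensionality of the genomic data, which the abstract puts at roughly 4k (chickens) to 15k (cattle).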
• Although genomic selection increased response in the high-heritability trait, a negative or low response was observed for the low-heritability trait when genetic correlations were negative or zero.
• Compared with genomic selection, the GSI yielded a higher or equal response for the aggregate genotype and the low-heritability trait.
• A larger reduction in genetic variance was observed in scenarios that produced a larger response to selection.
• There were no differences in the accumulation of inbreeding between the different selection criteria.
Aggregate genotypes in selection programs commonly include multiple traits, and the responses of individual traits matter in addition to their inclusion in the aggregate genotype. Having effects for all markers available in genomic selection creates an opportunity to select animals for the desired marker alleles that affect different traits. The genomic selection index (GSI) method divides markers into groups based on their effects on different traits, then combines and weights the genomic breeding value of each marker group into an overall index so as to maximize response in the aggregate genotype. The primary purpose of this study was to evaluate the long-term effect of GSI, compared with genomic selection, on the genetic progress of a two-trait aggregate genotype and its constituent traits using different Bayesian methods. The results showed that when the correlation between traits was negative, the GSI method yielded higher gain than genomic selection for the low-heritability trait and for the aggregate genotype. However, the superiority of GSI decreased as the correlation between the traits increased, and at a correlation of 0.5 the response was similar to that of genomic selection. A larger reduction in genetic variance was observed in scenarios that produced a larger response to selection. The inbreeding rate was relatively low in all scenarios. These results suggest using GSI rather than genomic selection, especially when the aggregate genotype includes low-heritability traits that are negatively correlated with other, high-heritability traits.
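The abstract does not give the formula GSI uses to weight the marker-group breeding values. Classical Smith-Hazel selection-index theory is one natural basis for such weights, so the sketch below assumes it: the index sources x are the GEBVs of k marker groups, P is their covariance matrix, C holds their covariances with the t traits' true breeding values, and w are economic weights defining the aggregate genotype H = w'u. The function names are hypothetical, and the inputs would in practice be derived from estimated marker effects.

```python
import numpy as np


def index_weights(P, C, w):
    """Smith-Hazel weights b = P^-1 C w for the index I = b'x.

    P: (k, k) covariance of the k index sources (marker-group GEBVs).
    C: (k, t) covariances between the sources and the t true breeding values.
    w: (t,) economic weights of the aggregate genotype H = w'u.
    """
    return np.linalg.solve(P, C @ w)


def index_accuracy(P, C, w, Gt):
    """Correlation between the index and H, with Gt = var(u) among the t traits."""
    b = index_weights(P, C, w)
    var_I = b @ P @ b
    var_H = w @ Gt @ w
    return (b @ C @ w) / np.sqrt(var_I * var_H)


def trait_responses(P, C, w, intensity=1.0):
    """Expected response of each trait to index selection: i * cov(I, u_t) / sd(I)."""
    b = index_weights(P, C, w)
    return intensity * (C.T @ b) / np.sqrt(b @ P @ b)
```

`trait_responses` reflects the point made above that the responses of the individual traits matter alongside the aggregate: two indices with similar accuracy for H can distribute gain very differently between a high- and a low-heritability trait.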
The first paradigm of plant breeding involved direct selection based on phenotypic observation, followed by predictive breeding using statistical models for quantitative traits constructed from genetic experimental designs and, more recently, by the incorporation of molecular marker genotypes. However, plant performance or phenotype (P) is determined by the combined effects of genotype (G), envirotype (E), and genotype-by-environment interaction (GEI). Phenotypes can be predicted more precisely by training a model on data collected from multiple sources, including spatiotemporal omics (genomics, phenomics, and enviromics across time and space). Integrating three-dimensional information profiles (G-P-E), each of which is itself multidimensional, gives predictive breeding both tremendous opportunities and great challenges. Here, we first review innovative technologies for predictive breeding. We then evaluate multidimensional information profiles that can be integrated into a predictive breeding strategy, particularly envirotypic data, which have largely been neglected in data collection and remain nearly untouched in model construction. We propose a smart breeding scheme, integrated genomic-enviromic prediction (iGEP), as an extension of genomic prediction, using integrated multiomics information, big data technology, and artificial intelligence (mainly machine and deep learning). We discuss how to implement iGEP, including spatiotemporal models, environmental indices, the factorial and spatiotemporal structure of plant breeding data, and cross-species prediction. We then propose a strategy for prediction-based crop redesign at both the macro (individual, population, and species) and micro (gene, metabolism, and network) scales. Finally, we provide perspectives on translating smart breeding into genetic gain through integrative breeding platforms and open-source breeding initiatives.
We call for coordinated efforts in smart breeding through iGEP, institutional partnerships, and innovative technological support.
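The statement that phenotype is the combined effect of G, E, and GEI can be made concrete for a complete two-way table of genotype means across environments. The sketch below uses the classical fixed-effects decomposition (grand mean, genotype and environment main effects, and interaction residuals). It is not the iGEP model described in this abstract, which additionally brings in envirotypic and other omics data; `two_way_decomposition` is a hypothetical helper name.

```python
import numpy as np


def two_way_decomposition(Y):
    """Split a complete genotype x environment table of means into
    grand mean + genotype effects + environment effects + GEI residuals.

    With unreplicated cell means, the GEI term also absorbs residual error.
    """
    Y = np.asarray(Y, dtype=float)
    mu = Y.mean()
    g = Y.mean(axis=1) - mu                    # genotype main effects (rows)
    e = Y.mean(axis=0) - mu                    # environment main effects (columns)
    gei = Y - mu - g[:, None] - e[None, :]     # what the additive G + E part cannot explain
    return mu, g, e, gei
```

By construction, the interaction term sums to zero across each genotype and each environment, so any reranking of genotypes between environments appears only in the GEI residuals.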
Plant breeding faces great challenges posed by the flood of big data and the need for improved genetic gain under climate change. Genomic selection can play a more important role in plant breeding if more efficient prediction models are developed. Plant breeding will be advanced with the support of big data, artificial intelligence (machine and deep learning), and integrated genomic-enviromic prediction (selection).
Marker-assisted selection (MAS) refers to the use of molecular markers to assist phenotypic selection in crop improvement. Several types of molecular markers, such as single nucleotide polymorphisms (SNPs), have been identified and effectively used in plant breeding. The application of next-generation sequencing (NGS) technologies has led to remarkable advances in whole-genome sequencing, which provides ultra-high-throughput sequence data that are revolutionizing plant genotyping and breeding. To extend NGS to large crop genomes such as maize and wheat, genotyping-by-sequencing (GBS) has been developed and applied to sequence multiplexed samples, combining molecular marker discovery and genotyping. GBS is a novel application of NGS protocols for discovering and genotyping SNPs in crop genomes and populations. The GBS approach involves digesting genomic DNA with restriction enzymes, ligating barcoded adapters, PCR-amplifying the fragments, and sequencing the pooled amplified DNA on a single flow-cell lane. Bioinformatic pipelines are needed to analyze and interpret GBS datasets. As an advanced and cost-effective MAS tool, GBS has been used successfully for genome-wide association studies (GWAS), genomic diversity studies, genetic linkage analysis, molecular marker discovery, and genomic selection in large-scale plant breeding programs.
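Because GBS pools many barcoded samples on one lane, the first step of a bioinformatic pipeline is usually demultiplexing: assigning each read to its sample by the barcode at its start. A minimal sketch follows, assuming variable-length barcodes and an ApeKI-style library whose reads continue with the cut-site remnant CAGC or CTGC after the barcode; the remnant, the barcode table, and the `demultiplex` name are assumptions for illustration, not part of any specific pipeline.

```python
from typing import Dict, List, Tuple

# Assumed cut-site remnant for an ApeKI-style library (site G^CWGC);
# other restriction enzymes leave different remnants.
REMNANTS = ("CAGC", "CTGC")


def demultiplex(reads: List[str], barcodes: Dict[str, str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Assign reads to samples by leading barcode and strip the barcode.

    A read is accepted only if its barcode is immediately followed by an
    expected cut-site remnant, which guards against chance barcode matches.
    """
    # try longer barcodes first so a short barcode that prefixes a longer one cannot steal its reads
    ordered = sorted(barcodes.items(), key=lambda kv: len(kv[0]), reverse=True)
    by_sample: Dict[str, List[str]] = {sample: [] for sample in barcodes.values()}
    unassigned: List[str] = []
    for read in reads:
        for barcode, sample in ordered:
            insert = read[len(barcode):]
            if read.startswith(barcode) and insert.startswith(REMNANTS):
                by_sample[sample].append(insert)  # keep the remnant, drop the barcode
                break
        else:
            unassigned.append(read)
    return by_sample, unassigned
```

For example, with hypothetical barcodes {"ACGT": "s1", "TGCAA": "s2"}, the read "ACGTCAGCTTTT" is assigned to s1 as "CAGCTTTT", while a read without a known barcode and remnant is returned as unassigned.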
Novel high-throughput phenotyping (HTP) approaches are needed to advance the understanding of genotype-to-phenotype relationships and to accelerate plant breeding. The first generation of HTP has examined simple spectral reflectance traits from images and sensors but is limited in advancing our understanding of crop development and architecture. Lodging is a complex trait that significantly impacts yield and quality in many crops, including wheat. Conventional visual assessment methods for lodging are time-consuming, relatively low-throughput, and subjective, limiting phenotyping accuracy and population sizes in breeding and genetics studies. Here, we demonstrate the considerable power of unmanned aerial system (UAS), or drone-based, phenotyping as a high-throughput alternative to visual assessment of lodging, a complex phenological trait. We tested and validated quantitative assessment of lodging on 2,640 wheat breeding plots over 2 years using differential digital elevation models from UAS. High correlations between digital and visual estimates of lodging, together with equivalent broad-sense heritability, demonstrate that this approach is suitable for reproducible assessment of lodging in large breeding nurseries. Using these high-throughput measures to assess the underlying genetic architecture of lodging in wheat, we applied genome-wide association analysis and identified a key genomic region on chromosome 2A that was consistent across digital and visual lodging scores. However, these associations accounted for only a very minor portion of the total phenotypic variance. We therefore investigated whole-genome prediction models and found high prediction accuracies across populations and environments. These models adequately accounted for the highly polygenic genetic architecture of numerous small-effect loci, consistent with the previously described complex genetic architecture of lodging in wheat.
Our study provides a proof-of-concept application of UAS-based phenomics that is scalable to tens of thousands of plots in breeding and genetic studies, as will be needed to uncover the genetic factors and increase the rate of gain for complex traits in crop breeding.
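A differential digital elevation model is the per-pixel difference between a surface model flown before lodging and one flown after it; lodged canopy shows up as a height drop. The abstract does not state how the study converted this difference into a plot score, so the sketch below assumes one simple option: the share of plot pixels whose surface dropped by more than a threshold (a hypothetical 0.2 m). `lodging_fraction` and the threshold are illustrative, not the authors' method.

```python
import numpy as np


def lodging_fraction(dem_before, dem_after, drop_threshold=0.2):
    """Share of a plot's pixels whose surface height dropped by more than drop_threshold.

    dem_before, dem_after: co-registered elevation rasters (metres) clipped to one plot.
    NaN pixels (no data) are ignored.
    """
    before = np.asarray(dem_before, dtype=float)
    after = np.asarray(dem_after, dtype=float)
    if before.shape != after.shape:
        raise ValueError("rasters must be co-registered and the same shape")
    drop = before - after          # the differential elevation model
    valid = ~np.isnan(drop)
    if not valid.any():
        return float("nan")
    return float(np.mean(drop[valid] > drop_threshold))
```

Scores computed this way for every plot could then be correlated with visual ratings and used as phenotypes for heritability estimation, GWAS, or genomic prediction.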
Genomic selection (GS) is transforming the field of plant breeding, and models that improve prediction accuracy for complex traits are needed. Analytical methods for complex datasets traditionally used in other disciplines represent an opportunity for improving prediction accuracy in GS. Deep learning (DL) is a branch of machine learning (ML) that uses densely connected artificial neural networks to train models. The objective of this research was to evaluate the potential of DL models in the Washington State University spring wheat breeding program. We compared the performance of two DL algorithms, multilayer perceptron (MLP) and convolutional neural network (CNN), with ridge regression best linear unbiased predictor (rrBLUP), a commonly used GS model. The dataset consisted of 650 recombinant inbred lines (RILs) from a spring wheat nested association mapping (NAM) population planted in the 2014-2016 growing seasons. We predicted five quantitative traits with varying genetic architectures using cross-validation (CV), independent validation, and different sets of SNP markers. Hyperparameters for the DL models were optimized by minimizing the root mean square error in the training set, and overfitting was avoided using dropout and regularization. DL models gave 0 to 5% higher prediction accuracy than the rrBLUP model under both cross and independent validation for all five traits used in this study. Furthermore, MLP produced 5% higher prediction accuracy than CNN for grain yield and grain protein content. Altogether, DL approaches obtained better prediction accuracy for each trait and should be incorporated into plant breeders' toolkits for use in large-scale breeding programs.
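The rrBLUP baseline treats all marker effects as random with a common variance, which makes it equivalent to ridge regression with penalty lambda = sigma2_e / sigma2_beta. A minimal numpy sketch follows; here lambda is simply supplied rather than estimated by REML as in typical software, and `rrblup` and `rrblup_predict` are hypothetical names rather than any package's API.

```python
import numpy as np


def rrblup(X, y, lam):
    """Ridge-regression BLUP of marker effects.

    Solves (Xc'Xc + lam*I) beta = Xc'(y - mean(y)) on column-centered markers,
    where lam = sigma2_e / sigma2_beta is treated as known.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y.mean()))
    return beta, y.mean(), x_mean


def rrblup_predict(X_new, beta, y_mean, x_mean):
    """Genomic predictions for new lines from their marker genotypes."""
    return y_mean + (np.asarray(X_new, dtype=float) - x_mean) @ beta
```

Centering the markers lets the unpenalized intercept be the training mean. A larger lambda shrinks all effects toward zero, which is why rrBLUP behaves well for polygenic traits but cannot model the nonlinear marker interactions that MLP or CNN can in principle capture.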
Over the past three decades, the Nile tilapia industry has grown into a significant aquaculture industry spread over 120 tropical and subtropical countries around the world, accounting for 7.4% of global aquaculture production in 2015. Across species, including aquaculture species, genomic selection has been shown to increase predictive ability and genetic gain. Hence, the aim of this paper is to compare the predictive abilities of pedigree- and genomic-based models in univariate and multivariate approaches, with a view to using genomic selection in a Nile tilapia breeding program. A total of 1444 fish were genotyped (48,960 SNP loci) and phenotyped for body weight at harvest (BW), fillet weight (FW), and fillet yield (FY). The pedigree-based analysis used a deep pedigree of 14 generations. Estimated breeding values (EBVs and GEBVs) were obtained with traditional pedigree-based (PBLUP) and genomic (GBLUP) models, using both univariate and multivariate approaches. Prediction accuracy and bias were evaluated using 5 replicates of 10-fold cross-validation with three different cross-validation approaches. Further, the impact of these models and approaches on the genetic evaluation was assessed based on the ranking of the selection candidates.
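GBLUP differs from PBLUP only in the relationship matrix used for the breeding values. The sketch below builds G with VanRaden's first method from 0/1/2 genotypes and then solves a simple univariate model y = 1*mu + u + e with var(u) = h2*G and var(e) = (1 - h2)*I, in units of phenotypic variance. The heritability is assumed known, only animals in `train` contribute phenotypes, and `vanraden_G` and `gblup` are hypothetical helper names, not the study's software.

```python
import numpy as np


def vanraden_G(M, p=None):
    """Genomic relationships G = ZZ' / (2 * sum p(1-p)) with Z = M - 2p.

    M: (n, m) genotypes coded 0/1/2; p: allele frequencies (defaults to observed).
    """
    M = np.asarray(M, dtype=float)
    if p is None:
        p = M.mean(axis=0) / 2
    Z = M - 2 * p
    return Z @ Z.T / (2 * np.sum(p * (1 - p)))


def gblup(G, y, train, h2):
    """GLS mean and GEBVs for all animals, using phenotypes of the `train` animals only."""
    G = np.asarray(G, dtype=float)
    train = np.asarray(train)
    yt = np.asarray(y, dtype=float)[train]
    V = h2 * G[np.ix_(train, train)] + (1 - h2) * np.eye(len(train))
    Vinv_1 = np.linalg.solve(V, np.ones(len(train)))
    mu = (Vinv_1 @ yt) / Vinv_1.sum()                        # GLS estimate of the mean
    u_hat = h2 * G[:, train] @ np.linalg.solve(V, yt - mu)   # BLUP of breeding values
    return mu, u_hat
```

Passing a pedigree relationship matrix A in place of G gives the corresponding PBLUP predictions, which is the comparison made in this study. The default `p` uses observed allele frequencies, so G is centered on the genotyped population.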
Univariate GBLUP models increased prediction accuracy and reduced prediction bias compared with the PBLUP and multivariate approaches. Relative to pedigree-based models, prediction accuracy increased by ~20% for FY, >75% for FW, and >43% for BW. GBLUP models caused major re-ranking of the selection candidates. Within GBLUP models, there was no significant difference between the rankings produced by univariate and multivariate approaches. The heritabilities estimated with multivariate GBLUP models for BW, FW, and FY were 0.19 ± 0.04, 0.17 ± 0.04, and 0.23 ± 0.04, respectively. BW showed a very high genetic correlation with FW (0.96 ± 0.01) and a slightly negative genetic correlation with FY (−0.11 ± 0.15).
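Accuracy and bias figures like these come from replicated k-fold cross-validation. The sketch below shows one common way to compute them, assuming predictive ability is the Pearson correlation between predictions and observed phenotypes in the left-out fold, and bias is the slope of observations regressed on predictions (1 meaning no inflation). The three cross-validation approaches used in the study are not described here, so only random folds are shown; `replicated_kfold` and its `fit_predict` callback are hypothetical.

```python
import numpy as np


def replicated_kfold(y, fit_predict, k=10, reps=5, seed=1):
    """Mean (predictive ability, bias slope) over `reps` replicates of k-fold CV.

    fit_predict(train_idx, test_idx) must fit on train_idx and return
    predictions for test_idx (e.g., GEBVs from a PBLUP or GBLUP model).
    """
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    all_idx = np.arange(len(y))
    stats = []
    for _ in range(reps):
        # a fresh random partition into k folds for each replicate
        for test in np.array_split(rng.permutation(len(y)), k):
            train = np.setdiff1d(all_idx, test)
            pred = np.asarray(fit_predict(train, test), dtype=float)
            r = np.corrcoef(pred, y[test])[0, 1]
            slope = np.polyfit(pred, y[test], 1)[0]   # regression of y on prediction
            stats.append((r, slope))
    return tuple(np.mean(stats, axis=0))
```

Dividing predictive ability by the square root of heritability is a common way to approximate accuracy; whether the study did so is not stated in the abstract.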
The predictive ability of genomic prediction models is substantially higher than that of classical pedigree-based models. Genomic selection is therefore beneficial to the Nile tilapia breeding program and is recommended for routine genetic evaluation of commercial traits in the Nile tilapia breeding nucleus.
• First report comparing prediction accuracy of both univariate and multivariate GBLUP and PBLUP models in Nile tilapia.
• Genomic selection improves prediction accuracy and is recommended for routine genetic evaluations in the tilapia breeding program.
• Multivariate approaches were not found to increase prediction accuracy for commercial traits in Nile tilapia.
Abstract
Background
Genomic selection has the potential to accelerate genetic gain in perennial ryegrass breeding, provided complex traits such as forage yield can be predicted with sufficient accuracy.
Methods
In this study, we compared modelling approaches and feature selection strategies to evaluate the accuracy of genomic prediction models for seasonal forage yield production.
Results
Overall, the choice of model had limited impact on predictive ability when the full data set was used. For a baseline genomic best linear unbiased prediction model, the highest mean predictive accuracies were obtained for spring grazing (0.78), summer grazing (0.62), and second-cut silage (0.56). Among feature selection strategies, using uncorrelated single‐nucleotide polymorphisms (SNPs) had no impact on predictive ability, potentially allowing the dimensions of the data set to be reduced. A genome‐wide association study identified a significant SNP marker for spring grazing, located in a genic region annotated as coding for an enzyme responsible for the fucosylation of xyloglucans, which are major components of the plant cell wall. We also present an approach to increase the interpretability of genomic prediction models using Gene Ontology enrichment analysis.
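The abstract does not say how the uncorrelated SNP subset was chosen. One generic way to obtain such a subset is greedy pruning on squared correlation (r^2) between genotype columns, similar in spirit to LD-pruning tools but without their sliding windows. The sketch below assumes that approach with a hypothetical r^2 cutoff of 0.8; `prune_correlated_snps` is an illustrative name.

```python
import numpy as np


def prune_correlated_snps(M, r2_max=0.8):
    """Greedy pruning: keep a SNP only if its squared correlation with every
    previously kept SNP is at most r2_max. Monomorphic SNPs are dropped.

    M: (n, m) genotype matrix coded 0/1/2. Returns indices of kept columns.
    """
    M = np.asarray(M, dtype=float)
    kept = []
    for j in range(M.shape[1]):
        x = M[:, j]
        if x.std() == 0:
            continue  # no variation, so no information for prediction
        if all(np.corrcoef(x, M[:, k])[0, 1] ** 2 <= r2_max for k in kept):
            kept.append(j)
    return kept
```

Because every candidate is compared with all kept SNPs, the cost grows with the number of kept markers. Practical tools restrict comparisons to nearby markers, which is adequate because strong correlations are mostly local.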
Conclusions
Feature selection approaches will be relevant to the development of low-cost genotyping platforms that support routine, cost-effective implementation of genomic selection.
While autozygosity resulting from selection is well understood, there is limited information on the ability of different methods to measure true inbreeding. In the present study, a gene-dropping simulation was performed, and inbreeding estimates based on runs of homozygosity (ROH), pedigree, and the genomic relationship matrix were compared with true inbreeding. ROH-based inbreeding was estimated using SNP1101, PLINK, and BCFtools software with different threshold parameters. The effects of different selection methods on ROH patterns were also compared. Furthermore, inbreeding coefficients were estimated in a sample of genotyped North American Holstein animals born from 1990 to 2016 using 50k chip data, and ROH patterns were assessed before and after the introduction of genomic selection.
Using ROH with a minimum window size of 20 to 50 in SNP1101 provided the closest estimates to true inbreeding in the simulation study. Pedigree inbreeding tended to underestimate true inbreeding, and results for genomic inbreeding varied depending on assumptions about base allele frequencies. An ROH approach also made it possible to assess the effect of population structure and selection on the distribution of runs of autozygosity across the genome. In the simulation, the longest individual ROH and the largest average ROH length were observed when selection was based on best linear unbiased prediction (BLUP), whereas genomic selection produced the largest number of small ROH compared with selection on BLUP estimated breeding values (BLUP-EBV). In North American Holsteins, the average number of ROH segments of 1 Mb or more per individual increased from 57 in 1990 to 82 in 2016. The rate of increase in the last 5 years was almost double that of previous 5-year periods. Genomic selection results in less autozygosity per generation but more per year, given the reduced generation interval.
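ROH-based inbreeding (F_ROH) is the fraction of the genome covered by runs of homozygosity that pass the chosen thresholds. The sketch below shows the basic idea under simple assumptions: genotypes coded 0/1/2 in map order for one chromosome, any heterozygous or missing call ending a run, and runs kept only if they contain at least `min_snps` SNPs and span at least `min_length` bp. SNP1101, PLINK, and BCFtools use more elaborate rules (sliding windows, tolerated heterozygotes or errors, density checks), so this is not a reimplementation of any of them; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def find_roh(genotypes: Sequence[int], positions: Sequence[int],
             min_snps: int = 20, min_length: int = 0) -> List[Tuple[int, int]]:
    """(start_bp, end_bp) of runs of consecutive homozygous calls (0 or 2).

    Any other code (heterozygote 1 or a missing-value code) ends a run.
    """
    runs, start = [], None
    for i, g in enumerate(list(genotypes) + [1]):   # sentinel heterozygote closes the last run
        if g in (0, 2):
            if start is None:
                start = i
        elif start is not None:
            end = i - 1
            if end - start + 1 >= min_snps and positions[end] - positions[start] >= min_length:
                runs.append((positions[start], positions[end]))
            start = None
    return runs


def f_roh(runs: List[Tuple[int, int]], genome_length: int) -> float:
    """Total ROH length divided by the length of the genome covered by SNPs."""
    return sum(end - start for start, end in runs) / genome_length
```

Counting segments of 1 Mb or more, as in the Holstein comparison above, corresponds to calling `find_roh` with `min_length=1_000_000` and taking `len()` of the result for each animal.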
This study shows that existing software based on the measurement of ROH can accurately identify autozygosity across the genome, provided appropriate threshold parameters are used. Our results show how different selection strategies affect the distribution of ROH, and how the distribution of ROH has changed in the North American dairy cattle population over the last 25 years.
Genomic evaluation models can fit additive and dominant SNP effects. Under quantitative genetics theory, additive or "breeding" values of individuals are generated by substitution effects, which involve both the "biological" additive and dominant effects of the markers. Dominance deviations include only a portion of the biological dominant effects of the markers. Additive variance includes variation due to both the additive and the dominant effects of the markers. We describe a matrix of dominant genomic relationships across individuals, D, which is similar to the G matrix used in genomic best linear unbiased prediction. This matrix can be used in a mixed-model context for genomic evaluations or to estimate dominant and additive variances in the population. Starting from the "genotypic" values of individuals, an alternative parameterization defines additive and dominance components as the parts attributable to the additive and dominant effects of the markers. This approach underestimates the additive genetic variance and overestimates the dominance variance. Transforming the variances from one model into the other is trivial if the distribution of allele frequencies is known. We illustrate these results with mouse data (four traits, 1884 mice, and 10,946 markers) and simulated data (2100 individuals and 10,000 markers). Variance components were estimated correctly by the model that considers breeding values and dominance deviations. For the model considering genotypic values, including dominant effects biased the estimate of additive variance. Genomic models estimated variance components more accurately than their pedigree-based counterparts.
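The breeding-value/dominance-deviation parameterization can be sketched with classical (Falconer) coding. For a locus with allele A at frequency p (q = 1 - p), biological effects +a (AA), d (Aa), and -a (aa) give the substitution effect alpha = a + d(q - p). Under Hardy-Weinberg equilibrium this yields additive variance 2pq*alpha^2 and dominance variance (2pqd)^2. Dominance deviations are coded -2q^2, 2pq, and -2p^2 for AA, Aa, and aa. The sketch assumes D = WW' / sum(2pq)^2 built from those codes, by analogy with G, since the abstract does not give its exact definition. `dominance_matrix` and `hwe_variances` are hypothetical names.

```python
import numpy as np


def dominance_matrix(M):
    """D = WW' / sum_j (2 p_j q_j)^2 using dominance-deviation codes.

    M: (n, m) genotypes counting copies of allele A (2 = AA, 1 = Aa, 0 = aa).
    """
    M = np.asarray(M)
    p = M.mean(axis=0) / 2
    q = 1 - p
    W = np.where(M == 2, -2 * q**2, np.where(M == 1, 2 * p * q, -2 * p**2))
    return W @ W.T / np.sum((2 * p * q) ** 2)


def hwe_variances(p, a, d):
    """Additive and dominance variances summed over loci (HWE, no epistasis)."""
    p = np.asarray(p, dtype=float)
    a = np.asarray(a, dtype=float)
    d = np.asarray(d, dtype=float)
    q = 1 - p
    alpha = a + d * (q - p)          # substitution effect: includes part of d
    var_A = np.sum(2 * p * q * alpha**2)
    var_D = np.sum((2 * p * q * d) ** 2)
    return var_A, var_D
```

With p = 0.2 and a = d = 1, `hwe_variances` gives an additive variance of 0.8192 and a dominance variance of 0.1024. Their sum, 0.9216, equals the total genotypic variance at the locus, and most of the dominant effect ends up in the additive variance through alpha.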