Key message
We review and propose several methods for identifying possible outliers and evaluate their properties. The methods are applied to a genomic prediction program in hybrid rye.
Many plant breeders use ANOVA-based software for routine analysis of field trials. These programs may offer specific built-in options for residual analysis that are lacking in current REML software. With the advance of molecular technologies, there is a need to switch to REML-based approaches, but without losing the good features of outlier detection methods that have proven useful in the past. Our aims were to compare the variance component estimates between ANOVA and REML approaches, to scrutinize the outlier detection method of the ANOVA-based package PlabStat, and to propose and evaluate alternative procedures for outlier detection. We compared the outputs produced by ANOVA and REML approaches for four published datasets of generalized lattice designs. Five outlier detection methods are explained step by step. Their performance was evaluated by measuring the true positive rate and the false positive rate in a dataset with artificial outliers simulated in several scenarios. An implementation of genomic prediction using an empirical rye multi-environment trial was used to assess the outlier detection methods with respect to the predictive abilities of a mixed model for each method. We provide a detailed explanation of how the PlabStat outlier detection methodology can be translated to REML-based software, together with an evaluation of alternative methods to identify outliers. The method combining the Bonferroni–Holm test to judge each residual and the residual standardization strategy of PlabStat exhibited good ability to detect outliers in small and large datasets and under a genomic prediction application. We recommend the use of outlier detection methods as a decision support in the routine data analyses of plant breeding experiments.
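The idea of judging each standardized residual with a Bonferroni–Holm test can be illustrated with a minimal Python sketch. It is not the PlabStat or REML implementation: the robust scale (a re-scaled median absolute deviation), the normal approximation for the p-values, and the function names `robust_standardize` and `holm_outliers` are all assumptions made for illustration.

```python
import math
from statistics import median


def robust_standardize(residuals):
    """Divide residuals by a robust scale estimate.

    Assumption: the scale is the median absolute deviation multiplied by
    1.4826 (the consistency factor for normally distributed data).
    """
    med = median(residuals)
    mad = median([abs(r - med) for r in residuals])
    scale = 1.4826 * mad
    if scale == 0:
        raise ValueError("robust scale is zero; residuals are too concentrated")
    return [r / scale for r in residuals]


def holm_outliers(residuals, alpha=0.05):
    """Return the indices of residuals flagged by a Bonferroni-Holm step-down test."""
    z = robust_standardize(residuals)
    # Two-sided p-value under a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    pvals = [math.erfc(abs(v) / math.sqrt(2)) for v in z]
    m = len(pvals)
    order = sorted(range(m), key=pvals.__getitem__)
    flagged = []
    for rank, idx in enumerate(order):
        # Holm: compare the k-th smallest p-value with alpha / (m - k + 1).
        if pvals[idx] <= alpha / (m - rank):
            flagged.append(idx)
        else:
            break  # step-down procedure stops at the first non-rejection
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical residuals; the last value is an artificial outlier.
    example = [0.1, -0.2, 0.3, -0.1, 0.2, -0.3, 0.0, 0.15, -0.25, 5.0]
    print(holm_outliers(example))
```

Flagged observations are candidates for inspection rather than automatic removal, in line with the use of outlier detection as decision support.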
ABSTRACT
In vivo haploid induction has become a routine tool for rapid line development in maize (Zea mays L.). However, distinguishing haploid (H) from diploid crossing (C) seeds is problematic for many germplasms due to poor expression or suppression of the currently‐used R1‐nj embryo marker. We examined a new approach for sorting H and C seeds on the basis of their oil content (OC). Ten source germplasms, including single crosses, synthetics, and landraces, were pollinated by the high‐oil (HO) inducer UH600 with OC = 10.8%. Identification of embryoless seeds based on seed OC < 2.1% was very reliable. The average difference (1.79%) between the mean OC of C and H seeds was more than twice the standard deviation (SD) within each fraction. Thus, sorting H and C seeds based on OC smaller or greater than an a priori chosen threshold t was generally more reliable than sorting based on the R1‐nj embryo marker. Another ten source germplasms were pollinated with the normal‐oil inducer UH400 with OC = 3.0%. Since the difference (0.65%) between the OC of C and H seeds was of approximately the same magnitude as the SD, the two fractions overlapped too much for reliable sorting. The discrimination of H and C seeds based on their OC looks very promising, even for heterogeneous source materials such as landraces, provided an HO inducer and a stringent threshold t are used. In combination with high‐throughput platforms for automated sorting of single seeds for OC, this opens new avenues for extending the application and increasing the efficiency of the doubled haploid technology in maize.
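The sorting rule described above can be sketched in a few lines of Python. The 2.1% cutoff for embryoless seeds comes from the abstract; the threshold value `t = 4.0` used in the example and the function name `sort_seed` are hypothetical, since the abstract only states that t is chosen a priori.

```python
def sort_seed(oc_percent, t, embryoless_cutoff=2.1):
    """Classify a single seed by its oil content (OC, in percent).

    Seeds below the embryoless cutoff are set aside first; the remaining
    seeds are called haploid (H) if their OC is below the threshold t and
    diploid crossing (C) otherwise.
    """
    if t <= embryoless_cutoff:
        raise ValueError("threshold t must exceed the embryoless cutoff")
    if oc_percent < embryoless_cutoff:
        return "embryoless"
    return "H" if oc_percent < t else "C"


if __name__ == "__main__":
    # Hypothetical OC measurements and a hypothetical threshold t.
    for oc in (1.5, 3.2, 5.1):
        print(oc, sort_seed(oc, t=4.0))
```

A more stringent (lower) t reduces the share of C seeds misclassified as H at the cost of discarding some true haploids.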
Key message
Selected doubled haploid lines showed average testcross performance similar to that of their original landraces, and the best of them approached the yields of elite inbreds, demonstrating their potential to broaden the narrow genetic diversity of the flint germplasm pool.
Maize landraces represent a rich source of genetic diversity that remains largely idle because the high genetic load and performance gap to elite germplasm hamper their use in modern breeding programs. Production of doubled haploid (DH) lines can mitigate problems associated with the use of landraces in pre-breeding. Our objective was to assess, in comparison with modern materials, the testcross performance (TP) of the best 89 out of 389 DH lines developed from six landraces and evaluated in previous studies for line per se performance (LP). TP with a dent tester was evaluated for the six original landraces, ~15 DH lines from each landrace selected for LP, and six elite flint inbreds together with nine commercial hybrids for grain and silage traits. Mean TP of the DH lines rarely differed significantly from TP of their corresponding landrace, which, compared with the mean TP of the elite flint inbreds, averaged ~20% lower grain yield and ~10% lower dry matter and methane yield. Trait correlations of DH lines closely agreed with the literature; the correlation of TP with LP was zero for grain yield, underpinning the need to evaluate TP in addition to LP. For all traits, we observed substantial variation for TP among the DH lines, and the best showed TP yields similar to those of the elite inbreds. Our results demonstrate the high potential of landraces for broadening the narrow genetic base of the flint heterotic pool and the usefulness of the DH technology for exploiting idle genetic resources from gene banks.
Key message
A breeding strategy combining genomic with one-stage phenotypic selection maximizes annual selection gain for net merit. Choice of the selection index strongly affects the selection gain expected in individual traits.
Selection indices using genomic information have been proposed in crop-specific scenarios. Routine use of genomic selection (GS) for simultaneous improvement of multiple traits requires information about the impact of the available economic and logistic resources and genetic properties (variances, trait correlations, and prediction accuracies) of the breeding population on the expected selection gain. We extended the R package “selectiongain” from single trait to index selection to optimize and compare breeding strategies for simultaneous improvement of two traits. We focused on the expected annual selection gain ($\Delta G_a$) for traits differing in their genetic correlation, economic weights, variance components, and prediction accuracies of GS. For all scenarios considered, breeding strategy GSrapid (one-stage GS followed by one-stage phenotypic selection) achieved higher $\Delta G_a$ than classical two-stage phenotypic selection, regardless of the index chosen to combine the two traits and the prediction accuracy of GS. The Smith–Hazel or base index delivered higher $\Delta G_a$ for net merit and individual traits compared to selection by independent culling levels, whereas the restricted index led to lower $\Delta G_a$ in net merit and divergent results for selection gain of individual traits. The differences among the indices depended strongly on the correlation of traits, their variance components, and economic weights, underpinning the importance of choosing the selection indices according to the goal of the breeding program. We demonstrate our theoretical derivations and extensions of the R package “selectiongain” with an example from hybrid wheat by designing indices to simultaneously improve grain yield and grain protein content or sedimentation volume.
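For readers unfamiliar with the indices compared here, the following Python sketch shows the textbook single-stage Smith–Hazel index for two traits, with weights b = P⁻¹Gw, and the expected gain in net merit per selection cycle, i · b′Gw / √(b′Pb). It does not reproduce the multi-stage, annual-gain calculations of the “selectiongain” package; the covariance matrices, economic weights, selection intensity, and all function names are hypothetical.

```python
import math


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def solve_2x2(m, v):
    """Solve m x = v for a 2x2 matrix m by Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [(d * v[0] - b * v[1]) / det, (a * v[1] - c * v[0]) / det]


def smith_hazel_weights(P, G, w):
    """Index weights b = P^{-1} G w (P: phenotypic, G: genotypic covariance)."""
    return solve_2x2(P, mat_vec(G, w))


def net_merit_gain(b, P, G, w, intensity):
    """Expected gain in net merit w'g when selecting on the index b'x."""
    return intensity * dot(b, mat_vec(G, w)) / math.sqrt(dot(b, mat_vec(P, b)))


if __name__ == "__main__":
    # Hypothetical covariance matrices and economic weights for two traits.
    P = [[2.0, 0.5], [0.5, 1.0]]
    G = [[1.0, 0.2], [0.2, 0.5]]
    w = [1.0, 1.0]
    i = 1.755  # approximate selection intensity for selecting the best 10%

    b_sh = smith_hazel_weights(P, G, w)
    print("Smith-Hazel weights:", b_sh)
    print("Gain, Smith-Hazel:", net_merit_gain(b_sh, P, G, w, i))
    print("Gain, base index: ", net_merit_gain(w, P, G, w, i))
```

The base index simply uses the economic weights as index weights (b = w), so it needs no covariance estimates; the Smith–Hazel index cannot do worse in expected net-merit gain when P and G are known.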
For efficient production of doubled haploid (DH) lines in maize, maternal haploid inducer lines with high haploid induction rate (HIR) and good adaptation to the target environments are an important requirement. In this study, we present second-generation Tropically Adapted Inducer Lines (2GTAILs), developed using marker-assisted selection (MAS) for qhir1, a QTL with a significant positive effect on HIR, from crosses between elite tropical maize inbreds and first-generation Tropically Adapted Inducer Lines (TAILs). Evaluation of 2GTAILs for HIR and agronomic performance in tropical and subtropical environments indicated superior performance of 2GTAILs over the TAILs for both HIR and agronomic performance, including plant vigor, delayed flowering, grain yield, and resistance to ear rots. One of the new inducers, 2GTAIL006, showed an average HIR of 13.1%, which is 48.9% higher than the average HIR of the TAILs. Several other 2GTAILs also showed higher HIR compared to the TAILs. While employing MAS for the qhir1 QTL, we observed a significant influence of the non-inducer parent on the positive effect of the qhir1 QTL on HIR. The non-inducer parents that resulted in the highest mean HIR in the early generation qhir1+ families also gave rise to the highest numbers of candidate inducers, some of which showed transgressive segregation for HIR. The mean HIR of early generation qhir1+ families involving different non-inducer parents can potentially indicate recipient non-inducer parents that can result in progenies with high HIR. Our study also indicated that the HIR-associated traits (endosperm abortion rate, embryo abortion rate, and proportion of haploid plants among the inducer plants) can be used to differentiate inducers from non-inducers but are not suitable for differentiating inducers with varying levels of HIR. We propose here an efficient methodology for developing haploid inducer lines that combines MAS for qhir1 with HIR-associated traits.
Key message
Using landraces for broadening the genetic base of elite maize germplasm is hampered by heterogeneity and high genetic load. Production of DH line libraries can help to overcome these problems.
Landraces of maize (Zea mays L.) represent a huge reservoir of genetic diversity largely untapped by breeders. Genetic heterogeneity and a high genetic load hamper their use in hybrid breeding. Production of doubled haploid line libraries (DHL) by the in vivo haploid induction method promises to overcome these problems. To test this hypothesis, we compared the line per se performance of 389 doubled haploid (DH) lines across six DHL produced from European flint landraces with that of four flint founder lines (FFL) and 53 elite flint lines (EFL) for 16 agronomic traits evaluated in four locations. The genotypic variance ($\sigma^2_G$) within DHL was generally much larger than that among DHL and also exceeded $\sigma^2_G$ of the EFL. For most traits, the means and $\sigma^2_G$ differed considerably among the DHL, resulting in different expected selection gains. Mean grain yield of the EFL was 25 and 62% higher than for the FFL and DHL, respectively, indicating considerable breeding progress in the EFL and a remnant genetic load in the DHL. For individual DHL, the usefulness of the best 20% of lines was comparable to that of the EFL, and grain yield (GY) in the top lines from both groups was similar. Our results corroborate the tremendous potential of landraces for broadening the narrow genetic base of elite germplasm. To make best use of these “gold reserves”, we propose a multi-stage selection approach with optimal allocation of resources to (1) choose the most promising landraces for DHL production and (2) identify the top DH lines for further breeding.
High-density genotyping is extensively exploited in genome-wide association mapping studies and genomic selection in maize. By contrast, linkage mapping studies have until now mostly been based on low-density genetic maps, and theoretical results suggested this to be sufficient. This raises the question of whether an increase in marker density would be overkill for linkage mapping in biparental populations, or whether important QTL mapping parameters would benefit from it. In this study, we addressed this question using experimental data and a simulation based on linkage maps with marker densities of 1, 2, and 5 cM. QTL mapping was performed for six diverse traits in a biparental population with 204 doubled haploid maize lines and in a simulation study with varying QTL effects and closely linked QTL for different population sizes. Our results showed that high-density maps improved neither the QTL detection power nor the predictive power for the proportion of explained genotypic variance. By contrast, the precision of QTL localization, the precision of effect estimates of detected QTL, especially for small and medium-sized QTL, as well as the power to resolve closely linked QTL profited from an increase in marker density from 5 to 1 cM. In conclusion, the higher costs for high-density genotyping are compensated for by more precise estimates of parameters relevant for knowledge-based breeding, thus making an increase in marker density for linkage mapping attractive.
From simulation studies it is known that the allocation of experimental resources has a crucial effect on the power of QTL detection as well as on the accuracy and precision of QTL estimates. In this study, we used a very large experimental data set composed of 976 F5 maize testcross progenies evaluated in 19 environments and cross-validation to assess the effect of sample size (N), number of test environments (E), and significance threshold on the number of detected QTL, the proportion of the genotypic variance explained by them, and the corresponding bias of estimates for grain yield, grain moisture, and plant height. In addition, we used computer simulations to compare the usefulness of two cross-validation schemes for obtaining unbiased estimates of QTL effects. The maximum validated genotypic variance explained by QTL in this study was 52.3% for grain moisture despite the large number of detected QTL, thus confirming the infinitesimal model of quantitative genetics. In both simulated and experimental data, the effect of sample size on the power of QTL detection as well as on the accuracy and precision of QTL estimates was large. The number of detected QTL and the proportion of genotypic variance explained by QTL generally increased more with increasing N than with increasing E. The average bias of QTL estimates and its range were reduced by increasing N and E. Cross-validation performed well with respect to yielding asymptotically unbiased estimates of the genotypic variance explained by QTL. On the basis of our findings, recommendations for planning of QTL mapping experiments and allocation of experimental resources are given.
Key message
Complementing genomic data with other “omics” predictors can increase the probability of success for predicting the best hybrid combinations using complex agronomic traits.
Accurate prediction of traits with complex genetic architecture is crucial for selecting superior candidates in animal and plant breeding and for guiding decisions in personalized medicine. Whole-genome prediction has revolutionized these areas but has inherent limitations in incorporating intricate epistatic interactions. Downstream “omics” data are expected to integrate interactions within and between different biological strata and provide the opportunity to improve trait prediction. Yet, predicting traits from parents to progeny has not been addressed by a combination of “omics” data. Here, we evaluate several “omics” predictors (genomic, transcriptomic and metabolic data) measured on parent lines at early developmental stages and demonstrate that the integration of transcriptomic with genomic data leads to higher success rates in the correct prediction of untested hybrid combinations in maize. Despite the high predictive ability of genomic data, transcriptomic data alone outperformed them and other predictors for the most complex heterotic trait, dry matter yield. An eQTL analysis revealed that transcriptomic data integrate genomic information from both adjacent and distant sites relative to the expressed genes. Together, these findings suggest that downstream predictors capture physiological epistasis that is transmitted from parents to their hybrid offspring. We conclude that the use of downstream “omics” data in prediction can exploit important information beyond structural genomics for leveraging the efficiency of hybrid breeding.
The efficiency of marker-assisted selection (MAS) depends on the power of quantitative trait locus (QTL) detection and unbiased estimation of QTL effects. Two independent samples of N = 344 and 107 F2 plants were genotyped for 89 RFLP markers. For each sample, testcross (TC) progenies of the corresponding F3 lines with two testers were evaluated in four environments. QTL for grain yield and other agronomically important traits were mapped in both samples. QTL effects were estimated from the same data as used for detection and mapping of QTL (calibration) and, based on QTL positions from calibration, from the second, independent sample (validation). For all traits and both testers, we detected a total of 107 QTL with N = 344 and 39 QTL with N = 107, of which only 20 were in common. Consistency of QTL effects across testers was in agreement with corresponding genotypic correlations between the two TC series. Most QTL displayed neither significant QTL × environment nor epistatic interactions. Estimates of the proportion of the phenotypic and genetic variance explained by QTL were considerably reduced when derived from the independent validation sample as opposed to estimates from the calibration sample. We conclude that, unless QTL effects are estimated from an independent sample, they can be inflated, resulting in an overly optimistic assessment of the efficiency of MAS.