Key message
R/StageWise enables fully efficient, two-stage analysis of multi-environment, multi-trait datasets for genomic selection, including support for dominance heterosis and polyploidy.
Plant breeders interested in genomic selection often face challenges in fully utilizing multi-trait, multi-environment datasets. R package StageWise was developed to go beyond the capabilities of most specialized software for genomic prediction, without requiring the programming skills needed for more general-purpose software for mixed models. As the name suggests, one of the core features is a fully efficient, two-stage analysis for multiple environments, in which the full variance–covariance matrix of the Stage 1 genotype means is used in Stage 2. Another feature is directional dominance, including for polyploids, to account for inbreeding depression in outbred crops. StageWise enables selection with multi-trait indices, including restricted indices with one or more traits constrained to have zero response. For a potato dataset with 943 genotypes evaluated over 6 years, including the Stage 1 errors in Stage 2 reduced the Akaike Information Criterion (AIC) by 29, 67, and 104 for maturity, yield, and fry color, respectively. The proportion of variation explained by heterosis was largest for yield but still only 0.03, likely because of limited variation for the genomic inbreeding coefficient. Due to the large additive genetic correlation (0.57) between yield and maturity, naïve selection on an index combining yield and fry color led to an undesirable response for later maturity. The restricted index coefficients to maximize genetic merit without delaying maturity were identified. The software and three vignettes are available at https://github.com/jendelman/StageWise.
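The restricted-index idea in this abstract can be illustrated with a small sketch. It is not StageWise code and does not reproduce its method. It assumes selection acts directly on true genetic values, so that the response vector is proportional to G b for genetic covariance matrix G and index coefficients b. Only the 0.57 yield-maturity correlation comes from the abstract; the other matrix entries and the merit weights are hypothetical.

```python
# Minimal sketch of a restricted selection index (illustrative only).
# Assumption: response to selection is proportional to G @ b, where G is the
# genetic covariance matrix and b holds the index coefficients.

def mat_vec(G, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(g * x for g, x in zip(row, v)) for row in G]

def restricted_index(G, a, k):
    """Coefficients that maximize merit a'g subject to zero response in trait k.

    Maximizing a'Gb with b'Gb fixed and (Gb)[k] = 0 gives b = a - lam * e_k,
    with lam = (Ga)[k] / G[k][k]; the overall scale of b is arbitrary.
    """
    lam = mat_vec(G, a)[k] / G[k][k]
    b = list(a)
    b[k] -= lam
    return b

# Trait order: yield, maturity, fry color. Unit variances, so entries are
# correlations. Only the 0.57 value is taken from the text; the rest are made up.
G = [[1.00, 0.57, 0.10],
     [0.57, 1.00, 0.00],
     [0.10, 0.00, 1.00]]
a = [1.0, 0.0, 1.0]  # hypothetical merit weights on yield and fry color only

naive_response = mat_vec(G, a)       # maturity element is positive (later)
b = restricted_index(G, a, k=1)
restricted_response = mat_vec(G, b)  # maturity element is zero
```

In this toy case the naive index delays maturity through its correlation with yield, while the restricted coefficients put a negative weight on maturity that cancels that correlated response.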
Consensus genetic maps constructed from multiple populations are an important resource for both basic and applied research, including genome-wide association analysis, genome sequence assembly and studies of evolution. The LPmerge software uses linear programming to efficiently minimize the mean absolute error between the consensus map and the linkage maps from each population. This minimization is performed subject to linear inequality constraints that ensure the ordering of the markers in the linkage maps is preserved. When marker order is inconsistent between linkage maps, a minimum set of ordinal constraints is deleted to resolve the conflicts.
LPmerge is on CRAN at http://cran.r-project.org/web/packages/LPmerge.
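To make the notion of inconsistent marker order concrete, the sketch below checks whether the ordinal constraints implied by several linkage maps can all hold at once. It is not the LPmerge algorithm: it does no linear programming and does not choose which constraints to delete. The function names and the toy maps are hypothetical.

```python
from collections import defaultdict, deque

def order_constraints(linkage_map):
    """Return (a, b) pairs meaning marker a must precede marker b.

    linkage_map is a list of (marker, position_cM) tuples. Markers that share
    a position impose no constraint on each other.
    """
    ordered = sorted(linkage_map, key=lambda mp: mp[1])
    return {(m1, m2)
            for i, (m1, p1) in enumerate(ordered)
            for (m2, p2) in ordered[i + 1:]
            if p1 < p2}

def maps_are_consistent(maps):
    """True if the union of all ordinal constraints contains no cycle."""
    edges = set().union(*(order_constraints(m) for m in maps))
    nodes = {marker for linkage_map in maps for marker, _ in linkage_map}
    successors = defaultdict(set)
    indegree = defaultdict(int)
    for a, b in edges:
        successors[a].add(b)
        indegree[b] += 1
    # Kahn's algorithm: a full topological order exists only without cycles.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for s in successors[n]:
            indegree[s] -= 1
            if indegree[s] == 0:
                queue.append(s)
    return visited == len(nodes)

map1 = [("m1", 0.0), ("m2", 5.0), ("m3", 9.0)]
map2 = [("m1", 0.0), ("m3", 4.0), ("m4", 7.0)]
map3 = [("m3", 0.0), ("m2", 3.0)]  # reverses the m2-m3 order of map1

consistent_pair = maps_are_consistent([map1, map2])
conflicting_pair = maps_are_consistent([map1, map3])
```

A cycle in the constraint graph corresponds to the situation the abstract describes, where some ordinal constraints must be dropped before a consensus order exists.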
Many important traits in plant breeding are polygenic and therefore recalcitrant to traditional marker‐assisted selection. Genomic selection addresses this complexity by including all markers in the prediction model. A key method for the genomic prediction of breeding values is ridge regression (RR), which is equivalent to best linear unbiased prediction (BLUP) when the genetic covariance between lines is proportional to their similarity in genotype space. This additive model can be broadened to include epistatic effects by using other kernels, such as the Gaussian, which represent inner products in a complex feature space. To facilitate the use of RR and nonadditive kernels in plant breeding, a new software package for R called rrBLUP has been developed. At its core is a fast maximum‐likelihood algorithm for mixed models with a single variance component besides the residual error, which allows for efficient prediction with unreplicated training data. Use of the rrBLUP software is demonstrated through several examples, including the identification of optimal crosses based on superior progeny value. In cross‐validation tests, the prediction accuracy with nonadditive kernels was significantly higher than RR for wheat (Triticum aestivum L.) grain yield but equivalent for several maize (Zea mays L.) traits.
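The equivalence between ridge regression and BLUP stated above can be checked numerically on a toy example. The sketch below is not rrBLUP code. It assumes centered phenotypes with no fixed effects, a known ridge parameter lam (standing in for the ratio of residual to marker-effect variance), and the kernel K = XX'. The marker matrix and phenotypes are invented.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ridge_predictions(X, y, lam):
    """Marker-effect form: u = (X'X + lam I)^-1 X'y, predictions = X u."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    for i in range(len(XtX)):
        XtX[i][i] += lam
    Xty = [sum(x * yi for x, yi in zip(col, y)) for col in Xt]
    u = solve(XtX, Xty)
    return [sum(x * ui for x, ui in zip(row, u)) for row in X]

def kernel_blup_predictions(X, y, lam):
    """Line-effect form: predictions = K (K + lam I)^-1 y with K = XX'."""
    K = matmul(X, transpose(X))
    V = [row[:] for row in K]
    for i in range(len(V)):
        V[i][i] += lam
    alpha = solve(V, y)
    return [sum(k * a for k, a in zip(row, alpha)) for row in K]

# Toy data: 4 lines x 3 markers coded -1/0/1; phenotypes sum to zero.
X = [[1, -1, 0], [0, 1, 1], [-1, 0, 1], [1, 1, -1]]
y = [0.5, -0.2, -0.8, 0.5]
rr = ridge_predictions(X, y, lam=2.0)
blup = kernel_blup_predictions(X, y, lam=2.0)
max_diff = max(abs(r - b) for r, b in zip(rr, blup))  # agrees to rounding error
```

Replacing K with another kernel, such as a Gaussian function of the distance between lines, changes only the second function, which is the sense in which the additive model is broadened.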
New sources of genetic diversity must be incorporated into plant breeding programs if they are to continue increasing grain yield and quality, and tolerance to abiotic and biotic stresses. Germplasm collections provide a source of genetic and phenotypic diversity, but characterization of these resources is required to increase their utility for breeding programs. We used a barley SNP iSelect platform with 7,842 SNPs to genotype 2,417 barley accessions sampled from the USDA National Small Grains Collection of 33,176 accessions. Most of the accessions in this core collection are categorized as landraces or cultivars/breeding lines and were obtained from more than 100 countries. Both STRUCTURE and principal component analysis identified five major subpopulations within the core collection, mainly differentiated by geographical origin and spike row number (an inflorescence architecture trait). Different patterns of linkage disequilibrium (LD) were found across the barley genome and many regions of high LD contained traits involved in domestication and breeding selection. The genotype data were used to define 'mini-core' sets of accessions capturing the majority of the allelic diversity present in the core collection. These 'mini-core' sets can be used for evaluating traits that are difficult or expensive to score. Genome-wide association studies (GWAS) of 'hull cover', 'spike row number', and 'heading date' demonstrate the utility of the core collection for locating genetic factors determining important phenotypes. The GWAS results were referenced to a new barley consensus map containing 5,665 SNPs. Our results demonstrate that GWAS and high-density SNP genotyping are effective tools for plant breeders interested in accessing genetic diversity in large germplasm collections.
The additive relationship matrix plays an important role in mixed model prediction of breeding values. For genotype matrix X (loci in columns), the product XX' is widely used as a realized relationship matrix, but the scaling of this matrix is ambiguous. Our first objective was to derive a proper scaling such that the mean diagonal element equals 1+f, where f is the inbreeding coefficient of the current population. The result is a formula involving the covariance matrix for sampling genomic loci, which must be estimated with markers. Our second objective was to investigate whether shrinkage estimation of this covariance matrix can improve the accuracy of breeding value (GEBV) predictions with low-density markers. Using an analytical formula for shrinkage intensity that is optimal with respect to mean-squared error, simulations revealed that shrinkage can significantly increase GEBV accuracy in unstructured populations, but only for phenotyped lines; there was no benefit for unphenotyped lines. The accuracy gain from shrinkage increased with heritability, but at high heritability (> 0.6) this benefit was irrelevant because phenotypic accuracy was comparable. These trends were confirmed in a commercial pig population with progeny-test-estimated breeding values. For an anonymous trait where phenotypic accuracy was 0.58, shrinkage increased the average GEBV accuracy from 0.56 to 0.62 (SE < 0.00) when using random sets of 384 markers from a 60K array. We conclude that when moderate-accuracy phenotypes and low-density markers are available for the candidates of genomic selection, shrinkage estimation of the relationship matrix can improve genetic gain.
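The scaling property described above can be illustrated with a widely used form of the realized relationship matrix. The abstract does not state its formula, so the one below is an assumption: markers are coded -1, 0, 1, centered by sample allele frequencies, and the cross-product is divided by twice the summed p(1-p). The shrinkage estimator is not shown, and the genotypes are invented.

```python
def additive_relationship(X):
    """Realized additive relationship matrix from an n-lines x m-markers matrix.

    X is coded -1, 0, 1 (alternate-allele dosage minus one, diploid).
    Assumed scaling: A = W W' / (2 * sum_k p_k (1 - p_k)), where W is X
    centered by the sample allele frequencies p_k.
    """
    n, m = len(X), len(X[0])
    p = [sum(row[k] + 1 for row in X) / (2 * n) for k in range(m)]
    W = [[row[k] - (2 * p[k] - 1) for k in range(m)] for row in X]
    scale = 2 * sum(pk * (1 - pk) for pk in p)
    return [[sum(wi * wj for wi, wj in zip(W[i], W[j])) / scale
             for j in range(n)]
            for i in range(n)]

# Four fully inbred (homozygous) lines, so f = 1 and the mean diagonal
# of A should equal 1 + f = 2.
X = [[1, -1, 1],
     [-1, -1, 1],
     [1, 1, -1],
     [-1, 1, 1]]
A = additive_relationship(X)
mean_diagonal = sum(A[i][i] for i in range(len(A))) / len(A)
```

With completely homozygous lines the mean diagonal equals 2 under this scaling, which matches the 1+f target described in the abstract.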
Potato virus Y is the most important potato virus worldwide, affecting tuber yield and quality. The resistance gene Rychc, derived from the potato wild relative Solanum chacoense, provides broad-spectrum and durable resistance to the virus and has been used to develop resistant cultivars. Several DNA markers have been developed and have contributed to the efficient selection of resistant individuals. In this study, we developed Kompetitive Allele Specific PCR markers for Rychc using whole-genome resequencing data for a diverse set of 25 PVY susceptible cultivars and a Rychc-positive clone. Marker Ry_4099 targets two variants in the 3ʹ-UTR and was able to discriminate all five allele dosages in a tetraploid test population. Marker Ry_3331 targets two variants in Exon 4 and, although it only provides presence/absence information, it discriminates between the two known resistant alleles of Rychc. These markers will greatly contribute to efficient development of resistant cultivars.
Key message
This is the first report of the production and use of a diploid inbred line-based F2 population for genetic mapping in potato.
Potato (Solanum tuberosum L.) is an important global food crop, for which tetrasomic inheritance and self-incompatibility have limited both genetic discovery and breeding gains. We report here on the creation of the first diploid inbred line-derived F2 population in potato, and demonstrate its utility for genetic mapping. To create the population, the doubled monoploid potato DM1-3 was crossed as a female to M6, an S7 inbred line derived from the wild relative S. chacoense, and a single F1 plant was then self-pollinated. A genetic linkage map with 2264 single nucleotide polymorphisms was constructed and used to improve the physical anchoring of superscaffolds in the potato reference genome, which is based on DM1-3. Segregation was observed for skin and flesh color, skin and flesh pigment intensity, tuber shape, anther development, jelly end, and the presence of eye tubers instead of normal sprouts. Using the R/qtl software, we detected 10 genes, 7 of which have been previously mapped and 3 for which this is the first publication. The latter category includes tightly linked genes for the jelly end and eye tuber traits on chromosome 5. The development of recombinant inbred lines from this F2 population by single-seed descent is underway and should facilitate even better resolution of these and other loci.
In diploid species, many multiparental populations have been developed to increase genetic diversity and quantitative trait loci (QTL) mapping resolution. In these populations, haplotype reconstruction has been used as a standard practice to increase the power of QTL detection in comparison with the marker-based association analysis. However, such software tools for polyploid species are few and limited to a single biparental F1 population. In this study, a statistical framework for haplotype reconstruction has been developed and implemented in the software PolyOrigin for connected tetraploid F1 populations with shared parents, regardless of the number of parents or mating design. Given a genetic or physical map of markers, PolyOrigin first phases parental genotypes, then refines the input marker map, and finally reconstructs offspring haplotypes. PolyOrigin can utilize single nucleotide polymorphism (SNP) data coming from arrays or from sequence-based genotyping; in the latter case, bi-allelic read counts can be used (and are preferred) as input data to minimize the influence of genotype calling errors at low depth. With extensive simulation we show that PolyOrigin is robust to the errors in the input genotypic data and marker map. It works well for various population designs with ≥30 offspring per parent and for sequences with read depth as low as 10x. PolyOrigin was further evaluated using an autotetraploid potato dataset with a 3 × 3 half-diallel mating design. In conclusion, PolyOrigin opens up exciting new possibilities for haplotype analysis in tetraploid breeding populations.
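Why read counts can be preferable to hard genotype calls at low depth can be seen from a simple binomial model of tetraploid dosage. The sketch below is an assumption made for illustration and is not PolyOrigin's likelihood: each read shows the alternate allele with a probability set by the dosage and a symmetric sequencing error rate, and the likelihoods are normalized under a uniform prior over dosages.

```python
from math import comb

def dosage_probabilities(ref_reads, alt_reads, ploidy=4, error=0.01):
    """Normalized binomial likelihoods of each alternate-allele dosage.

    Assumed model: a read carries the alternate allele with probability
    d/ploidy, flipped with probability `error`; the dosage prior is uniform.
    """
    n = ref_reads + alt_reads
    liks = []
    for d in range(ploidy + 1):
        frac = d / ploidy
        p_alt = frac * (1 - error) + (1 - frac) * error
        liks.append(comb(n, alt_reads) * p_alt ** alt_reads
                    * (1 - p_alt) ** ref_reads)
    total = sum(liks)
    return [lik / total for lik in liks]

shallow = dosage_probabilities(ref_reads=1, alt_reads=1)   # depth 2
deep = dosage_probabilities(ref_reads=10, alt_reads=10)    # depth 20
```

Both examples have half of the reads carrying the alternate allele, yet at depth 2 the probability is spread over dosages 1 to 3, while at depth 20 it concentrates on dosage 2. A hard call at low depth would discard that uncertainty.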
At present, the potato (Solanum tuberosum L.) of international commerce is autotetraploid, and the complexity of this genetic system creates limitations for breeding. Diploid potato breeding has long been used for population improvement, and because of an improved understanding of the genetics of gametophytic self‐incompatibility, there is now sustained interest in the development of uniform F1 hybrid varieties based on inbred parents. We report here on the use of haplotype and quantitative trait locus (QTL) analysis in a modified backcrossing (BC) scheme, using primary dihaploids of S. tuberosum as the recurrent parental background. In Cycle 1, we selected XD3‐36, a self‐fertile F2 individual homozygous for the self‐compatibility gene Sli (S‐locus inhibitor). Signatures of gametic and zygotic selection were observed at multiple loci in the F2 generation, including Sli. In the BC1 cycle, an F1 population derived from XD3‐36 showed a bimodal response for vine maturity, which led to the identification of late versus early alleles in XD3‐36 for the gene CDF1 (Cycling DOF Factor 1). Greenhouse phenotypes and haplotype analysis were used to select a vigorous and self‐fertile F2 individual with 43% homozygosity, including for Sli and the early‐maturing allele CDF1.3. Partially inbred lines from the BC1 and BC2 cycles have been used to initiate new cycles of selection, with the goal of reaching higher homozygosity while maintaining plant vigor, fertility, and yield.
Core Ideas
Partially inbred, diploid potato lines were developed for transitioning to an inbred‐hybrid breeding system.
Multi‐generational linkage analysis was used to track and fix favorable alleles without haplotype‐specific markers.
Signatures of gametic and zygotic selection were detected by maximum likelihood.
Genome‐wide association studies (GWAS) are widely used in diploid species to study complex traits in diversity and breeding populations, but GWAS software tailored to autopolyploids is lacking. The objectives of this research were to (i) develop an R package for autopolyploids based on the Q + K mixed model, (ii) validate the software with simulated data, and (iii) analyze a diversity panel of tetraploid potatoes. A unique feature of the R package, called GWASpoly, is its ability to model different types of polyploid gene action, including additive, simplex dominant, and duplex dominant. Using a simulated tetraploid population, we confirmed our hypothesis that statistical power is higher when the assumed gene action in the GWAS model matches the gene action at unobserved quantitative trait loci (QTL). Thirteen traits were analyzed in the Solanaceae Coordinated Agricultural Project (SolCAP) potato diversity panel and, consistent with previous studies, significant QTL for tuber shape and eye depth co‐localized on chromosome 10. For the other traits, only marginally significant QTL were detected, most likely due to insufficient statistical power: for simulated traits with a heritability (h²) of 0.3, the median genome‐wide power was only 0.01. Our results indicate that both marker density and population size were limiting factors for GWAS with the SolCAP panel.
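The gene action models named above differ in how a tetraploid marker dosage is turned into a regression covariate. The sketch below shows one plausible coding and is not taken from the GWASpoly source: the model labels are hypothetical, and only the variant in which the alternate allele is dominant is shown (the mirrored variant would count reference alleles instead).

```python
def gene_action_covariate(dosage, model):
    """Map alternate-allele dosage (0 to 4 in a tetraploid) to a covariate.

    additive:          effect proportional to dosage
    simplex-dominant:  one or more copies give the full effect
    duplex-dominant:   two or more copies give the full effect
    """
    if model == "additive":
        return float(dosage)
    if model == "simplex-dominant":
        return 1.0 if dosage >= 1 else 0.0
    if model == "duplex-dominant":
        return 1.0 if dosage >= 2 else 0.0
    raise ValueError(f"unknown gene action model: {model}")

dosages = range(5)
additive = [gene_action_covariate(d, "additive") for d in dosages]
simplex = [gene_action_covariate(d, "simplex-dominant") for d in dosages]
duplex = [gene_action_covariate(d, "duplex-dominant") for d in dosages]
```

If the true QTL acts as a duplex dominant but the model is additive, the covariate is misspecified for three of the five genotype classes. That is consistent with the reported drop in power when the assumed and true gene action differ.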