The United States Department of Agriculture (USDA) Soybean Germplasm Collection includes 18,480 domesticated soybean and 1168 wild soybean accessions introduced from 84 countries or developed in the United States. This collection was genotyped with the SoySNP50K BeadChip, which contains more than 50K single-nucleotide polymorphisms. Redundant accessions were identified in the collection, and distinct genetic backgrounds were observed in soybeans from different geographic origins; these could serve as a unique resource for soybean genetic improvement. We detected a dramatic reduction in genetic diversity through linkage disequilibrium and haplotype structure analyses of the wild, landrace, and North American cultivar populations, and we identified candidate regions associated with domestication and with selection imposed by North American breeding. We constructed the first soybean haplotype block maps for the wild, landrace, and North American cultivar populations and observed that most recombination events occurred in the regions between haplotype blocks. These haplotype maps are crucial for association mapping aimed at identifying genes that control traits of economic importance. A case-control association test delimited potential genomic regions on seven chromosomes that most likely contain genes controlling seed weight in domesticated soybean. The resulting dataset will facilitate germplasm utilization and the identification of genes controlling important traits, and will accelerate the development of soybean varieties with improved seed yield and quality.
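Linkage disequilibrium between two SNPs, which underlies LD and haplotype block analyses like those described above, is commonly summarized as r². The sketch below computes this textbook statistic from phased haplotypes, assuming biallelic SNPs with alleles coded 0/1. The function name `ld_r2` is hypothetical, and neither the study's software nor its haplotype-block algorithm is reproduced here.

```python
def ld_r2(haplotypes):
    """Squared correlation r^2 between two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) pairs, one pair per
    phased chromosome, with each allele coded 0 or 1 (an assumption).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at SNP 2
    # frequency of the haplotype carrying allele 1 at both SNPs
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom
```

High r² values across a run of adjacent SNPs are what block-definition methods look for; low values between runs are consistent with the observation that recombination is concentrated between blocks.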
Key message
Independent soybean breeding programs shape genetic diversity from unimproved germplasm to modern cultivars in similar ways, but distinct breeding populations retain unique genetic variation, preserving additional diversity.
From the domestication of wild soybean (Glycine soja Sieb. & Zucc.) over 3,000 years ago to the modern soybean (Glycine max (L.) Merr.) cultivars that provide much of the world’s oil and protein, soybean populations have undergone fundamental changes. We evaluated the molecular impact of breeding and selection using 391 soybean accessions, including US cultivars and their progenitors from the USDA Soybean Germplasm Collection (CGP), plus two new populations specifically developed for increased genetic diversity and high yield in two alternative gene pools: one derived from exotic G. max germplasm (AGP) and one derived from G. soja (SGP). Nucleotide diversity (π) declined with selection within gene pools, but artificial selection in the AGP maintained more diversity than in the CGP. The highest FST levels were observed between ancestral and elite lines in all gene pools, although specific nucleotide-level patterns varied between gene pools. Population structure analyses indicate that independent selection produced high-yielding elite lines with similar allelic compositions in the AGP and CGP. The SGP, however, produced elite progeny that were well differentiated from, but lower yielding than, CGP elites. Both the AGP and SGP retained a significant number of private alleles that are absent from the CGP. We conclude that the genomic diversity shaped by multiple selective breeding programs can result in gene pools of highly productive elite lines with similar genome-wide allelic compositions. Breeding programs with different ancestral lines, however, can retain private alleles representing unique genetic diversity.
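Nucleotide diversity (π) is the average number of pairwise differences per site, and Hudson's estimator expresses FST as 1 - H_within / H_between, where both terms are mean pairwise differences per site. The following is a minimal sketch of both statistics, assuming aligned, equal-length haploid sequences with no missing data. The function names are hypothetical, and the study may have used a different estimator (for example, Weir and Cockerham's FST).

```python
from itertools import combinations, product


def n_diff(s1, s2):
    """Number of sites at which two aligned sequences differ."""
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


def nucleotide_diversity(seqs):
    """pi: mean pairwise differences per site within one population."""
    pairs = list(combinations(seqs, 2))
    return sum(n_diff(a, b) for a, b in pairs) / (len(pairs) * len(seqs[0]))


def hudson_fst(pop1, pop2):
    """Hudson's FST = 1 - H_within / H_between (per-site pairwise differences)."""
    h_within = (nucleotide_diversity(pop1) + nucleotide_diversity(pop2)) / 2
    h_between = sum(n_diff(a, b) for a, b in product(pop1, pop2)) / (
        len(pop1) * len(pop2) * len(pop1[0])
    )
    return 1 - h_within / h_between
```

For example, `hudson_fst(["AAAA", "AAAT"], ["CCCC", "CCCA"])` compares a within-population diversity of 0.25 with a between-population value of 0.9375, giving a high FST of about 0.73.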
With advances in next-generation sequencing technologies, an unprecedented number of soybean accessions have been sequenced in individual studies and made available as raw sequencing reads for post-genomic research.
To build a consolidated, user-friendly genomic resource for post-genomic research, we combined the raw resequencing data of 1465 publicly available soybean genomes with 91 newly sequenced, highly diverse wild soybean genomes. Together these provided a collection of 1556 sequenced genomes of 1501 diverse accessions (1.5 K). The collection comprises wild soybean, landraces, and elite cultivars grown in East Asia or in major soybean-growing areas around the world. Our extensive sequence analysis discovered 32 million single nucleotide polymorphisms (32mSNPs) and revealed a density of 30 SNPs/kb and 12 non-synonymous SNPs/gene, reflecting high structural and functional genomic diversity in the new collection. Each SNP was annotated with 30 categories of structural and/or functional information. We further paired accessions in the 1.5 K with accessions in the US collection of 20,087 accessions (20 K), identifying genomically "equivalent" accessions that share the highest genomic identity, to minimize barriers to soybean germplasm exchange between countries. We also demonstrated the utility of the 32mSNPs for post-genomic research through in-silico genotyping, high-resolution GWAS, discovery and characterization of genes and alleles/mutations, and identification of germplasm carrying beneficial alleles that are potentially under artificial selection.
The comprehensive analysis of publicly available large-scale genome sequencing data from diverse cultivated accessions, together with the newly sequenced wild accessions, greatly increased the resolution of genome-wide variation in soybean. This could facilitate a variety of genetic and molecular analyses in soybean. The 32mSNPs and the 1.5 K accessions, with their comprehensive annotation, have been made available at SoyBase and the Ag Data Commons. The dataset could further serve as a versatile and expandable core resource for exploring the exponentially increasing genome sequencing data in a variety of post-genomic research.
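The pairing of "equivalent" accessions can be illustrated with a simple nearest-match search: for each accession in one collection, choose the accession in the other collection with the highest proportion of identical genotype calls. This is a hypothetical sketch. The identity measure (matching calls over sites genotyped in both accessions, with "N" marking missing data) and the function names are assumptions, not the published procedure.

```python
def identity(g1, g2, missing="N"):
    """Fraction of identical calls among sites genotyped in both accessions."""
    shared = [(a, b) for a, b in zip(g1, g2) if a != missing and b != missing]
    if not shared:
        return 0.0
    return sum(a == b for a, b in shared) / len(shared)


def best_equivalents(query, reference, missing="N"):
    """Map each query accession to (best reference accession, identity).

    query, reference: dicts of accession name -> genotype string, with all
    strings aligned to the same ordered set of SNP sites (an assumption).
    """
    result = {}
    for qname, qgeno in query.items():
        best = max(reference, key=lambda r: identity(qgeno, reference[r], missing))
        result[qname] = (best, identity(qgeno, reference[best], missing))
    return result
```

Scoring only sites called in both accessions keeps missing data from counting as mismatches; a real pipeline would also need a rule for heterozygous calls and for ties.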
Genomic selection has been used for genetic improvement in both plant and animal breeding and is a favorable technique for improving quantitative traits. In this study, genomic selection was evaluated within a breeding program using novel validation methods, together with plant materials and data from a commercial soybean (Glycine max) breeding program. A total of 1501 inbred lines were used to test multiple genomic selection models for multiple traits. Validation included cross‐validation, inter‐environment, and empirical validation. The extended genomic best linear unbiased prediction (EGBLUP) model was the most effective model tested for yield, protein, and oil in cross‐validation, with accuracies of 0.50, 0.68, and 0.64, respectively. Increasing the number of single nucleotide polymorphism markers from 1000 to 3000 to 6000 led to statistically significant increases in accuracy. Using the EGBLUP model, cross‐environment prediction accuracies were statistically lower than those from cross‐validation: 0.24, 0.54, and 0.42 for yield, protein, and oil, respectively. Empirical validation, predicting the yield of 510 soybean lines, had a prediction accuracy of 0.34, and including a maturity covariate notably increased accuracy. In inter‐environment predictions, genomic selection identified high‐performing lines: 34% of the lines in the upper quartile for yield, and 51% and 48% of the lines in the highest quartiles for protein and oil, respectively. Rankings from empirical validation were statistically similar to selections for advancement in yield trials. These results indicate that genomic selection is a useful tool for selection decisions.
Core Ideas
Genetic improvement of soybean yield is an ultimate goal of soybean breeding.
Cross and inter‐environment validation methods of genomic selection using elite breeding materials lead to statistically different predictive accuracies.
Soybean genomic selection can assist breeders in evaluating breeding materials for yield and seed composition.
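As an illustration of genomic prediction with cross-validation, the sketch below implements standard GBLUP with a VanRaden genomic relationship matrix and reports the mean Pearson correlation between predicted and observed values across k folds. It is not the extended GBLUP (EGBLUP) model named above, which additionally models epistatic relationships. The variance ratio `lam` is assumed known rather than estimated by REML, the trait mean is approximated by the training mean, and all function names are hypothetical.

```python
import numpy as np


def vanraden_g(m):
    """VanRaden genomic relationship matrix from markers coded 0/1/2."""
    p = m.mean(axis=0) / 2.0  # allele frequency of each marker
    z = m - 2.0 * p           # center each marker column
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_predict(g, y, train, test, lam):
    """Predict genomic values of test lines from training phenotypes.

    lam = sigma_e^2 / sigma_g^2, treated as known in this sketch.
    """
    g_tt = g[np.ix_(train, train)]
    mu = y[train].mean()
    alpha = np.linalg.solve(g_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + g[np.ix_(test, train)] @ alpha


def cv_accuracy(m, y, k=5, lam=1.0, seed=0):
    """k-fold cross-validation: mean Pearson r of predicted vs. observed."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    g = vanraden_g(m)
    rs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = gblup_predict(g, y, train, test, lam)
        rs.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(rs))
```

Inter-environment validation would follow the same pattern, except that the training phenotypes come from one set of environments and the correlation is computed against phenotypes from another.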
Simple sequence repeat (SSR) genetic markers, also referred to as microsatellites, are used in map-based cloning and in marker-assisted selection in plant breeding. The objectives of this study were to determine the abundance of SSRs in the soybean genome and to develop and test soybean SSR markers to create a database of locus-specific markers with a high likelihood of polymorphism. A total of 210,990 SSRs with di-, tri-, and tetranucleotide motifs repeated five or more times were identified in the soybean whole genome sequence (WGS), including 61,458 SSRs with at least 10 dinucleotide, 8 trinucleotide, or 7 tetranucleotide repeat units. Among these 61,458 SSRs, (AT)n, (ATT)n, and (AAAT)n were the most abundant di-, tri-, and tetranucleotide motifs, respectively. After screening for several factors, including locus specificity using e-PCR, a soybean SSR database (BARCSOYSSR_1.0) containing the genome positions and primer sequences of 33,065 SSRs was created. To examine the likelihood that primers in the database would amplify locus-specific polymorphic products, 1034 primer sets were evaluated by amplifying DNA from seven diverse Glycine max (L.) Merr. genotypes and one wild soybean (Glycine soja Siebold & Zucc.) genotype. A total of 978 (94.6%) of the primer sets amplified a single polymerase chain reaction (PCR) product, and 798 (77.2%) amplified polymorphic amplicons, as determined by 4.5% agarose gel electrophoresis. The BARCSOYSSR_1.0 SSR markers are available in SoyBase (http://soybase.org; verified 21 June 2010), the USDA-ARS Soybean Genome Database.
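Perfect SSRs that meet the repeat thresholds quoted above (at least 10 dinucleotide, 8 trinucleotide, or 7 tetranucleotide units) can be located with a regular-expression backreference. This minimal sketch scans one strand for perfect repeats only and skips motifs that are themselves repeats of a shorter unit (such as AA or ATAT). The tool actually used for the genome-wide search, and its handling of imperfect or compound repeats, are not described here, and `find_ssrs` is a hypothetical name.

```python
import re

# Minimum number of repeat units per motif length (thresholds quoted above)
MIN_UNITS = {2: 10, 3: 8, 4: 7}


def _is_compound(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. AA, ATAT)."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq, min_units=MIN_UNITS):
    """Return sorted (start, motif, n_units) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n_min in min_units.items():
        # a k-mer followed by at least n_min - 1 further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n_min - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if _is_compound(motif):
                continue  # counted under its shorter unit instead
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)
```

For instance, a 20-bp (AT)n run is reported once as ten AT units; its five-unit (ATAT)n reading falls below the tetranucleotide threshold and would be filtered as compound in any case.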
The nuclear fertility restorer gene Rf5 in HA-R9, originating from the wild sunflower species Helianthus annuus, is able to restore the widely used PET1 cytoplasmic male sterility in sunflowers. Previous mapping placed Rf5 at an interval of 5.8 cM on sunflower chromosome 13, distal to a rust resistance gene R11 at a 1.6 cM genetic distance in an SSR map. In the present study, publicly available SNP markers were further mapped around Rf5 and R11 using 192 F2 individuals, reducing the Rf5 interval from 5.8 to 0.8 cM. Additional SNP markers were developed in the target region of the two genes from the whole-genome resequencing of HA-R9, a donor line carrying Rf5 and R11. Fine mapping using 3517 F3 individuals placed Rf5 at a 0.00071 cM interval and the gene co-segregated with SNP marker S13_216392091. Similarly, fine mapping performed using 8795 F3 individuals mapped R11 at an interval of 0.00210 cM, co-segregating with two SNP markers, S13_225290789 and C13_181790141. Sequence analysis identified Rf5 as a pentatricopeptide repeat-encoding gene. The high-density map and diagnostic SNP markers developed in this study will accelerate the use of Rf5 and R11 in sunflower breeding.
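Map distances in cM, such as the intervals reported above, are derived from recombination fractions through a mapping function. The sketch below implements the standard Haldane and Kosambi functions. Which function and which software the study used to compute its intervals is not stated here, so this only illustrates the general conversion; the function names are hypothetical.

```python
import math


def _check(r):
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")


def haldane_cm(r):
    """Haldane map distance in cM: d = -50 * ln(1 - 2r) (no interference)."""
    _check(r)
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance in cM: d = 25 * ln((1 + 2r) / (1 - 2r))."""
    _check(r)
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))
```

At the very small recombination fractions typical of fine mapping, both functions reduce to roughly 100 * r cM, so the choice between them matters mainly for wider intervals.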
Genomic selection (GS) has become viable for selecting quantitative traits, for which marker-assisted selection has often proven less effective. The potential of GS for soybean was characterized using 483 elite breeding lines genotyped with BARCSoySNP6K iSelect BeadChips. Cross-validation was performed using RR-BLUP, and predictive abilities of 0.81, 0.71, and 0.26 for protein, oil, and yield, respectively, were achieved at the largest training set size tested. Minimal differences were observed among marker densities, and predictive ability appeared to be inflated by population structure. For comparison, two additional methods for predicting the breeding values of lines in four bi-parental populations within the GS dataset were tested. The first method predicted within each bi-parental population (WP method), using a training set of full-sibs of the validation set. The second method predicted across populations (AP method), using a training set of all remaining breeding lines except full-sibs of the validation set. The AP method is more practical, as the WP method would likely delay the breeding cycle and rely on smaller training sets. Averaged across populations, predictive abilities for protein and oil content with the AP method (0.55, 0.30) approached those of the WP method (0.60, 0.52). Though comparable, predictive abilities for yield were low for both the AP and WP methods (0.12, 0.13). Based on the increases in predictive ability as training sets grew, and the greater effectiveness of the WP method relative to the AP method, the AP method could potentially improve with larger training sets and increased relatedness between training and validation sets.
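RR-BLUP estimates all marker effects jointly by ridge regression, b = (X'X + lam*I)^-1 X'(y - mean(y)) with centered markers, and an AP-style split validates on one family while training on every other line. A minimal sketch under those assumptions follows. `lam` (the ratio of residual to marker-effect variance) is taken as given, family labels stand in for full-sib membership, and the function names are hypothetical.

```python
import numpy as np


def rrblup_fit(x, y, lam):
    """Ridge-regression BLUP of marker effects on centered markers."""
    mu = y.mean()
    x_mean = x.mean(axis=0)
    xc = x - x_mean
    b = np.linalg.solve(xc.T @ xc + lam * np.eye(x.shape[1]), xc.T @ (y - mu))
    return mu, x_mean, b


def rrblup_predict(model, x):
    """Genomic estimated breeding values for new lines."""
    mu, x_mean, b = model
    return mu + (x - x_mean) @ b


def across_population_split(families, target):
    """AP-style split: validate on one family, train on all other lines."""
    families = np.asarray(families)
    test = np.flatnonzero(families == target)
    train = np.flatnonzero(families != target)
    return train, test
```

A WP-style split would instead draw both training and validation lines from within the target family, which is why it depends on smaller training sets.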
Improving yield is a primary soybean breeding goal, as yield is the main determinant of soybean profitability. Within the breeding process, the selection of cross combinations is one of the most important elements. Cross prediction helps soybean breeders identify the best cross combinations among parental genotypes before crossing, increasing genetic gain and breeding efficiency. In this study, optimal cross selection methods were developed, applied in soybean, and validated using historical data from the University of Georgia soybean breeding program, under multiple training set compositions and marker densities and with multiple genomic selection models for estimating marker effects. Plant materials consisted of 702 advanced breeding lines evaluated in multiple environments and genotyped using SoySNP6k BeadChips. An additional marker set, the SoySNP3k marker set, was also tested. Optimal cross selection methods were used to predict the yield of 42 previously made crosses, and the predictions were compared with the performance of each cross's offspring in replicated field trials. The best prediction accuracy was obtained using Extended Genomic BLUP with the SoySNP6k marker set, consisting of 3,762 polymorphic markers: 0.56 with a training set maximally related to the predicted crosses and 0.4 with a training set minimally related to them. Prediction accuracy was most significantly affected by training set relatedness to the predicted crosses, marker density, and the genomic model used to predict marker effects. The choice of usefulness criterion affected prediction accuracy in training sets with low relatedness to the predicted crosses. Optimal cross prediction is a useful method for helping plant breeders select crosses in soybean breeding.
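One common form of the usefulness criterion, due to Schnell, ranks a cross by UC = mu + i * h * sigma_G, where mu is the predicted progeny mean, i the selection intensity, h the square root of heritability, and sigma_G the predicted genetic standard deviation among progeny. The sketch below predicts mu and sigma_G from parental marker genotypes and additive marker effects, assuming unlinked loci and fully inbred progeny. The usefulness criteria and variance predictions actually compared in the study may differ, and the function names are hypothetical.

```python
from statistics import NormalDist


def selection_intensity(fraction):
    """Standardized intensity i = phi(z) / p for truncation selection of the top p."""
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - fraction)
    return nd.pdf(z) / fraction


def usefulness(parent1, parent2, effects, fraction=0.1, h=1.0):
    """UC = mu + i * h * sigma_G for a cross between two inbred parents.

    parent1, parent2: homozygous genotypes coded -1/+1 at each marker.
    effects: additive marker effects, e.g. from a genomic prediction model.
    Assumes unlinked loci and fully inbred (RIL-like) progeny.
    """
    # Progeny mean: the parental midpoint at every locus (0 where parents differ)
    mu = sum(a * (g1 + g2) / 2 for g1, g2, a in zip(parent1, parent2, effects))
    # Each segregating locus contributes +a or -a with probability 1/2: variance a^2
    var_g = sum(a * a for g1, g2, a in zip(parent1, parent2, effects) if g1 != g2)
    return mu + selection_intensity(fraction) * h * var_g ** 0.5
```

Ignoring linkage can overstate or understate progeny variance depending on the linkage phase of the parents; simulation-based predictions avoid that assumption at a higher computational cost.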