The objective of this research was to identify single nucleotide polymorphisms (SNPs) and to develop an Illumina Infinium BeadChip that contained over 50,000 SNPs from soybean (Glycine max L. Merr.). A total of 498,921,777 reads 35-45 bp in length were obtained from DNA sequence analysis of reduced representation libraries from several soybean accessions which included six cultivated and two wild soybean (G. soja Sieb. et Zucc.) genotypes. These reads were mapped to the soybean whole genome sequence and 209,903 SNPs were identified. After applying several filters, a total of 146,161 of the 209,903 SNPs were determined to be ideal candidates for Illumina Infinium II BeadChip design. To equalize the distance between selected SNPs, increase assay success rate, and minimize the number of SNPs with low minor allele frequency, an iterative algorithm based on a selection index was developed and used to select 60,800 SNPs for Infinium BeadChip design. Of the 60,800 SNPs, 50,701 were targeted to euchromatic regions and 10,000 to heterochromatic regions of the 20 soybean chromosomes. In addition, 99 SNPs were targeted to unanchored sequence scaffolds. Of the 60,800 SNPs, a total of 52,041 passed Illumina's manufacturing phase to produce the SoySNP50K iSelect BeadChip. Validation of the SoySNP50K chip with 96 landrace genotypes, 96 elite cultivars and 96 wild soybean accessions showed that 47,337 SNPs were polymorphic and generated successful SNP allele calls. In addition, 40,841 of the 47,337 SNPs (86%) had minor allele frequencies ≥ 10% among the landraces, elite cultivars and the wild soybean accessions. A total of 620 and 42 candidate regions which may be associated with domestication and recent selection were identified, respectively. The SoySNP50K iSelect SNP BeadChip will be a powerful tool for characterizing soybean genetic diversity and linkage disequilibrium, and for constructing high resolution linkage maps to improve the soybean whole genome sequence assembly.
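The minor allele frequency criterion used in the validation above (MAF ≥ 10%) can be illustrated with a minimal sketch. The genotype encoding (two-letter strings such as "AG", with None for a missing call) and the function names are assumptions made for illustration; they are not the format or software used in the study.

```python
from collections import Counter


def minor_allele_frequency(calls):
    """Return the minor allele frequency of one biallelic SNP.

    `calls` is a list of two-letter genotype strings such as "AA" or "AG"
    (a hypothetical encoding); missing calls are given as None and skipped.
    """
    counts = Counter()
    for call in calls:
        if call is None:
            continue
        counts.update(call)  # count both alleles of the genotype
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no data, or only one allele observed (monomorphic)
    return min(counts.values()) / total


def passes_maf_filter(calls, threshold=0.10):
    """True when the SNP reaches the MAF threshold (10% by default)."""
    return minor_allele_frequency(calls) >= threshold


if __name__ == "__main__":
    example = ["AA", "AA", "AG", "GG", None]  # A = 5 alleles, G = 3 alleles
    print(minor_allele_frequency(example))     # 3 / 8 = 0.375
    print(round(100 * 40841 / 47337))          # share reported above: 86
```

The sketch assumes each SNP is biallelic; a real pipeline would also need to handle no-call rates and assay failures, which are not modeled here.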
Association analysis is an alternative to conventional family-based methods to detect the location of gene(s) or quantitative trait loci (QTL) and provides relatively high resolution in terms of defining the genome position of a gene or QTL. Seed protein and oil concentration are quantitative traits which are determined by the interaction among many genes with small to moderate genetic effects and their interaction with the environment. In this study, a genome-wide association study (GWAS) was performed to identify quantitative trait loci (QTL) controlling seed protein and oil concentration in 298 soybean germplasm accessions exhibiting a wide range of seed protein and oil content.
A total of 55,159 single nucleotide polymorphisms (SNPs) were genotyped using various methods including Illumina Infinium and GoldenGate assays and 31,954 markers with minor allele frequency >0.10 were used to estimate linkage disequilibrium (LD) in heterochromatic and euchromatic regions. In euchromatic regions, the mean LD (r²) rapidly declined to 0.2 within 360 Kbp, whereas the mean LD declined to 0.2 at 9,600 Kbp in heterochromatic regions. The GWAS results identified 40 SNPs in 17 different genomic regions significantly associated with seed protein. Of these, the five SNPs with the highest associations and seven adjacent SNPs were located in the 27.6-30.0 Mbp region of Gm20. A major seed protein QTL has been previously mapped to the same location and potential candidate genes have recently been identified in this region. The GWAS results also detected 25 SNPs in 13 different genomic regions associated with seed oil. Of these markers, seven SNPs had a significant association with both protein and oil.
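The LD statistic reported above, the squared correlation between two loci, can be sketched as follows. This is a minimal illustration that assumes fully inbred accessions, so each accession contributes one haplotype with alleles coded 0/1 and None for missing data; the function names and the binned input to `decay_distance` are hypothetical and do not reproduce the software used in the study.

```python
def ld_r2(snp_a, snp_b):
    """Squared allelic correlation between two biallelic loci.

    Alleles are coded 0/1 per accession; pairs with a missing call (None)
    are dropped. Returns None when either locus is monomorphic.
    """
    pairs = [(a, b) for a, b in zip(snp_a, snp_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return None
    p_a = sum(a for a, _ in pairs) / n            # frequency of allele 1 at locus A
    p_b = sum(b for _, b in pairs) / n            # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in pairs if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b                          # disequilibrium coefficient D
    return d * d / denom


def decay_distance(binned_means, threshold=0.2):
    """First distance at which the mean r2 is at or below the threshold.

    `binned_means` is a list of (distance_kbp, mean_r2) sorted by distance.
    """
    for distance, mean_r2 in binned_means:
        if mean_r2 <= threshold:
            return distance
    return None
```

Computing `ld_r2` for all marker pairs, averaging within distance bins, and passing the result to `decay_distance` separately for euchromatic and heterochromatic markers would give the kind of comparison summarized in the paragraph above.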
This research indicated that GWAS not only identified most of the previously reported QTL controlling seed protein and oil, but also resulted in narrower genomic regions than the regions reported as containing these QTL. The narrower GWAS-defined genome regions will allow more precise marker-assisted allele selection and will expedite positional cloning of the causal gene(s).
The Soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy, but its marker density was only sufficient to properly orient 66% of the sequence scaffolds. The discovery and genetic mapping of more single nucleotide polymorphism (SNP) markers were needed to anchor and orient the remaining genome sequence. To that end, next generation sequencing and high-throughput genotyping were combined to obtain a much higher resolution genetic map that could be used to anchor and orient most of the remaining sequence and to help validate the integrity of the existing scaffold builds.
A total of 7,108 to 25,047 predicted SNPs were discovered using a reduced representation library that was subsequently sequenced by the Illumina sequence-by-synthesis method on the clonal single molecule array platform. Using multiple SNP prediction methods, the validation rate of these SNPs ranged from 79% to 92.5%. A high resolution genetic map using 444 recombinant inbred lines was created with 1,790 SNP markers. Of the 1,790 mapped SNP markers, 1,240 markers had been selectively chosen to target existing unanchored or un-oriented sequence scaffolds, thereby increasing the amount of anchored sequence to 97%.
We have demonstrated how next generation sequencing was combined with high-throughput SNP detection assays to quickly discover large numbers of SNPs. Those SNPs were then used to create a high resolution genetic map that assisted in the assembly of scaffolds from the 8x whole genome shotgun sequences into pseudomolecules corresponding to chromosomes of the organism.
KEY MESSAGE: Twenty-two loci for soybean SW and candidate genes conditioning seed development were identified; and prediction accuracies of GS and MAS were estimated through cross-validation and validation with unrelated populations. Soybean (Glycine max) is a major crop for plant protein and oil production, and seed weight (SW) is important for yield and quality in food/vegetable uses of soybean. However, our knowledge of genes controlling SW remains limited. To better understand the molecular mechanism underlying the trait and explore marker-based breeding approaches, we conducted a genome-wide association study in a population of 309 soybean germplasm accessions using 31,045 single nucleotide polymorphisms (SNPs), and estimated the prediction accuracy of genomic selection (GS) and marker-assisted selection (MAS) for SW. Twenty-two loci of minor effect associated with SW were identified, including hotspots on Gm04 and Gm19. The mixed model containing these loci explained 83.4% of phenotypic variation. Candidate genes with Arabidopsis orthologs conditioning SW were also proposed. The prediction accuracies of GS and MAS by cross-validation were 0.75–0.87 and 0.62–0.75, respectively, depending on the number of SNPs used and the size of training population. GS also outperformed MAS when the validation was performed using unrelated panels across a wide range of maturities, with an average prediction accuracy of 0.74 versus 0.53. This study convincingly demonstrated that soybean SW is controlled by numerous minor-effect loci. It greatly enhances our understanding of the genetic basis of SW in soybean and facilitates the identification of genes controlling the trait. It also suggests that GS holds promise for accelerating soybean breeding progress. The results are helpful for genetic improvement and genomic prediction of yield in soybean.
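How a cross-validated prediction accuracy of the kind reported above might be computed is sketched below. Two assumptions are made loudly: accuracy is taken to be the Pearson correlation between predicted and observed phenotypes of held-out accessions, and the predictor `simple_regression` is only a stand-in single-covariate model, not the GS or MAS model used in the study. All names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def simple_regression(train_x, train_y, test_x):
    """Stand-in predictor (assumption): least squares on one covariate."""
    n = len(train_x)
    mx, my = sum(train_x) / n, sum(train_y) / n
    sxx = sum((a - mx) ** 2 for a in train_x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(train_x, train_y))
    slope = sxy / sxx if sxx else 0.0
    return [my + slope * (t - mx) for t in test_x]


def cross_validated_accuracy(x, y, fit_predict, k=5, seed=0):
    """Predict every accession while it is held out, then correlate."""
    predicted = [None] * len(y)
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([x[i] for i in train],
                            [y[i] for i in train],
                            [x[i] for i in fold])
        for i, p in zip(fold, preds):
            predicted[i] = p
    return pearson_r(predicted, y)
```

Any model with the same `fit_predict(train_x, train_y, test_x)` shape could be swapped in, so a marker-based GS model and a MAS model restricted to significant loci can be compared on identical folds.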
Soybean (Glycine max) is a photoperiod-sensitive and self-pollinated species. Days to flowering (DTF) and maturity (DTM), duration of flowering-to-maturity (DFTM) and plant height (PH) are crucial for soybean adaptability and yield. To dissect the genetic architecture of these agronomically important traits, a population consisting of 309 early maturity soybean germplasm accessions was genotyped with the Illumina Infinium SoySNP50K BeadChip and phenotyped in multiple environments. A genome-wide association study (GWAS) was conducted using a mixed linear model that involves both relative kinship and population structure.
The linkage disequilibrium (LD) decayed slowly in soybean, and a substantial difference in LD pattern was observed between euchromatic and heterochromatic regions. A total of 27, 6, 18 and 27 loci for DTF, DTM, DFTM and PH were detected via GWAS, respectively. The Dt1 gene was identified in the locus strongly associated with both DTM and PH. Ten candidate genes homologous to Arabidopsis flowering genes were identified near the peak single nucleotide polymorphisms (SNPs) associated with DTF. Four of them encode MADS-domain containing proteins. Additionally, a pectin lyase-like gene was also identified in a major-effect locus for PH where LD decayed rapidly.
This study identified multiple new loci and refined chromosomal regions of known loci associated with DTF, DTM, DFTM and/or PH in soybean. It demonstrates that GWAS is powerful in dissecting complex traits and identifying candidate genes although LD decayed slowly in soybean. The loci and trait-associated SNPs identified in this study can be used for soybean genetic improvement; in particular, the major-effect loci associated with PH could be used to improve soybean yield potential. The candidate genes may serve as promising targets for studies of molecular mechanisms underlying the related traits in soybean.
A landmark in soybean research, Glyma1.01, the first whole genome sequence of variety Williams 82 (Glycine max L. Merr.) was completed in 2010 and is widely used. However, because the assembly was primarily built based on the linkage maps constructed with a limited number of markers and recombinant inbred lines (RILs), the assembled sequence, especially in some genomic regions with sparse numbers of anchoring markers, needs to be improved. Molecular markers are widely used by researchers in the soybean community; however, with the updating of the Glyma1.01 build based on the high-resolution linkage maps resulting from this research, the genome positions of these markers need to be updated.
Two high density genetic linkage maps were constructed based on 21,478 single nucleotide polymorphism loci mapped in the Williams 82 x G. soja (Sieb. & Zucc.) PI479752 population with 1,083 RILs and 11,922 loci mapped in the Essex x Williams 82 population with 922 RILs. There were 37 regions or single markers where marker order in the two populations was in agreement but was not consistent with the physical position in the Glyma1.01 build. In addition, 28 previously unanchored scaffolds were positioned. Map data were used to identify false joins in the Glyma1.01 assembly and the corresponding scaffolds were broken and reassembled into the new assembly, Wm82.a2.v1. Based upon plots of genetic distance against physical distance for the loci, the euchromatic and heterochromatic regions along each chromosome in the new assembly were delimited. Genomic positions of the commonly used markers contained in the BARCSOYSSR_1.0 database and the SoySNP50K BeadChip were updated based upon the Wm82.a2.v1 assembly.
The information will facilitate the study of recombination hot spots in the soybean genome, identification of genes or quantitative trait loci controlling yield, seed quality and resistance to biotic or abiotic stresses as well as other genetic or genomic research.
Sudden death syndrome (SDS) is a serious threat to soybean production that can be managed with host plant resistance. To dissect the genetic architecture of quantitative resistance to the disease in soybean, two independent association panels of elite soybean cultivars, consisting of 392 and 300 unique accessions, respectively, were evaluated for SDS resistance in multiple environments and years. The two association panels were genotyped with 52,041 and 5,361 single nucleotide polymorphisms (SNPs), respectively. Genome-wide association mapping was carried out using a mixed linear model that accounted for population structure and cryptic relatedness.
A total of 20 loci underlying SDS resistance were identified in the two independent studies, including 7 loci localized in previously mapped QTL intervals and 13 novel loci. One strong peak of association on chromosome 18, associated with all disease assessment criteria across the two panels, spanned a physical region of 1.2 Mb around a previously cloned SDS resistance gene (GmRLK18-1) in locus Rfs2. An additional variant independently associated with SDS resistance was also found in this genomic region. Other peaks were within, or close to, sequences annotated as homologous to genes previously shown to be involved in plant disease resistance. The identified loci explained an average of 54.5% of the phenotypic variance measured by different disease assessment criteria.
This study identified multiple novel loci and refined the map locations of known loci related to SDS resistance. These insights into the genetic basis of SDS resistance can now be used to further enhance durable resistance to SDS in soybean. Additionally, the associations identified here provide a basis for further efforts to pinpoint causal variants and to clarify how the implicated genes affect SDS resistance in soybean.
The United States Department of Agriculture Soybean Germplasm Collection includes 18,480 domesticated soybean and 1,168 wild soybean accessions introduced from 84 countries or developed in the United States. This collection was genotyped with the SoySNP50K BeadChip containing greater than 50K single-nucleotide polymorphisms. Redundant accessions were identified in the collection, and distinct genetic backgrounds of soybean from different geographic origins were observed that could be a unique resource for soybean genetic improvement. We detected a dramatic reduction of genetic diversity based on linkage disequilibrium and haplotype structure analyses of the wild, landrace, and North American cultivar populations and identified candidate regions associated with domestication and selection imposed by North American breeding. We constructed the first soybean haplotype block maps in the wild, landrace, and North American cultivar populations and observed that most recombination events occurred in the regions between haplotype blocks. These haplotype maps are crucial for association mapping aimed at the identification of genes controlling traits of economic importance. A case-control association test delimited potential genomic regions along seven chromosomes that most likely contain genes controlling seed weight in domesticated soybean. The resulting dataset will facilitate germplasm utilization, identification of genes controlling important traits, and will accelerate the creation of soybean varieties with improved seed yield and quality.
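The abstract above does not say which case-control statistic was used. As one common possibility, a minimal sketch of an allelic 2x2 chi-square test for a single SNP is given here; treating large-seeded accessions as "cases" and small-seeded accessions as "controls" is an assumption made for illustration, as are the function name and the input layout.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Chi-square test (1 df) on a 2x2 table of allele counts.

    Each argument is a pair (count of allele 1, count of allele 2).
    Returns (chi2, p_value); a degenerate table yields (0.0, 1.0).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row1, row2 = a + b, c + d      # alleles observed in cases / controls
    col1, col2 = a + c, b + d      # totals for allele 1 / allele 2
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: allele 1 is common in cases and rare in controls.
    chi2, p = allelic_chi_square((30, 10), (10, 30))
    print(chi2, p)  # chi2 is 20.0 for these counts
```

Running such a test at every SNP and correcting for multiple comparisons would yield a genome-wide scan; the sketch ignores population structure, which a real analysis of a germplasm collection would need to address.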
Next generation sequencing has significantly increased the speed at which single nucleotide polymorphisms (SNPs) can be discovered and subsequently used as molecular markers for research. Unfortunately, for species such as common bean (Phaseolus vulgaris L.) which do not have a whole genome sequence available, the use of next generation sequencing for SNP discovery is much more difficult and costly. To this end we developed a method which couples sequences obtained from the Roche 454-FLX system (454) with the Illumina Genome Analyzer (GA) for high-throughput SNP discovery.
Using a multi-tier reduced representation library we discovered a total of 3,487 SNPs of which 2,795 contained sufficient flanking genomic sequence for SNP assay development. Using Sanger sequencing to determine the validation rate of these SNPs, we found that 86% are likely to be true SNPs. Furthermore, we designed a GoldenGate assay which contained 1,050 of the 3,487 predicted SNPs. A total of 827 of the 1,050 SNPs produced a working GoldenGate assay (79%).
Through combining two next generation sequencing techniques we have developed a method that allows high-throughput SNP discovery in any diploid organism without the need of a whole genome sequence or the creation of normalized cDNA libraries. The need to only perform one 454 run and one GA sequencer run allows high-throughput SNP discovery with sufficient sequence for assay development to be performed in organisms, such as common bean, which have limited genomic resources.
Simple sequence repeat (SSR) genetic markers, also referred to as microsatellites, function in map-based cloning and for marker-assisted selection in plant breeding. The objectives of this study were to determine the abundance of SSRs in the soybean genome and to develop and test soybean SSR markers to create a database of locus-specific markers with a high likelihood of polymorphism. A total of 210,990 SSRs with di-, tri-, and tetranucleotide repeats of five or more were identified in the soybean whole genome sequence (WGS) which included 61,458 SSRs consisting of repeat units of di- (≥10), tri- (≥8), and tetranucleotide (≥7). Among the 61,458 SSRs, (AT)n, (ATT)n and (AAAT)n were the most abundant motifs among di-, tri-, and tetranucleotide SSRs, respectively. After screening for a number of factors including locus-specificity using e-PCR, a soybean SSR database (BARCSOYSSR_1.0) with the genome position and primer sequences for 33,065 SSRs was created. To examine the likelihood that primers in the database would function to amplify locus-specific polymorphic products, 1,034 primer sets were evaluated by amplifying DNAs of seven diverse Glycine max (L.) Merr. genotypes and one wild soybean (Glycine soja Siebold & Zucc.) genotype. A total of 978 (94.6%) of the primer sets amplified a single polymerase chain reaction (PCR) product and 798 (77.2%) amplified polymorphic amplicons as determined by 4.5% agarose gel electrophoresis. The BARCSOYSSR_1.0 SSR markers can be found in SoyBase (http://soybase.org; verified 21 June 2010), the USDA-ARS Soybean Genome Database.
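The repeat thresholds quoted above (di- ≥10, tri- ≥8, tetranucleotide ≥7 units) can be illustrated with a small regular-expression scan. This is a minimal sketch, not the pipeline used to build BARCSOYSSR_1.0: the function names are hypothetical, only perfect repeats are detected, and no locus-specificity screening such as e-PCR is attempted.

```python
import re

# Minimum number of repeat units per motif length, as quoted in the text.
MIN_REPEATS = {2: 10, 3: 8, 4: 7}


def is_primitive(unit):
    """False when the unit is itself a repeat of a shorter unit (AA, ATAT)."""
    for k in range(1, len(unit)):
        if len(unit) % k == 0 and unit[:k] * (len(unit) // k) == unit:
            return False
    return True


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, unit, repeat_count) tuples for perfect SSRs.

    `start` is the 0-based offset of the repeat tract within `seq`.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # One captured unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if is_primitive(unit):
                hits.append((match.start(), unit,
                             len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GGC" + "AT" * 12 + "CCG" + "ATT" * 8 + "G" + "A" * 30
    print(find_ssrs(demo))  # [(3, 'AT', 12), (30, 'ATT', 8)]
```

The `is_primitive` check keeps homopolymer runs and (AT)n tracts from being reported as longer motifs; a tract that directly abuts such a run may lose a base at the boundary, which is an accepted simplification here.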