Summary
In the last decade, the revolution in sequencing technologies has deeply impacted crop genotyping practice. New methods allowing rapid, high‐throughput genotyping of entire crop populations have proliferated and opened the door to wider use of molecular tools in plant breeding. These new genotyping‐by‐sequencing (GBS) methods include over a dozen reduced‐representation sequencing (RRS) approaches and at least four whole‐genome resequencing (WGR) approaches. The diversity of available methods, each often producing different types of data at different cost, can make selecting the best‐suited method seem a daunting task. We review the most common genotyping methods used today and compare their suitability for linkage mapping, genome‐wide association studies (GWAS), marker‐assisted and genomic selection, and genome assembly and improvement in crops with various genome sizes and complexity. Furthermore, we outline bioinformatics tools for the analysis of genotyping data. WGR is well suited to genotyping biparental cross populations with complex, small‐ to moderate‐sized genomes and provides the lowest cost per marker data point. RRS approaches differ in their suitability for various tasks but demonstrate similar costs per marker data point. These approaches are generally better suited for de novo applications and more cost‐effective when genotyping populations with large genomes or high heterozygosity. We expect that although RRS approaches will remain the most cost‐effective for some time, WGR will become more widespread for crop genotyping as sequencing costs continue to decrease.
Figure 1. (a) Predicted distribution of fragments (100–800 bp; 100 bp bin size) derived from in silico digestion with seven different restriction enzymes or combinations thereof. (b) Quality assessment of a 96‐plex HD‐GBS library using a Bioanalyzer. (c) Distribution of paired‐end (PE) reads per sample after demultiplexing. (d) Distribution of variants as a function of their depth of coverage following HD‐GBS. (e) Number of variants, (f) proportion of missing data and (g) cost per sample for six different genotyping platforms in soybean.
After size selection and PCR amplification, the quality of the GBS library was assessed, and the resulting profile (Figure 1b) indicated that the vast majority of size‐selected fragments (including sequencing adapters) ranged between 200 and 600 bp. ...the other Skim‐Seq datasets (at 0.5x and 0.2x), as expected, yielded accuracies falling between the two former categories. ...our results demonstrate that HD‐GBS provides an extremely low‐cost method for obtaining an ultra‐dense panel of markers enabling high‐quality imputation of untyped variants from a reference panel.
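The in silico digestion summarized in panel (a) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `digest` and `bin_fragments` are hypothetical, recognition sites are matched as exact strings on one strand (adequate only for palindromic sites without degenerate bases), and the PstI‐like site `CTGCAG` with a cut offset of 5 is used purely as an example. The bins are assumed to be half‐open, so the 100–800 bp range yields seven bins starting at 100, 200, ..., 700.

```python
def digest(seq, enzymes):
    """Return fragment lengths after cutting seq with every enzyme.

    enzymes is a list of (recognition_site, cut_offset) pairs; passing
    more than one pair models a combination of enzymes (an assumption
    about how combinations are handled).
    """
    seq = seq.upper()
    cuts = set()
    for site, offset in enzymes:
        pos = seq.find(site)
        while pos != -1:
            cuts.add(pos + offset)          # cut position within seq
            pos = seq.find(site, pos + 1)   # allow overlapping sites
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def bin_fragments(lengths, lo=100, hi=800, width=100):
    """Count fragments per bin; bins are [start, start + width)."""
    bins = {start: 0 for start in range(lo, hi, width)}
    for n in lengths:
        if lo <= n < hi:
            bins[lo + (n - lo) // width * width] += 1
    return bins


# Toy sequence with two example sites (not real genomic data).
toy = "A" * 150 + "CTGCAG" + "T" * 250 + "CTGCAG" + "C" * 50
fragments = digest(toy, [("CTGCAG", 5)])
histogram = bin_fragments(fragments)
print(fragments)
print(histogram)
```

Running the same two functions over a reference genome for each candidate enzyme or enzyme pair would give one histogram per digestion, which is the kind of comparison panel (a) reports.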
Whole‐genome duplications have occurred in the recent ancestors of many plants, fish, and amphibians, resulting in a pervasiveness of paralogous loci and the potential for both disomic and tetrasomic inheritance in the same genome. Paralogs can be difficult to genotype reliably and are often excluded from genotyping‐by‐sequencing (GBS) analyses; however, removal requires paralogs to be identified, which is difficult without a reference genome. We present a method for identifying paralogs in natural populations by combining two properties of duplicated loci: (i) the expected frequency of heterozygotes exceeds that for singleton loci, and (ii) within heterozygotes, observed read ratios for each allele in GBS data will deviate from the 1:1 expected for singleton (diploid) loci. These deviations are often not apparent within individuals, particularly when sequence coverage is low; however, we postulated that summing allele reads for each locus over all heterozygous individuals in a population would provide sufficient power to detect deviations at those loci. We identified paralogous loci in three species: Chinook salmon (Oncorhynchus tshawytscha), which retains regions with ongoing residual tetrasomy on eight chromosome arms following a recent whole‐genome duplication; mountain barberry (Berberis alpina), which has a large proportion of paralogs that arose through an unknown mechanism; and dusky parrotfish (Scarus niger), which has largely rediploidized following an ancient whole‐genome duplication. Importantly, this approach requires only the genotype and allele‐specific read counts for each individual, information that is readily obtained from most GBS analysis pipelines.
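The two properties described above can be combined in a short Python sketch. This is an illustration under stated assumptions, not the published implementation: genotypes are coded as alternate‐allele counts (0, 1, 2, or `None` for missing), the deviation from a 1:1 read ratio is expressed as a z‐score under a binomial model with p = 0.5, and the function names and the thresholds `max_h` and `max_abs_z` are hypothetical placeholders that would need tuning for each dataset.

```python
from math import sqrt


def locus_stats(genotypes, ref_reads, alt_reads):
    """Return (het proportion, pooled ref-read ratio, z-score) for one locus.

    Reads are summed over heterozygous individuals only, so that the
    pooled counts have more power than any single low-coverage sample.
    """
    called = [g for g in genotypes if g is not None]
    het_idx = [i for i, g in enumerate(genotypes) if g == 1]
    h = len(het_idx) / len(called) if called else 0.0
    ref_sum = sum(ref_reads[i] for i in het_idx)
    alt_sum = sum(alt_reads[i] for i in het_idx)
    n = ref_sum + alt_sum
    if n == 0:
        return h, None, None
    ratio = ref_sum / n
    # Binomial z-score: expected mean 0.5 * n, variance 0.25 * n.
    z = (ref_sum - 0.5 * n) / sqrt(0.25 * n)
    return h, ratio, z


def flag_paralog(h, z, max_h=0.6, max_abs_z=3.0):
    """Flag a locus if either property is extreme (thresholds are assumed)."""
    return h > max_h or (z is not None and abs(z) > max_abs_z)


# Toy data: a balanced locus and one with skewed reads in heterozygotes.
singleton = locus_stats([0, 1, 1, 2, 1, 0],
                        [20, 10, 11, 0, 9, 18],
                        [0, 10, 9, 20, 11, 0])
duplicated = locus_stats([1, 1, 1, 1, 1, 0],
                         [30, 28, 31, 29, 32, 20],
                         [10, 12, 9, 11, 8, 0])
print(singleton, flag_paralog(singleton[0], singleton[2]))
print(duplicated, flag_paralog(duplicated[0], duplicated[2]))
```

In the toy data, no single heterozygote at the second locus is decisive on its own, but the pooled counts (150 versus 50 reads) give a large z‐score, which is the reason for summing over individuals.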
Accurate and efficient microsatellite loci genotyping is an essential process in population genetics that is also used in various demographic analyses. Protocols for next‐generation sequencing of microsatellite loci enable high‐throughput and cross‐compatible allele scoring, two needs that conventional capillary‐based approaches do not address. To improve this process, we have developed an all‐in‐one software tool in C++, called Seq2Sat (sequence to microsatellite), to support automated microsatellite genotyping. It takes raw reads of microsatellite amplicons directly and conducts read quality control before inferring genotypes based on depth‐of‐read, read ratio, sequence composition and length. We have also developed a module for sex identification based on sex chromosome–specific locus amplicons. To allow for greater user access and to complement autoscoring, we developed SatAnalyzer (microsatellite analyzer), a user‐friendly web‐based platform that conducts reads‐to‐report analyses by calling Seq2Sat for genotype autoscoring and produces interactive genotype graphs for manual editing. SatAnalyzer also allows users to troubleshoot multiplex optimization by analysing read quality and distribution across loci and samples in support of high‐quality library preparation. To evaluate its performance, we benchmarked our toolkit Seq2Sat/SatAnalyzer against a conventional capillary gel method and existing microsatellite genotyping software, MEGASAT, using two datasets. Results showed that SatAnalyzer can achieve >99.70% genotyping accuracy and that Seq2Sat is ~5 times faster than MEGASAT even though it generates many more informative tables and figures. Seq2Sat and SatAnalyzer are freely available on GitHub (https://github.com/ecogenomicscanada/Seq2Sat) and Docker Hub (https://hub.docker.com/r/rocpengliu/satanalyzer).