Summary
Until recently, achieving a reference‐quality genome sequence for bread wheat was thought to be beyond the limits of genome sequencing and assembly technology, primarily due to the large genome size and > 80% repetitive sequence content. The release of the chromosome‐scale 14.5‐Gb IWGSC RefSeq v1.0 genome sequence of bread wheat cv. Chinese Spring (CS) was, therefore, a milestone. Here, we used a direct label and stain (DLS) optical map of the CS genome together with a prior nick, label, repair and stain (NLRS) optical map, and sequence contigs assembled with Pacific Biosciences long reads, to refine the v1.0 assembly. Inconsistencies between the sequence and maps were reconciled and gaps were closed. Gap filling and anchoring of 279 unplaced scaffolds increased the total length of pseudomolecules by 168 Mb (excluding Ns). Positions and orientations were corrected for 233 and 354 scaffolds, respectively, representing 10% of the genome sequence. The accuracy of the remaining 90% of the assembly was validated. As a result of the increased contiguity, the numbers of transposable elements (TEs) and intact TEs have increased in IWGSC RefSeq v2.1 compared with v1.0. In total, 98% of the gene models identified in v1.0 were mapped onto this new assembly through development of a dedicated approach implemented in the MAGAAT pipeline. The number of high‐confidence genes on pseudomolecules has increased from 105 319 to 105 534. The reconciled assembly enhances the utility of the sequence for genetic mapping, comparative genomics, gene annotation and isolation, and more general studies on the biology of wheat.
Significance Statement
This new release of the bread wheat cv. Chinese Spring reference genome sequence, IWGSC RefSeq v2.1, corrects assembly errors affecting approximately 10% of the prior IWGSC RefSeq v1.0 release using genome‐wide optical maps, fills gaps with single‐molecule long reads, and incorporates re‐annotation of TEs and re‐computation of gene coordinates. These refinements enhance the utility of the sequence for breeding and research applications.
Next-generation sequencing (NGS) technologies have enabled high-throughput and low-cost generation of sequence data; however, de novo genome assembly remains a great challenge, particularly for large genomes. NGS short reads are often insufficient to create large contigs that span repeat sequences and to facilitate unambiguous assembly. Plant genomes are notorious for containing high quantities of repetitive elements, which, combined with huge genome sizes, have made accurate assembly of these large and complex genomes intractable thus far. Using two-color genome mapping of tiling bacterial artificial chromosome (BAC) clones on nanochannel arrays, we completed high-confidence assembly of a 2.1-Mb, highly repetitive region in the large and complex genome of Aegilops tauschii, the D-genome donor of hexaploid wheat (Triticum aestivum). Genome mapping is based on direct visualization of sequence motifs on single DNA molecules hundreds of kilobases in length. With the genome map as a scaffold, we anchored unplaced sequence contigs, validated the initial draft assembly, and resolved instances of misassembly, some involving contigs <2 kb long, to dramatically improve the assembly from 75% to 95% complete.
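The idea that a genome map records the positions of a short sequence motif along a long molecule can be illustrated with an in silico version of the same operation. The following is a minimal sketch, not the method used in the study: the motif GCTCTTC, the toy sequence, and the function names are assumptions chosen only for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def label_positions(seq: str, motif: str) -> list[int]:
    """Return sorted 0-based start positions of the motif on either strand."""
    seq = seq.upper()
    hits = set()
    # Search for the motif and its reverse complement so both strands count.
    for pattern in {motif.upper(), reverse_complement(motif)}:
        start = seq.find(pattern)
        while start != -1:
            hits.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(hits)


def label_intervals(positions: list[int]) -> list[int]:
    """Distances between consecutive labels; this pattern is what gets aligned."""
    return [b - a for a, b in zip(positions, positions[1:])]


# Toy sequence (hypothetical): two forward sites and one reverse-strand site.
toy = "AAGCTCTTCAAAAAGAAGAGCTTTGCTCTTCAA"
positions = label_positions(toy, "GCTCTTC")
print(positions, label_intervals(positions))
```

Comparing the interval pattern predicted from a sequence contig with the pattern observed on imaged molecules is, in outline, how a contig can be anchored to a map or flagged as misassembled.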
Powdery mildew, caused by Blumeria graminis f. sp. tritici (Bgt), is one of the most destructive diseases that pose a great threat to wheat production. Wheat landraces represent a rich source of powdery mildew resistance. Here, we report the map-based cloning of powdery mildew resistance gene Pm24 from Chinese wheat landrace Hulutou. It encodes a tandem kinase protein (TKP) with putative kinase-pseudokinase domains, designated WHEAT TANDEM KINASE 3 (WTK3). The resistance function of Pm24 was validated by transgenic assay, independent mutants, and allelic association analyses. Haplotype analysis revealed that a rare 6-bp natural deletion of lysine-glycine codons, endemic to wheat landraces of Shaanxi Province, China, in the kinase I domain (Kin I) of WTK3 is critical for the resistance function. Transgenic assay of WTK3 chimeric variants revealed that only the specific two amino acid deletion, rather than any of the single or more amino acid deletions, in the Kin I of WTK3 is responsible for gaining the resistance function of WTK3 against the Bgt fungus.
Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence.
An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii, the diploid source of the wheat D genome, which has a genome size of 4.02 Gb, 90% of which is repetitive sequence. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 were sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome. To assess the false positive SNP discovery rate, DNA containing putative SNPs was amplified by PCR from AL8/78 and AS75 and resequenced with the ABI 3730xl. In a sample of 302 randomly selected putative SNPs, 84.0% in gene regions, 88.0% in repeat junctions, and 81.3% in uncharacterized regions were validated.
An annotation-based genome-wide SNP discovery pipeline for NGS platforms was developed. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes and does not require a reference genome sequence. The pipeline is applicable to all current NGS platforms, provided that at least one such platform generates relatively long reads. The pipeline package, AGSNP, and the 497,118 discovered Ae. tauschii SNPs can be accessed at http://avena.pw.usda.gov/wheatD/agsnp.shtml.
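The reported total of 497,118 SNPs can be checked against the three category counts given above. The short sketch below does only that arithmetic; the dictionary keys and variable names are illustrative assumptions and are not part of the AGSNP package.

```python
# Putative SNP counts per sequence category, as stated in the abstract.
snp_counts = {
    "gene sequences": 195_631,
    "uncharacterized single-copy regions": 155_580,
    "repeat junctions": 145_907,
}

# Sum of the categories; this should match the total quoted in the conclusion.
total = sum(snp_counts.values())

# Share of each category in the total, as a percentage.
shares = {name: 100 * count / total for name, count in snp_counts.items()}

print(total)
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

The sum of the three categories equals the stated total, so the counts in the two paragraphs are consistent with each other.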
• Powdery mildew poses severe threats to wheat production. The most sustainable way to control this disease is through planting resistant cultivars.
• We report the map-based cloning of the powdery mildew resistance allele Pm5e from a Chinese wheat landrace. We applied a two-step bulked segregant RNA sequencing (BSR-Seq) approach in developing tightly linked or co-segregating markers to Pm5e. The first BSR-Seq used phenotypically contrasting bulks of recombinant inbred lines (RILs) to identify Pm5e-linked markers. The second BSR-Seq utilized bulks of genetic recombinants screened from a fine-mapping population to precisely quantify the associated genomic variation in the mapping interval, and identified the Pm5e candidate genes.
• The function of Pm5e was validated by transgenic assay, loss-of-function mutants and haplotype association analysis. Pm5e encodes a nucleotide-binding domain leucine-rich-repeat-containing (NLR) protein. A rare nonsynonymous single nucleotide variant (SNV) within the C-terminal leucine-rich repeat (LRR) domain is responsible for the gain of powdery mildew resistance function of Pm5e, an allele endemic to wheat landraces of Shaanxi province of China.
• Results from this study demonstrate the value of landraces in discovering useful genes for modern wheat breeding. The key SNV associated with powdery mildew resistance will be useful for marker-assisted selection of Pm5e in wheat breeding programs.
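The principle behind bulked segregant analysis, where markers linked to the trait show contrasting allele frequencies between the two phenotypic bulks, can be sketched as follows. This is an illustrative assumption about how such a comparison is commonly made, not the statistic used in the study; the read counts, site names, threshold, and function names are all hypothetical.

```python
def allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads carrying the alternative allele at one site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def delta_index(resistant: tuple[int, int], susceptible: tuple[int, int]) -> float:
    """Allele-frequency difference between the resistant and susceptible bulks."""
    return allele_frequency(*resistant) - allele_frequency(*susceptible)


def linked_candidates(sites: dict, threshold: float = 0.8) -> list[str]:
    """Names of sites whose absolute frequency difference meets the threshold."""
    return [
        name
        for name, (resistant, susceptible) in sites.items()
        if abs(delta_index(resistant, susceptible)) >= threshold
    ]


# Hypothetical (alt_reads, total_reads) per bulk: (resistant, susceptible).
toy_sites = {
    "snp_a": ((48, 50), (2, 50)),   # strongly contrasting, likely linked
    "snp_b": ((26, 50), (24, 50)),  # near 0.5 in both bulks, unlinked
    "snp_c": ((1, 40), (39, 40)),   # contrasting in the opposite direction
}
print(linked_candidates(toy_sites))
```

Unlinked sites segregate independently of the phenotype and stay near equal frequency in both bulks, which is why a large difference points to the mapping interval.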
Key message
Comparison of genome sequences of wild emmer wheat and Aegilops tauschii suggests a novel scenario of the evolution of rearranged wheat chromosomes 4A, 5A, and 7B.
Past research suggested that wheat chromosome 4A was subjected to a reciprocal translocation T(4AL;5AL)1 that occurred in the diploid progenitor of the wheat A subgenome and to three major rearrangements that occurred in polyploid wheat: pericentric inversion Inv(4AS;4AL)1, paracentric inversion Inv(4AL;4AL)1, and reciprocal translocation T(4AL;7BS)1. Gene collinearity along the pseudomolecules of tetraploid wild emmer wheat (Triticum turgidum ssp. dicoccoides, subgenomes AABB) and diploid Aegilops tauschii (genomes DD) was employed to confirm these rearrangements and to analyze the breakpoints. The exchange of distal regions of chromosome arms 4AS and 4AL due to pericentric inversion Inv(4AS;4AL)1 was detected, and breakpoints were validated with an optical Bionano genome map. Both breakpoints contained satellite DNA. The breakpoints of reciprocal translocation T(4AL;7BS)1 were also found. However, the breakpoints that generated paracentric inversion Inv(4AL;4AL)1 appeared to be collocated with the 4AL breakpoints that had produced Inv(4AS;4AL)1 and T(4AL;7BS)1. Inv(4AS;4AL)1, Inv(4AL;4AL)1, and T(4AL;7BS)1 either originated sequentially, and Inv(4AL;4AL)1 was produced by recurrent chromosome breaks at the same breakpoints that generated Inv(4AS;4AL)1 and T(4AL;7BS)1, or Inv(4AS;4AL)1, Inv(4AL;4AL)1, and T(4AL;7BS)1 originated simultaneously. We prefer the latter hypothesis since it makes fewer assumptions about the sequence of events that produced these chromosome rearrangements.
Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat (Triticum aestivum, genomes AABBDD) and an important genetic resource for wheat. The large size and highly repetitive nature of the Ae. tauschii genome have until now precluded the development of a reference-quality genome sequence. Here we use an array of advanced technologies, including ordered-clone genome sequencing, whole-genome shotgun sequencing, and BioNano optical genome mapping, to generate a reference-quality genome sequence for Ae. tauschii ssp. strangulata accession AL8/78, which is closely related to the wheat D genome. We show that compared to other sequenced plant genomes, including a much larger conifer genome, the Ae. tauschii genome contains unprecedented amounts of very similar repeated sequences. Our genome comparisons reveal that the Ae. tauschii genome has a greater number of dispersed duplicated genes than other sequenced genomes and its chromosomes have been structurally evolving an order of magnitude faster than those of other grass genomes. The decay of colinearity with other grass genomes correlates with recombination rates along chromosomes. We propose that the vast amounts of very similar repeated sequences cause frequent errors in recombination and lead to gene duplications and structural chromosome changes that drive fast genome evolution.
The current limitations in genome sequencing technology require the construction of physical maps for high-quality draft sequences of large plant genomes, such as that of Aegilops tauschii, the wheat D-genome progenitor. To construct a physical map of the Ae. tauschii genome, we fingerprinted 461,706 bacterial artificial chromosome clones, assembled contigs, designed a 10K Ae. tauschii Infinium SNP array, constructed a 7,185-marker genetic map, and anchored on the map contigs totaling 4.03 Gb. Using whole genome shotgun reads, we extended the SNP marker sequences and found 17,093 genes and gene fragments. We showed that collinearity of the Ae. tauschii genes with Brachypodium distachyon, rice, and sorghum decreased with phylogenetic distance and that structural genome evolution rates have been high across all investigated lineages in subfamily Pooideae, including that of Brachypodieae. We obtained additional information about the evolution of the seven Triticeae chromosomes from 12 ancestral chromosomes and uncovered a pattern of centromere inactivation accompanying nested chromosome insertions in grasses. We showed that the density of noncollinear genes along the Ae. tauschii chromosomes positively correlates with recombination rates, suggested a cause, and showed that new genes, exemplified by disease resistance genes, are preferentially located in high-recombination chromosome regions.
Wild emmer (Triticum turgidum ssp. dicoccoides) is the progenitor of all modern cultivated tetraploid wheat. Its genome is large (> 10 Gb) and contains over 80% repeated sequences. The successful whole-genome-shotgun assembly of the wild emmer (accession Zavitan) genome sequence (WEW_v1.0) was an important milestone for wheat genomics. In an effort to improve this assembly, an optical map of accession Zavitan was constructed using Bionano Direct Label and Stain (DLS) technology. The map spanned 10.4 Gb. This map and another map produced earlier by us with Bionano's Nick Label Repair and Stain (NLRS) technology were used to improve the current wild emmer assembly. The WEW_v1.0 assembly consisted of 151,912 scaffolds. Of them, 3,102 could be confidently aligned on the optical maps. Forty-seven were chimeric. They were disjoined and new scaffolds were assembled with the aid of the optical maps. The total number of scaffolds was reduced from 151,912 to 149,252 and N50 increased from 6.96 Mb to 72.63 Mb. Of the 149,252 scaffolds, 485 scaffolds, which accounted for 97% of the total genome length, were aligned and oriented on genetic maps, and new WEW_v2.0 pseudomolecules were constructed. In the new pseudomolecules, 333 scaffolds (68.51 Mb) that were originally unassigned were included, 226 scaffolds (554.84 Mb) were placed into new locations, and 332 scaffolds (394.83 Mb) were re-oriented. The improved wild emmer genome assembly is an important resource for understanding the genomic modifications that occurred during domestication.
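The N50 statistic quoted above is the length of the scaffold at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that standard definition follows; the toy scaffold lengths are hypothetical and are not taken from the wild emmer assembly.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that scaffolds of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical scaffold lengths before and after joining scaffolds.
before = [2, 3, 4, 5, 6, 7, 8, 9, 10]
after = [10 + 9, 8 + 7, 6, 5, 4, 3, 2]  # two pairs merged, same total length

print(n50(before), n50(after))

# Net change in scaffold count reported in the abstract.
print(151_912 - 149_252)
```

Merging scaffolds leaves the total length unchanged but shifts more of it into long pieces, which is why N50 can rise sharply even when the scaffold count falls only modestly.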
Meiotic pairing between homoeologous chromosomes in polyploid wheat is inhibited by the
locus on the long arm of chromosome 5 in the B genome.
(genomes SS), the closest relative of the progenitor of ...the wheat B genome, is polymorphic for genetic suppression of
Using this polymorphism, two major suppressor loci,
and
, have been mapped in
is located in the distal, high-recombination region of the long arm of the
chromosome 3S. Its location and tight linkage to marker
makes
a suitable target for introgression into wheat. Here,
was introgressed into hexaploid bread wheat cv. Chinese Spring (CS) and from there into tetraploid durum wheat cv. Langdon (LDN). Sequential fluorescence in situ hybridization and genomic in situ hybridization showed that an
segment with
replaced the distal end of the long arm of chromosome 3A. In the CS genetic background, the chromosome induced homoeologous chromosome pairing in interspecific hybrids with
but not in progenies from crosses involving alien disomic substitution lines. In the LDN genetic background, the chromosome induced homoeologous chromosome pairing in both interspecific hybrids and progenies from crosses involving alien disomic substitution lines. We conclude that the recombined chromosome harbors
but its expression requires expression of a complementary gene that is present in LDN but absent in CS. We suggest that it is unlikely that
and
, a paralog of
located on wheat chromosomes 3A and 3B and
chromosome 3D, are equivalent. The utility of
for induction of recombination between homoeologous chromosomes in wheat is illustrated.