Microsatellite (simple sequence repeat, SSR) and single nucleotide polymorphism (SNP) markers are two important types of genetic markers used in genetic mapping and genotyping. Large-scale genomic research projects often require high-throughput, computer-assisted primer design. Numerous web-based and stand-alone programs for PCR primer design are available, but they vary in quality and functionality. In particular, most programs lack batch primer design capability. A high-throughput software tool for designing SSR flanking primers and SNP genotyping primers is therefore in increasing demand.
A new web primer design program, BatchPrimer3, was developed based on Primer3. BatchPrimer3 adopts the Primer3 core program as its main primer design engine to choose the best primer pairs. A new score-based primer picking module is incorporated into BatchPrimer3 and used to pick position-restricted primers. BatchPrimer3 v1.0 implements several types of primer design, including generic primers, SSR primers together with SSR detection, SNP genotyping primers (including single-base extension primers, allele-specific primers, and tetra-primers for tetra-primer ARMS PCR), and DNA sequencing primers. DNA sequences in FASTA format can be batch read into the program. Pre-analysis of the input sequences provides basic information that serves as a reference for setting primer design parameters. The input sequences can be pre-processed and masked to exclude and/or include specific regions, or to set targets for different primer design purposes, as in Primer3Web and Primer3Plus. A tab-delimited or Excel-formatted primer output also greatly facilitates the subsequent primer-ordering process. Thousands of primers, including wheat conserved intron-flanking primers, wheat genome-specific SNP genotyping primers, and Brachypodium SSR flanking primers in several genome projects, have been designed using the program and validated in several laboratories.
BatchPrimer3 is a comprehensive web primer design program to develop different types of primers in a high-throughput manner. Additional methods of primer design can be easily integrated into future versions of BatchPrimer3. The program with source code and thousands of PCR and sequencing primers designed for wheat and Brachypodium are accessible at http://wheat.pw.usda.gov/demos/BatchPrimer3/.
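The SSR detection step that precedes SSR flanking-primer design can be illustrated with a short sketch. This is not BatchPrimer3 code: the function name `find_ssrs`, the regular-expression approach, and the minimum repeat counts per motif length are all assumptions chosen for illustration, and only perfect (uninterrupted) repeats are reported.

```python
import re

# Hypothetical thresholds (motif length -> minimum number of repeats);
# these are illustrative values, not BatchPrimer3 defaults.
DEFAULT_MIN_REPEATS = {2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. AGAG, AA)."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of the given length followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            if not is_primitive(motif):
                continue  # reported under its shorter repeat unit instead
            hits.append((match.start(), motif, len(match.group(1)) // motif_len))
    return sorted(hits)


example = "TTGACC" + "AG" * 8 + "CCGTA" + "CAT" * 5 + "GGA"
ssrs = find_ssrs(example)
```

In a batch setting, each detected repeat would then be passed to the primer design engine as a target so that primers are picked in the flanking sequence on either side.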
The oat seed storage proteins are mainly composed of two classes: the globulins and the avenins. Among the major cereals, the globulins are the major seed protein class in rice and oats and, along with the higher protein content of oats, are the basis for the relatively higher nutritional value of oats compared to the other cereals. The second major class of oat seed proteins is the avenins, also classified as prolamins: seed proteins high in the amino acids proline and glutamine. The prolamins are associated with celiac disease, an autoimmune disorder of the gastrointestinal tract. In spite of their importance, neither the oat globulins nor the avenins have been completely analyzed and described for any single germplasm.
Using available EST resources for a single hexaploid oat cultivar, the spectrum of avenin and globulin sequences is described for the gene coding regions and the derived protein sequences. The nine unique avenin sequences are suggested to be divided into 3-4 distinct subclasses distributed in the hexaploid genome. The globulins from the same germplasm include 24 distinct sequences. Variation in globulin size results mainly from a glutamine-rich domain, similar to that in the avenins, and from variation in the C-terminal sequence domain. Two globulin genes have premature stop codons that shorten the resulting polypeptides by 9 and 17 amino acids, and eight of the globulin sequences form a branch of the globulins not previously reported.
A more complete description of the major oat seed proteins should allow a more thorough analysis of their contributions to those oat seed characteristics related to nutritional value, evolutionary history, and celiac disease association.
Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence.
An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii, the diploid source of the wheat D genome, which has a genome size of 4.02 Gb, of which 90% is repetitive sequence. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 were sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome. To assess the false positive SNP discovery rate, DNA containing putative SNPs was amplified by PCR from AL8/78 and AS75 and resequenced with the ABI 3730 xl. In a sample of 302 randomly selected putative SNPs, 84.0% in gene regions, 88.0% in repeat junctions, and 81.3% in uncharacterized regions were validated.
An annotation-based genome-wide SNP discovery pipeline for NGS platforms was developed. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes and does not require a reference genome sequence. The pipeline is applicable to all current NGS platforms, provided that at least one such platform generates relatively long reads. The pipeline package, AGSNP, and the discovered 497,118 Ae. tauschii SNPs can be accessed at (http://avena.pw.usda.gov/wheatD/agsnp.shtml).
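As a quick consistency check, the three per-category SNP counts reported for Ae. tauschii can be summed and compared with the stated total. The sketch below uses only the figures given in the abstract; the dictionary keys and variable names are illustrative labels.

```python
# Putative SNP counts per sequence category, as reported in the abstract.
snp_counts = {
    "gene sequences": 195_631,
    "uncharacterized single-copy regions": 155_580,
    "repeat junctions": 145_907,
}

total_snps = sum(snp_counts.values())

# Share of all putative SNPs contributed by each category, in percent.
category_share = {
    category: round(100 * count / total_snps, 1)
    for category, count in snp_counts.items()
}
```

The sum agrees with the 497,118 SNPs stated above. The validation rates (84.0%, 88.0%, and 81.3%) cannot be combined into a single overall rate from the abstract alone, because the number of sampled SNPs per category is not given.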
In higher plants, inorganic nitrogen is assimilated via the glutamate synthase cycle or GS-GOGAT pathway. GOGAT enzyme occurs in two distinct forms that use NADH (NADH-GOGAT) or Fd (Fd-GOGAT) as electron carriers. The goal of the present study was to characterize wheat Fd-GOGAT genes and to assess the linkage with grain protein content (GPC), an important quantitative trait controlled by multiple genes.
We report the complete genomic sequences of the three homoeologous A, B and D Fd-GOGAT genes from hexaploid wheat (Triticum aestivum) and their localization and characterization. Each of the three homoeologous genes comprises 33 exons and 32 introns. The three genes show the same exon/intron number and size, with the only exception of a series of indels in intronic regions. The partial sequence of the Fd-GOGAT gene located on the A genome was determined in two durum wheat (Triticum turgidum ssp. durum) cvs, Ciccio and Svevo, characterized by different grain protein content. Genomic differences allowed the gene to be mapped to the centromeric region of chromosome 2A. QTL analysis was conducted in the Svevo×Ciccio RIL mapping population, previously evaluated in 5 different environments. The study co-localized the Fd-GOGAT-A gene with the marker GWM-339, identifying a significant major QTL for GPC.
The wheat Fd-GOGAT genes are highly conserved, both among the three homoeologous hexaploid wheat genes and in comparison with other plants. In durum wheat, the Fd-GOGAT allele of cv Svevo was shown to be associated with increased GPC, which is potentially useful in breeding programs.
The utility of mining DNA sequence data to understand the structure and expression of cereal prolamin genes is demonstrated by the identification of a new class of wheat prolamins. This previously unrecognized wheat prolamin class, given the name δ-gliadins, is the most direct ortholog of barley γ3-hordeins. Phylogenetic analysis shows that the orthologous δ-gliadins and γ3-hordeins form a distinct prolamin branch that existed separate from the γ-gliadins and γ-hordeins in an ancestral Triticeae prior to the branching of wheat and barley. The expressed δ-gliadins are encoded by a single gene in each of the hexaploid wheat genomes. This single δ-gliadin/γ3-hordein ortholog may be a general feature of the Triticeae tribe since examination of ESTs from three barley cultivars also confirms a single γ3-hordein gene. Analysis of ESTs and cDNAs shows that the genes are expressed in at least five hexaploid wheat cultivars in addition to the diploids Triticum monococcum and Aegilops tauschii. The latter two sequences also allow assignment of the δ-gliadin genes to the A and D genomes, respectively, with the third sequence type assumed to be from the B genome. Two wheat cultivars for which there are sufficient ESTs show different patterns of expression: cv Chinese Spring expresses the genes from the A and B genomes, while cv Recital has ESTs from the A and D genomes. Genomic sequences of Chinese Spring show that the D genome gene is inactivated by tandem premature stop codons. A fourth δ-gliadin sequence occurs in the D genome of both Chinese Spring and Ae. tauschii, but no ESTs match this sequence and limited genomic sequences indicate a pseudogene containing frame shifts and premature stop codons. Sequencing of BACs covering a 3 Mb region from Ae. tauschii locates the δ-gliadin gene to the complex Gli-1 plus Glu-3 region on chromosome 1.
Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat (Triticum aestivum, genomes AABBDD) and an important genetic resource for wheat. The large size and highly repetitive nature of the Ae. tauschii genome have until now precluded the development of a reference-quality genome sequence. Here we use an array of advanced technologies, including ordered-clone genome sequencing, whole-genome shotgun sequencing, and BioNano optical genome mapping, to generate a reference-quality genome sequence for Ae. tauschii ssp. strangulata accession AL8/78, which is closely related to the wheat D genome. We show that compared to other sequenced plant genomes, including a much larger conifer genome, the Ae. tauschii genome contains unprecedented amounts of very similar repeated sequences. Our genome comparisons reveal that the Ae. tauschii genome has a greater number of dispersed duplicated genes than other sequenced genomes and its chromosomes have been structurally evolving an order of magnitude faster than those of other grass genomes. The decay of colinearity with other grass genomes correlates with recombination rates along chromosomes. We propose that the vast amounts of very similar repeated sequences cause frequent errors in recombination and lead to gene duplications and structural chromosome changes that drive fast genome evolution.
A genome-wide assessment of nucleotide diversity in a polyploid species must minimize the inclusion of homoeologous sequences into diversity estimates and reliably allocate individual haplotypes into their respective genomes. The same requirements complicate the development and deployment of single nucleotide polymorphism (SNP) markers in polyploid species. We report here a strategy that satisfies these requirements and deploy it in the sequencing of genes in cultivated hexaploid wheat (Triticum aestivum, genomes AABBDD) and wild tetraploid wheat (Triticum turgidum ssp. dicoccoides, genomes AABB) from the putative site of wheat domestication in Turkey. Data are used to assess the distribution of diversity among and within wheat genomes and to develop a panel of SNP markers for polyploid wheat.
Nucleotide diversity was estimated in 2114 wheat genes and was similar between the A and B genomes and reduced in the D genome. Within a genome, diversity was diminished on some chromosomes. Low diversity was always accompanied by an excess of rare alleles. A total of 5,471 SNPs was discovered in 1791 wheat genes. Totals of 1,271, 1,218, and 2,203 SNPs were discovered in 488, 463, and 641 genes of wheat putative diploid ancestors, T. urartu, Aegilops speltoides, and Ae. tauschii, respectively. A public database containing genome-specific primers, SNPs, and other information was constructed. A total of 987 genes with nucleotide diversity estimated in one or more of the wheat genomes was placed on an Ae. tauschii genetic map, and the map was superimposed on wheat deletion-bin maps. The agreement between the maps was assessed.
In a young polyploid, exemplified by T. aestivum, ancestral species are the primary source of genetic diversity. Low effective recombination due to self-pollination and a genetic mechanism precluding homoeologous chromosome pairing during polyploid meiosis can lead to the loss of diversity from large chromosomal regions. The net effect of these factors in T. aestivum is large variation in diversity among genomes and chromosomes, which impacts the development of SNP markers and their practical utility. Accumulation of new mutations in older polyploid species, such as wild emmer, results in increased diversity and its more uniform distribution across the genome.
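The abstract does not state which estimator of nucleotide diversity was used. One common choice is π, the average number of pairwise differences per site among aligned haplotypes of a single genome. The sketch below assumes that estimator, gap-free alignments of equal length, and toy haplotypes invented purely for illustration; the function name is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Average pairwise differences per site (pi) for equal-length aligned sequences."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are required")
    length = len(haplotypes[0])
    if length == 0 or any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be aligned, non-empty, and of equal length")
    # Count mismatched sites for every unordered pair of haplotypes.
    total_differences = sum(
        sum(a != b for a, b in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_differences / (n_pairs * length)


# Toy haplotypes (illustrative only, not data from the study).
toy_haplotypes = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGAACGTTC",
]
pi = nucleotide_diversity(toy_haplotypes)
```

For a polyploid, the calculation is only meaningful when every haplotype in the input comes from the same genome (A, B, or D), which is why the allocation of haplotypes to genomes described above has to happen first.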
The current limitations in genome sequencing technology require the construction of physical maps for high-quality draft sequences of large plant genomes, such as that of Aegilops tauschii, the wheat D-genome progenitor. To construct a physical map of the Ae. tauschii genome, we fingerprinted 461,706 bacterial artificial chromosome clones, assembled contigs, designed a 10K Ae. tauschii Infinium SNP array, constructed a 7,185-marker genetic map, and anchored on the map contigs totaling 4.03 Gb. Using whole genome shotgun reads, we extended the SNP marker sequences and found 17,093 genes and gene fragments. We showed that collinearity of the Ae. tauschii genes with Brachypodium distachyon, rice, and sorghum decreased with phylogenetic distance and that structural genome evolution rates have been high across all investigated lineages in subfamily Pooideae, including that of Brachypodieae. We obtained additional information about the evolution of the seven Triticeae chromosomes from 12 ancestral chromosomes and uncovered a pattern of centromere inactivation accompanying nested chromosome insertions in grasses. We showed that the density of noncollinear genes along the Ae. tauschii chromosomes positively correlates with recombination rates, suggested a cause, and showed that new genes, exemplified by disease resistance genes, are preferentially located in high-recombination chromosome regions.
Nitrogen uptake and the efficient absorption and metabolism of nitrogen are essential elements in attempts to breed improved cereal cultivars for grain or silage production. One of the enzymes related to nitrogen metabolism is glutamine-2-oxoglutarate amidotransferase (GOGAT). Together with glutamine synthetase (GS), GOGAT maintains the flow of nitrogen from NH4+ into glutamine and glutamate, which are then used for several aminotransferase reactions during amino acid synthesis. The aim of the present work was to identify and analyse the structure of wheat NADH-GOGAT genomic sequences, and to study their expression in two durum wheat cultivars characterized by low and high kernel protein content. The genomic sequences of the three homoeologous A, B and D NADH-GOGAT genes were obtained for hexaploid Triticum aestivum, along with the A and B genes of tetraploid Triticum turgidum ssp. durum. Analysis of the gene sequences indicates that all wheat NADH-GOGAT genes are composed of 22 exons and 21 introns. The three hexaploid wheat homoeologous genes show high sequence conservation except in intron 13, which differs in both length and sequence. A comparative analysis of sequences among di- and mono-cotyledonous plants shows both regions of high conservation and regions of divergence. qRT-PCR performed with the two durum wheat cvs Svevo and Ciccio (characterized by high and low protein content, respectively) indicates different expression levels of the two genes NADH-GOGAT-3A and NADH-GOGAT-3B. The three hexaploid wheat homoeologous NADH-GOGAT gene sequences are highly conserved, consistent with the key metabolic role of this gene. However, the dicot and monocot amino acid sequences show distinctive patterns, particularly in the transit peptide, the exon 16–17 junction, and the C-terminus.
The lack of conservation in the transit peptide may indicate subcellular differences between the two plant divisions, while the sequence conservation within enzyme functional domains remains high. Higher expression levels of NADH-GOGAT are associated with higher grain protein content in the two durum wheats.
The spectrum of B-hordein prolamins and genes in the single barley cultivar Barke is described from an in silico analysis of 1452 B-hordein ESTs and available genomic DNA. Eleven unique B-hordein proteins are derived from EST contigs. Ten contigs encode apparent full-length B-hordeins and the eleventh contains a premature stop codon that will lead to a truncated B-hordein. The 11 sequences are placed within the two previously described classes, i.e., the B1- and B3-type B-hordeins. The number of ESTs assigned to each sequence is used as an estimate of relative gene transcription and expression. Three of the sequences account for 79% of the total ESTs; one sequence comprises 32% of the total ESTs and has a variant C-terminus caused by an undefined history of sequence change near the 3′ coding terminus. The 70× difference in EST distribution among sequences points to the importance of understanding differential rates of expression within closely related gene families. Analysis of available genomic sequences confirms the EST assembly and reveals one full-length and two partial pseudogene sequences, as evidenced by the absence of matching ESTs and the presence of premature stop codons and frame shifts.