Miniature inverted-repeat transposable elements (MITEs) are prevalent in eukaryotic species including plants. MITE families vary dramatically and usually cannot be identified based on homology. In this study, we de novo identified MITEs from 41 plant species, using the computer programs MITE Digger, MITE-Hunter and/or Repetitive Sequence with Precise Boundaries (RSPB). MITEs were found in all but one species (Cyanidioschyzon merolae). Combined with the MITEs identified previously from the rice genome, >2.3 million sequences from 3527 MITE families were obtained from 41 plant species. In general, higher plants contain more MITEs than lower plants, with a few exceptions such as papaya, with only 538 elements. The largest number of MITEs is found in apple, with 237 302 MITE sequences. The number of MITE sequences in a genome is significantly correlated with genome size. A series of databases (plant MITE databases, P-MITE), available online at http://pmite.hzau.edu.cn/django/mite/, was constructed to host all MITE sequences from the 41 plant genomes. The databases are available for sequence similarity searches (BLASTN), and MITE sequences can be downloaded by family or by genome. The databases can be used to study the origin and amplification of MITEs, MITE-derived small RNAs and the roles of MITEs in gene and genome evolution.
Key message
We identified the loss of the BoFLC gene as the cause of the non-vernalization requirement in B. oleracea. Our codominant marker for the BoFLC gene can be used in breeding programs for B. oleracea crops.
Many species of the Brassicaceae family, including some Brassica crops, require vernalization to avoid pre-winter flowering. A vernalization requirement is an unfavorable trait for Chinese kale (Brassica oleracea var. chinensis Lei), a stem vegetable, and therefore it has been lost during its domestication/breeding process. To reveal the genetics of vernalization variation, we constructed an F2 population by crossing a Chinese kale (a non-vernalization crop) with a kale (a vernalization crop). Using bulked segregant analysis (BSA) and RNA-seq, we identified one major quantitative trait locus (QTL) controlling vernalization and fine-mapped it to a region spanning 80 kb. Synteny analysis and PCR-based sequencing results revealed that, compared to that of the kale parent, the candidate region of the Chinese kale parent lost a 9,325-bp fragment containing an FLC homolog (BoFLC). In addition to the BoFLC gene, there are four other FLC homologs in the genome of B. oleracea, including Bo3g005470, Bo3g024250, Bo9g173370, and Bo9g173400. The qPCR analysis showed that BoFLC had the highest expression among the five members of the FLC family. Considering the low expression levels of the four paralogs of BoFLC, we speculate that its paralogs cannot compensate for the function of the lost BoFLC; therefore, the presence/absence (PA) polymorphism of BoFLC determines the vernalization variation. Based on the PA polymorphism of BoFLC, we designed a codominant marker for the vernalization trait, which can be used for breeding programs of B. oleracea crops.
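As an illustration of how a codominant presence/absence marker can separate all three genotype classes in an F2 population, the following is a minimal sketch. The abstract does not describe the marker design, so the two-amplicon scheme here is an assumption: one hypothetical amplicon that is produced only when BoFLC is present, and one hypothetical amplicon that spans the deletion junction and is produced only by the deletion allele. The function and argument names are illustrative only.

```python
def call_genotype(presence_band: bool, deletion_band: bool) -> str:
    """Classify one plant from two hypothetical PCR amplicons.

    presence_band: amplicon assumed to appear only when BoFLC is present.
    deletion_band: amplicon assumed to span the deletion junction, so it
                   appears only when the deletion allele is present.
    """
    if presence_band and deletion_band:
        return "heterozygous"          # one copy of each allele
    if presence_band:
        return "homozygous present"    # kale-like allele only
    if deletion_band:
        return "homozygous absent"     # Chinese-kale-like allele only
    return "no call"                   # neither band: failed reaction


# Example: band scores (presence_band, deletion_band) for four F2 plants.
f2_scores = [(True, False), (True, True), (False, True), (False, False)]
calls = [call_genotype(p, d) for p, d in f2_scores]
print(calls)
```

A dominant marker would score only one of the two bands and could not tell heterozygotes from one of the homozygous classes; scoring both bands is what makes the marker codominant in this sketch.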
Most disease resistance genes encode nucleotide-binding-site (NBS) and leucine-rich-repeat (LRR) domains, and the NBS-LRR encoding genes are often referred to as R genes. Using a newly developed approach, 478, 485, 1,194, 1,665, 2,042 and 374 R genes were identified from the genomes of tomato Heinz1706, wild tomato LA716, potato DM1-3, pepper Zunla-1, wild pepper Chiltepin and tobacco TN90, respectively. The majority of R genes from Solanaceae were grouped into 87 subfamilies, including 16 TIR-NBS-LRR (TNL) and 71 non-TNL subfamilies. Each subfamily was annotated manually, including identification of intron/exon structure and intron phase. Interestingly, TNL subfamilies have similar intron phase patterns, while the non-TNL subfamilies have diverse intron phases due to frequent gain of introns. Prevalent presence/absence polymorphic R gene loci were found among Solanaceae species, and an integrated map with 427 R loci was constructed. The pepper genome (2,042 in Chiltepin) has at least four times as many R genes as the tomato genome (478 in Heinz1706). The high number of R genes in the pepper genome is due to the amplification of R genes in a few subfamilies, such as the Rpi-blb2 and BS2 subfamilies. The mechanism underlying the variation of R gene number among different plant genomes is discussed.
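The "at least four times" comparison can be checked directly from the counts reported in the abstract. The short sketch below computes each genome's R gene count relative to tomato Heinz1706; the function name `fold_over` is illustrative and not taken from the paper.

```python
# R gene counts as reported in the abstract above.
r_gene_counts = {
    "tomato Heinz1706": 478,
    "wild tomato LA716": 485,
    "potato DM1-3": 1194,
    "pepper Zunla-1": 1665,
    "wild pepper Chiltepin": 2042,
    "tobacco TN90": 374,
}


def fold_over(reference: str, counts: dict[str, int]) -> dict[str, float]:
    """Return each genome's R gene count divided by the reference count."""
    base = counts[reference]
    return {name: n / base for name, n in counts.items()}


folds = fold_over("tomato Heinz1706", r_gene_counts)

# Print genomes from the largest to the smallest fold difference.
for name, fold in sorted(folds.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {fold:.2f}x")
```

With these numbers, wild pepper Chiltepin comes out at roughly 4.3 times the Heinz1706 count, while pepper Zunla-1 is about 3.5 times, so the fourfold statement applies to the Chiltepin comparison specifically.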
Different horticultural types of lettuce exhibit tremendous morphological variation. However, the molecular basis for domestication and divergence among the different horticultural types of lettuce remains unknown. Here, we report the RNA sequencing of 240 lettuce accessions sampled from the major horticultural types and wild relatives, generating 1.1 million single-nucleotide polymorphisms (SNPs). Demographic modeling indicates that there was a single domestication event for lettuce. We identify a list of regions as putative selective sweeps that occurred during domestication and divergence, respectively. Genome-wide association studies (GWAS) identify 5311 expression quantitative trait loci (eQTL) regulating the expression of 4105 genes, including nine eQTLs regulating genes associated with flavonoid biosynthesis. GWAS for leaf color detects six candidate loci responsible for the variation of anthocyanins in lettuce leaves. Our study provides a comprehensive understanding of the domestication and the accumulation of anthocyanins in lettuce, and will facilitate the breeding of cultivars with improved nutritional value.
Abstract
Betula L. (birch) is a pioneer hardwood tree species with ecological, economic, and evolutionary importance in the Northern Hemisphere. We sequenced the Betula platyphylla genome and assembled the sequences into 14 chromosomes. The Betula genome lacks evidence of recent whole-genome duplication and has the same paleoploidy level as Vitis vinifera and Prunus mume. Phylogenetic analysis of lignin pathway genes coupled with tissue-specific expression patterns provided clues for understanding the formation of higher ratios of syringyl to guaiacyl lignin observed in Betula species. Our transcriptome analysis of leaf tissues under a time-series cold stress experiment revealed the presence of the MEKK1–MKK2–MPK4 cascade and six additional mitogen-activated protein kinases that can be linked to a gene regulatory network involving many transcription factors and cold tolerance genes. Our genomic and transcriptome analyses provide insight into the structures, features, and evolution of the B. platyphylla genome. The chromosome-level genome and gene resources of B. platyphylla obtained in this study will facilitate the identification of important and essential genes governing important traits of trees and genetic improvement of B. platyphylla.
The proper use of resistance genes (R genes) requires a comprehensive understanding of their genomics and evolution. We analyzed genes encoding nucleotide-binding sites and leucine-rich repeats in the genomes of rice (Oryza sativa), maize (Zea mays), sorghum (Sorghum bicolor), and Brachypodium distachyon. Frequent deletions and translocations of R genes generated prevalent presence/absence polymorphism between different accessions/species. The deletions were caused by unequal crossover, homologous repair, nonhomologous repair, or other unknown mechanisms. R gene loci identified from different genomes were mapped onto the chromosomes of rice cv Nipponbare using comparative genomics, resulting in an integrated map of 495 R loci. Sequence analysis of R genes from the partially sequenced genomes of an African rice cultivar and 10 wild accessions suggested that there are many additional R gene lineages in the AA genome of Oryza. The R genes with chimeric structures (termed type I R genes) are diverse in different rice accessions but only account for 5.8% of all R genes in the Nipponbare genome. In contrast, the vast majority of R genes in the rice genome are type II R genes, which are highly conserved in different accessions. Surprisingly, pseudogene-causing mutations in some type II lineages are often conserved, indicating that their conservation was not due to their functions. Functional R genes cloned from rice so far include more type II R genes than type I R genes, but type I R genes are predicted to contribute considerable diversity in wild species. Type I R genes tend to reduce the microsynteny of their flanking regions significantly more than type II R genes, and their flanking regions have slightly but significantly lower G/C content than those of type II R genes.
Differentiation of secondary metabolite profiles in closely related plant species provides clues for unravelling biosynthetic pathways and regulatory circuits, an area that is still underinvestigated. Cucurbitacins, a group of bitter and highly oxygenated tetracyclic triterpenes, are mainly produced by the plant family Cucurbitaceae. These compounds have similar structures, but differ in their antitumour activities and ecophysiological roles. By comparative analyses of the genomes of cucumber, melon and watermelon, we uncovered conserved syntenic loci encoding metabolic genes for distinct cucurbitacins. Characterization of the cytochrome P450s (CYPs) identified from these loci enabled us to unveil a novel multi-oxidation CYP for the tailoring of the cucurbitacin core skeleton as well as two other CYPs responsible for the key structural variations among cucurbitacins C, B and E. We also discovered a syntenic gene cluster of transcription factors that regulates the tissue-specific biosynthesis of cucurbitacins and may confer the loss of bitterness phenotypes associated with convergent domestication of wild cucurbits. This study illustrates the potential to exploit comparative genomics to identify enzymes and transcription factors that control the biosynthesis of structurally related yet unique natural products.
The Arabidopsis genome contains ∼200 genes that encode proteins with similarity to the nucleotide binding site and other domains characteristic of plant resistance proteins. Through a reiterative process of sequence analysis and reannotation, we identified 149 NBS-LRR-encoding genes in the Arabidopsis (ecotype Columbia) genomic sequence. Fifty-six of these genes were corrected from earlier annotations. At least 12 are predicted to be pseudogenes. As described previously, two distinct groups of sequences were identified: those that encoded an N-terminal domain with Toll/Interleukin-1 Receptor homology (TIR-NBS-LRR, or TNL), and those that encoded an N-terminal coiled-coil motif (CC-NBS-LRR, or CNL). The encoded proteins are distinct from the 58 predicted adapter proteins in the previously described TIR-X, TIR-NBS, and CC-NBS groups. Classification based on protein domains, intron positions, sequence conservation, and genome distribution defined four subgroups of CNL proteins, eight subgroups of TNL proteins, and a pair of divergent NL proteins that lack a defined N-terminal motif. CNL proteins generally were encoded in single exons, although two subclasses were identified that contained introns in unique positions. TNL proteins were encoded in modular exons, with conserved intron positions separating distinct protein domains. Conserved motifs were identified in the LRRs of both CNL and TNL proteins. In contrast to CNL proteins, TNL proteins contained large and variable C-terminal domains. The extant distribution and diversity of the NBS-LRR sequences have been generated by extensive duplication and ectopic rearrangements that involved segmental duplications as well as microscale events. The observed diversity of these NBS-LRR proteins indicates the variety of recognition molecules available in an individual genotype to detect diverse biotic challenges.
Fruit flesh color in watermelon (Citrullus lanatus) is an important index of appearance quality and a key factor in consumer preference. However, the molecular mechanism of this intricate trait remains largely unknown. Here, the carotenoid and transcriptome dynamics during fruit development were analyzed in cultivated watermelons with five different flesh colors.
A total of 13 carotenoids and 16,781 differentially expressed genes (DEGs), including 1295 transcription factors (TFs), were detected in five watermelon genotypes during fruit development. The comprehensive accumulation patterns of carotenoids were closely related to flesh color. A number of potential structural genes and transcription factors were found to be associated with the carotenoid biosynthesis pathway using comparative transcriptome analysis. The differentially expressed genes were divided into six subclusters and distributed in different GO terms and metabolic pathways. Furthermore, we performed weighted gene co-expression network analysis and predicted the hub genes in six main modules determining carotenoid contents. Cla018406 (a chaperone protein dnaJ-like protein) was highly expressed in orange flesh-colored fruit and may be a candidate gene for β-carotene accumulation. Cla007686 (a zinc finger CCCH domain-containing protein) was highly expressed in the red flesh-colored watermelon and may be a key regulator of lycopene accumulation. Cla003760 (membrane protein) and Cla021635 (photosystem I reaction center subunit II) were predicted to be hub genes and may play an essential role in yellow flesh formation.
The composition and contents of carotenoids vary greatly among the five watermelon genotypes. A series of candidate genes were revealed through combined analysis of metabolites and the transcriptome. These results provide an important data resource for dissecting the candidate genes and molecular basis governing flesh color formation in watermelon fruit.