•Long-read sequencing has improved genome assembly by an order of magnitude.
•New genome assembly algorithms leverage long, error-prone reads to untangle complex sequences.
•Physical mapping technologies such as Hi-C and optical mapping enable chromosome-scale assembly.
•Long reads allow accurate assembly and phasing of polyploid and heterozygous genomes, but challenges remain.
•De novo assembly will replace resequencing, and multiple high-quality references are needed for each species.
Plant genomes span several orders of magnitude in size, vary in levels of ploidy and heterozygosity, and contain old and recent bursts of transposable elements, which make them challenging but interesting to assemble. Recent advances in single-molecule sequencing and physical mapping technologies have enabled high-quality, chromosome-scale assemblies of plant species of increasing complexity and size. Single-molecule reads can now exceed megabases in length, providing unprecedented opportunities to untangle genomic regions missed by short-read technologies. However, polyploid and heterozygous plant genomes remain difficult to assemble, although they provide opportunities for new tools and approaches. Haplotype phasing, structural variant analysis and de novo pan-genomics are the emerging frontiers in plant genome assembly.
Plants with facultative crassulacean acid metabolism (CAM) maximize performance by using C3 or C4 photosynthesis under ideal conditions while temporarily switching to CAM under water stress (drought). While genome-scale analyses of constitutive CAM plants suggest that time-of-day networks are shifted, or phased to the evening, compared to C3, little is known about how the shift from C3 to CAM networks is modulated in drought-induced CAM. Here we generate a draft genome for the drought-induced CAM-cycling species Sedum album. Through parallel sampling in well-watered (C3) and drought (CAM) conditions, we uncover a massive rewiring of time-of-day expression and a CAM- and stress-specific network. The core circadian genes are expanded in S. album, and under CAM induction, core clock genes change either phase or amplitude. While the core clock cis-elements are conserved in S. album, we uncover a set of novel CAM- and stress-specific cis-elements, consistent with our finding of rewired co-expression networks. We identified elements shared between constitutive CAM and CAM-cycling species, as well as expression patterns unique to CAM-cycling S. album. Together these results demonstrate that drought-induced CAM-cycling photosynthesis evolved through the mobilization of a stress-specific, time-of-day network, and not solely through the phasing of existing C3 networks. These results will inform efforts to engineer water use efficiency into crop plants for growth on marginal land.
•NGS speed and capacity have enabled more than 100 published plant genomes.
•Genomic resources for underserved specialty and orphan crops are growing due to low-cost NGS.
•Doubled haploids and diploid ancestors are key to sequencing complex plant genomes.
•Polyploidy, heterozygosity and repeats complicate plant genome assembly but also underlie key agronomic traits.
•Plant ENCODE, resequencing panels and temporal RNA-seq comprise the next era of plant genomics.
The availability of plant reference genomes has ushered in a new era of crop genomics. More than 100 plant genomes have been sequenced since 2000, 63% of which are crop species. These genome sequences provide insight into the architecture and evolution of crop genomes, as well as novel aspects such as the retention of key agronomic traits after whole-genome duplication events. Some crops have very large, polyploid, repeat-rich genomes, which require innovative strategies for sequencing, assembly and analysis. Even low-quality reference genomes have the potential to improve crop germplasm through genome-wide molecular markers, which reduce the need for expensive phenotyping and shorten breeding cycles. The next stage of plant genomics will require refining draft genomes, building resources for crop wild relatives, resequencing broad diversity panels, and plant ENCODE projects to better understand the complexities of these highly diverse genomes.
Abstract
The circadian clock is conserved in plants at the level of both transcriptional networks and core genes, ensuring that biological processes are phased to the correct time of day. In the model plant Arabidopsis (Arabidopsis thaliana), the core circadian SHAQKYF-type-MYB (sMYB) genes CIRCADIAN CLOCK ASSOCIATED 1 (CCA1) and REVEILLE 4 (RVE4) show genetic linkage with PSEUDO-RESPONSE REGULATOR 9 (PRR9) and PRR7, respectively. Leveraging chromosome-resolved plant genomes and syntenic ortholog analysis enabled us to trace this genetic linkage back to Amborella trichopoda, the sister lineage to all other extant angiosperms, and to identify an additional evolutionarily conserved genetic linkage in light signaling genes. The LHY/CCA1–PRR5/9, RVE4/8–PRR3/7, and PIF3–PHYA genetic linkages emerged in the bryophyte lineage and progressively moved to within several genes of each other across an array of angiosperm families representing distinct whole-genome duplication and fractionation events. Soybean (Glycine max) maintained all but two genetic linkages, and expression analysis revealed that the PIF3–PHYA linkage, which overlaps with the E4 maturity group locus, was the only pair to cycle robustly with an evening phase, in contrast to the morning and midday phases of the sMYB–PRR pairs. While most monocots maintain the genetic linkages, they have been lost in the economically important grasses (Poaceae), such as maize (Zea mays), where the genes have been fractionated to separate chromosomes and presence/absence variation results in the segregation of PRR7 paralogs across heterotic groups. We put forward the environmental robustness model, which suggests that evolutionarily conserved genetic linkages ensure superior microhabitat pollinator synchrony, while wide hybrids or unlinking of the genes, as seen in the grasses, result in heterosis, adaptation, and colonization of new ecological niches.
The genetic linkage of the core circadian clock and light signaling genes coincides with the rise to dominance of flowering plants and may explain environment-specific growth as well as heterosis.
Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions. As a result, many draft and reference genomes are fragmented and lack GC-skewed and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobase) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a 'near-complete' draft genome, which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.
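The N50 quoted for Oropetium (2.4 megabases) is the contig length L such that contigs of length L or longer together contain at least half of the assembled bases. The following is a minimal Python sketch of this standard calculation. The function name and the toy lengths are illustrative and do not come from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating assembled bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2*running to total to avoid floating-point halves.
        if 2 * running >= total:
            return length


# Toy example with hypothetical contig lengths (arbitrary units).
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

For the toy lengths, the three longest contigs (10 + 9 + 8 = 27) reach half of the 54 total units, so the N50 is 8. Because N50 depends only on the sorted lengths, two assemblies with the same total size can differ greatly in N50 depending on how fragmented they are.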
Teff (Eragrostis tef) is a cornerstone of food security in the Horn of Africa, where it is prized for stress resilience, grain nutrition, and market value. Here, we report a chromosome-scale assembly of allotetraploid teff (variety Dabbi) and patterns of subgenome dynamics. The teff genome contains two complete sets of homoeologous chromosomes, with most genes retained as syntenic gene pairs. TE analysis allows us to estimate that the teff polyploidy event occurred ~1.1 million years ago (mya) and that the two subgenomes diverged ~5.0 mya. Despite this divergence, we detect no large-scale structural rearrangements, homoeologous exchanges, or biased gene loss, in contrast to many other allopolyploids. The two teff subgenomes have partitioned their ancestral functions, as shown by divergent expression across a diverse expression atlas. Together, these genomic resources will be useful for accelerating breeding of this underutilized grain crop and for fundamental insights into polyploid genome evolution.
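The abstract does not state how its TE-based dates were computed. One common way to turn sequence divergence into time is to correct the raw proportion of differing sites p for multiple substitutions and then apply T = d / (2r), where r is a neutral substitution rate. This formula is often used, for example, for the two LTRs of a retrotransposon. The Python sketch below uses the Jukes-Cantor correction. The rate of 1.3e-8 substitutions per site per year is an assumption (a value often used for grasses) and is not necessarily the one used for teff.

```python
import math

# ASSUMPTION: neutral substitution rate (substitutions/site/year).
# 1.3e-8 is a value often used for grass LTR retrotransposons; it is
# illustrative here, not taken from the teff study.
DEFAULT_RATE = 1.3e-8


def jukes_cantor_distance(p):
    """Correct a raw p-distance for multiple hits with the JC69 model."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def divergence_time_years(p, rate=DEFAULT_RATE):
    """Estimate time since two sequences diverged: T = d / (2 * rate).

    The factor of 2 accounts for substitutions accumulating
    independently along both diverging lineages.
    """
    return jukes_cantor_distance(p) / (2 * rate)


# Hypothetical p-distance between two LTRs of one element.
print(f"{divergence_time_years(0.026) / 1e6:.2f} million years")
```

Other models (e.g. Kimura two-parameter) or a different rate would shift the estimate proportionally, so dates derived this way are only as reliable as the assumed rate.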
The handheld Oxford Nanopore MinION sequencer generates ultra-long reads with minimal cost and time requirements, which makes sequencing genomes at the bench feasible. Here, we sequence the gold-standard Arabidopsis thaliana genome (KBS-Mac-74 accession) on the bench with the MinION sequencer and assemble the genome into chromosome arms (62 contigs with an N50 length of 12.3 Mb) using typical consumer computing hardware (4 cores, 16 GB RAM). We validate the contiguity and quality of the assembly with two independent single-molecule technologies, Bionano optical genome maps and Pacific Biosciences Sequel sequencing. The new A. thaliana KBS-Mac-74 genome enables resolution of a quantitative trait locus that had previously been recalcitrant to a Sanger-based BAC sequencing approach. In summary, we demonstrate that even when the purpose is to understand complex structural variation at a single region of the genome, complete genome assembly is becoming the simplest way to achieve this goal.
Circadian clocks provide an adaptive advantage through anticipation of daily and seasonal environmental changes. In plants, the central clock oscillator is regulated by several interlocking feedback loops. Previous work showed that a substantial proportion of the Arabidopsis genome cycles, with phases of peak expression covering the entire day. Synchronized transcriptome cycling is driven through an extensive network of diurnal and clock-regulated transcription factors and their target cis-regulatory elements. Studying the cycling transcriptome in other plant species could thus help elucidate similarities and differences between species and identify hubs of regulation common to monocot and dicot plants.
Using a combination of oligonucleotide microarrays and data mining pipelines, we examined daily rhythms in gene expression in one monocotyledonous and one dicotyledonous plant, rice and poplar, respectively. Cycling transcriptomes were interrogated under different diurnal (driven) and circadian (free-running) light and temperature conditions. Collectively, photocycles and thermocycles regulated about 60% of the expressed nuclear genes in rice and poplar. Depending on the condition tested, up to one third of oscillating Arabidopsis-poplar-rice orthologs were phased within three hours of each other, suggesting a high degree of conservation in rhythmic gene expression. We identified clusters of rhythmically co-expressed genes and searched their promoter sequences to identify phase-specific cis-elements, including elements that were conserved in the promoters of Arabidopsis, poplar, and rice.
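To make "phase" and "phased within three hours of each other" concrete, the following Python sketch fits a single cosine (a simple cosinor model) to an evenly sampled time course and reports the time of peak expression. It then measures the circular distance between two phases on a 24 h clock. This is a stand-in technique chosen for illustration, not necessarily the data mining pipeline used in the study. It assumes that samples are evenly spaced, span a whole number of periods, and include at least three points per period, so that least squares reduces to simple sums.

```python
import math


def cosinor_fit(times_h, values, period_h=24.0):
    """Fit y = M + A*cos(2*pi*(t - phase)/period) to evenly spaced samples.

    Returns (mesor, amplitude, phase_h), where phase_h is the time of peak.
    """
    n = len(times_h)
    if n != len(values) or n < 3:
        raise ValueError("need matching times/values with at least 3 points")
    step = times_h[1] - times_h[0]
    if any(abs((times_h[i + 1] - times_h[i]) - step) > 1e-9 for i in range(n - 1)):
        raise ValueError("samples must be evenly spaced")
    cycles = n * step / period_h
    if abs(cycles - round(cycles)) > 1e-9 or period_h / step < 3:
        raise ValueError("need whole periods and >= 3 samples per period")
    omega = 2 * math.pi / period_h
    # Under these sampling conditions the regressors are orthogonal,
    # so the least-squares coefficients are scaled dot products.
    mesor = sum(values) / n
    a = 2 / n * sum(y * math.cos(omega * t) for t, y in zip(times_h, values))
    b = 2 / n * sum(y * math.sin(omega * t) for t, y in zip(times_h, values))
    amplitude = math.hypot(a, b)
    phase_h = (math.atan2(b, a) / omega) % period_h
    return mesor, amplitude, phase_h


def phase_difference_h(phase1, phase2, period_h=24.0):
    """Smallest circular distance between two phases, in hours."""
    d = abs(phase1 - phase2) % period_h
    return min(d, period_h - d)


# Hypothetical 48 h time course sampled every 4 h, peaking at 8 h.
times = [4 * k for k in range(12)]
expr = [10 + 3 * math.cos(2 * math.pi * (t - 8) / 24) for t in times]
print(cosinor_fit(times, expr))
print(phase_difference_h(23, 1) <= 3)
```

The circular distance matters because phases wrap around: a gene peaking at 23 h and its ortholog peaking at 1 h are only 2 h apart, not 22 h.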
Our results show that the cycling patterns of many circadian clock genes are highly conserved across poplar, rice, and Arabidopsis. The expression of many orthologous genes in key metabolic and regulatory pathways is regulated diurnally and/or by the circadian clock and is phased to similar times of day. Our results confirm previous findings in Arabidopsis of three major classes of cis-regulatory modules within the plant circadian network: the morning (ME, GBOX), evening (EE, GATA), and midnight (PBX/TBX/SBX) modules. The identification of identical overrepresented motifs in the promoters of cycling genes from different species suggests that the core diurnal/circadian cis-regulatory network is deeply conserved between monocotyledonous and dicotyledonous species.
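"Overrepresented motifs" means that a cis-element occurs more often in the promoters of a group of co-phased genes than in a background set. The following is a minimal Python sketch of a presence/absence fold enrichment. It searches both DNA strands. The motif strings are assumptions: commonly reported consensus forms of the evening element and G-box, used here for illustration. A real analysis would also attach a statistical test, such as a hypergeometric p-value, to each fold enrichment.

```python
# ASSUMPTION: illustrative consensus sequences; other modules (ME, GATA,
# PBX/TBX/SBX) could be added the same way.
MOTIFS = {
    "EE": "AAAATATCT",  # evening element
    "GBOX": "CACGTG",   # G-box (palindromic)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_motif(promoter, motif):
    """True if the motif occurs on either strand of the promoter."""
    promoter = promoter.upper()
    return motif in promoter or revcomp(motif) in promoter


def fold_enrichment(motif, target_promoters, background_promoters):
    """Fraction of target promoters with the motif over the background fraction."""
    frac_t = sum(has_motif(p, motif) for p in target_promoters) / len(target_promoters)
    frac_b = sum(has_motif(p, motif) for p in background_promoters) / len(background_promoters)
    if frac_b == 0:
        return float("inf") if frac_t > 0 else float("nan")
    return frac_t / frac_b


# Hypothetical promoter fragments.
evening_cluster = ["TTAAAATATCTGG", "CACGTGAA", "AGATATTTTCC"]
background = ["GGGGGG", "AAAATATCT", "CCCCCC", "TTTTTT"]
print(fold_enrichment(MOTIFS["EE"], evening_cluster, background))
```

Checking the reverse complement matters because a promoter element can be functional on either strand; the third hypothetical fragment carries the EE only on the opposite strand.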
A highly simplified species for genome engineering would facilitate the rational design of a synthetic plant. A candidate species is the aquatic, non-grass monocot wolffia (Wolffia australiana) in the Lemnaceae family. Commonly known as watermeal, wolffia is a rootless ball of several thousand cells the size of a pinhead and the fastest-growing plant known on Earth. Its extreme morphological reduction is coupled to transposon-mediated streamlining of its transcriptome, which represents a core set of nonredundant protein-coding genes. Despite its body plan and transcriptome being highly specialized for continuous growth, wolffia retains cell types relevant to higher plants. Systems-level studies with this species could enable the creation of a defined biological chassis for synthetic plant construction.
•Wolffia is the smallest of the duckweeds, about 1 mm in diameter, with reduced morphology; it lacks roots and vasculature but retains key anatomical features and core pathways found in other plants, making it a potential synthetic plant biology chassis.
•Wolffia australiana has a minimal set of about 15 000 genes that represents a nonredundant catalog of core plant proteins, which facilitates characterization of gene function and can provide opportunities to introduce new pathways.
•Wolffia doubles in less than a day, due in part to relaxed time-of-day (TOD) gating of growth, and because it is a partially submerged aquatic plant, it enables more precise manipulation and faster experiments.
•Wolffia has features that make it ideal for bottom-up and top-down plant genome engineering, which could usher in a detailed description of cellular function and facilitate synthetic plant construction.
Global climate change includes rising temperatures and increased pCO2 in the ocean, with potential deleterious impacts on marine organisms. In this case study, we conducted a four-week climate change incubation experiment and tested the independent and combined effects of increased temperature and partial pressure of carbon dioxide (pCO2) on the microbiomes of a foundation species, the giant kelp Macrocystis pyrifera, and of the surrounding water column. The water and kelp microbiomes responded differently to each of the climate stressors. In the water microbiome, each condition caused an increase in a distinct microbial order, whereas the kelp microbiome exhibited a reduction in the dominant kelp-associated order, Alteromonadales. The water column microbiomes were most disrupted by elevated pCO2, with a 7.3-fold increase in Rhizobiales. The kelp microbiome was most influenced by elevated temperature, alone and in combination with elevated pCO2. Kelp growth was negatively associated with elevated temperature, under which the kelp microbiome showed a 5.3-fold increase in Flavobacteriales and a 2.2-fold increase in enzymes that degrade alginate and sulfated polysaccharides. In contrast, kelp growth was positively associated with the combination of high temperature and high pCO2 ('future conditions'), with a 12.5-fold increase in Planctomycetales and a 4.8-fold increase in Rhodobacterales. Therefore, the water and kelp microbiomes acted as distinct communities: the kelp stabilized its microbiome under changing pCO2 conditions but lost control at high temperature. Under future conditions, a new equilibrium between the kelp and its microbiome may have been reached, in which the kelp grew rapidly and the commensal microbes responded to an increase in mucus production.
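The abstract reports microbial changes as fold increases (for example, 7.3-fold in Rhizobiales) without stating how they were computed. One plausible definition is the ratio of a taxon's mean relative abundance in treated versus control replicates. The following Python sketch assumes that definition and adds a small pseudocount so that taxa absent from the controls do not cause division by zero. The read counts are hypothetical and are not data from the experiment.

```python
def relative_abundance(counts):
    """Convert raw read counts per taxon into proportions of the sample total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def mean_relative_abundance(samples, taxon):
    """Mean proportion of a taxon across replicates (absent taxon counts as 0)."""
    props = [relative_abundance(s).get(taxon, 0.0) for s in samples]
    return sum(props) / len(props)


def fold_change(treatment, control, taxon, pseudocount=1e-6):
    """ASSUMED definition: treated / control mean relative abundance."""
    return (mean_relative_abundance(treatment, taxon) + pseudocount) / (
        mean_relative_abundance(control, taxon) + pseudocount
    )


# Hypothetical read counts per microbial order for replicate samples.
control = [{"Rhizobiales": 10, "Other": 990}, {"Rhizobiales": 20, "Other": 1980}]
elevated_pco2 = [{"Rhizobiales": 50, "Other": 950}]
print(round(fold_change(elevated_pco2, control, "Rhizobiales"), 2))
```

Normalizing to relative abundance first removes differences in sequencing depth between samples, which otherwise would masquerade as changes in a taxon.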