Targeted insertion of transgenes at pre-determined plant genomic safe harbors provides a desirable alternative to insertions at random sites achieved through conventional methods. Most existing cases of targeted gene insertion in plants have either relied on the presence of a selectable marker gene in the insertion cassette or occurred at low frequency with relatively small DNA fragments (<1.8 kb). Here, we report the use of an optimized CRISPR-Cas9-based method to achieve the targeted insertion of a 5.2 kb carotenoid biosynthesis cassette at two genomic safe harbors in rice. We obtain marker-free rice plants with high carotenoid content in the seeds and no detectable penalty in morphology or yield. Whole-genome sequencing reveals the absence of off-target mutations by Cas9 in the engineered plants. These results demonstrate targeted gene insertion of marker-free DNA in rice using CRISPR-Cas9 genome editing, and offer a promising strategy for genetic improvement of rice and other crops.
Genetic diversity is key to crop improvement. Owing to pervasive genomic structural variation, a single reference genome assembly cannot capture the full complement of sequence diversity of a crop species (known as the 'pan-genome'). Multiple high-quality sequence assemblies are an indispensable component of a pan-genome infrastructure. Barley (Hordeum vulgare L.) is an important cereal crop with a long history of cultivation that is adapted to a wide range of agro-climatic conditions. Here we report the construction of chromosome-scale sequence assemblies for the genotypes of 20 varieties of barley (comprising landraces, cultivars and a wild barley) that were selected as representatives of global barley diversity. We catalogued genomic presence/absence variants and explored the use of structural variants for quantitative genetic analysis through whole-genome shotgun sequencing of 300 gene bank accessions. We discovered abundant large inversion polymorphisms and analysed in detail two inversions that are frequently found in current elite barley germplasm; one is probably the product of mutation breeding and the other is tightly linked to a locus that is involved in the expansion of geographical range. This first-generation barley pan-genome makes previously hidden genetic variation accessible to genetic studies and breeding.
Summary
Sorghum bicolor is a drought-tolerant C4 grass used for the production of grain, forage, sugar, and lignocellulosic biomass and a genetic model for C4 grasses due to its relatively small genome (approximately 800 Mbp), diploid genetics, diverse germplasm, and colinearity with other C4 grass genomes. In this study, deep sequencing, genetic linkage analysis, and transcriptome data were used to produce and annotate a high‐quality reference genome sequence. Reference genome sequence order was improved, 29.6 Mbp of additional sequence was incorporated, the number of genes annotated increased 24% to 34 211, average gene length and N50 increased, and error frequency was reduced 10‐fold to 1 per 100 kbp. Subtelomeric repeats with characteristics of Tandem Repeats in Miniature (TRIM) elements were identified at the termini of most chromosomes. Nucleosome occupancy predictions identified nucleosomes positioned immediately downstream of transcription start sites and at different densities across chromosomes. Alignment of more than 50 resequenced genomes from diverse sorghum genotypes to the reference genome identified approximately 7.4 M single nucleotide polymorphisms (SNPs) and 1.9 M indels. Large‐scale variant features in euchromatin were identified with periodicities of approximately 25 kbp. A transcriptome atlas of gene expression was constructed from 47 RNA‐seq profiles of growing and developed tissues of the major plant organs (roots, leaves, stems, panicles, and seed) collected during the juvenile, vegetative and reproductive phases. Analysis of the transcriptome data indicated that tissue type and protein kinase expression had large influences on transcriptional profile clustering. The updated assembly, annotation, and transcriptome data represent a resource for C4 grass research and crop improvement.
Significance Statement
An improved reference genome assembly, genome annotation, and transcriptome atlas provide fundamental resources for basic and applied research in the agriculturally important plant Sorghum bicolor. These resources enabled the identification of subtelomeric tandem repeats specific to sorghum, revealed patterns of genetic variation accumulation in the genome, and identified a set of kinases putatively involved in regulating tissue identity.
The availability of the peach genome sequence has fostered relevant research in peach and related Prunus species enabling the identification of genes underlying important horticultural traits as well as the development of advanced tools for genetic and genomic analyses. The first release of the peach genome (Peach v1.0) represented a high-quality WGS (Whole Genome Shotgun) chromosome-scale assembly with high contiguity (contig L50 214.2 kb), large portions of mapped sequences (96%) and high base accuracy (99.96%). The aim of this work was to improve the quality of the first assembly by increasing the portion of mapped and oriented sequences, correcting misassemblies and improving the contiguity and base accuracy using high-throughput linkage mapping and deep resequencing approaches.
Four linkage maps with 3,576 molecular markers were used to improve the portion of mapped and oriented sequences (from 96.0% and 85.6% of Peach v1.0 to 99.2% and 98.2% of v2.0, respectively) and enabled a more detailed identification of discernible misassemblies (10.4 Mb in total). The deep resequencing approach fixed 859 homozygous SNPs (Single Nucleotide Polymorphisms) and 1347 homozygous indels. Moreover, the assembled NGS contigs enabled the closing of 212 gaps with an improvement in the contig L50 of 19.2%.
The improved high-quality peach genome assembly (Peach v2.0) represents a valuable tool for the analysis of genetic diversity and domestication, and a vehicle for genetic improvement of peach and related Prunus species. Moreover, the important phylogenetic position of peach and the absence of recent whole genome duplication (WGD) events make peach a pivotal species for comparative genomics studies aiming at elucidating plant speciation and diversification processes.
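As a hedged illustration of the contiguity metric reported above, the following Python sketch computes the statistic that this abstract appears to call contig L50 (commonly reported elsewhere as N50): the contig length at which contigs of that length or longer contain at least half of the assembled bases. The function name and the toy contig lengths are assumptions for illustration and are not values from the Peach assemblies.

```python
def contig_n50(lengths):
    """Return the length L such that contigs of length >= L
    together hold at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths in kb, not taken from the paper.
    toy_contigs = [2, 3, 4, 5, 6]
    print(contig_n50(toy_contigs))
```

Closing gaps merges neighbouring contigs into longer ones, which is why a statistic of this kind rises when gaps are closed.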
DNA methylation is an important feature of plant epigenomes, involved in the formation of heterochromatin and affecting gene expression. Extensive variation of DNA methylation patterns within a species has been uncovered from studies of natural variation. However, the extent to which DNA methylation varies between flowering plant species is still unclear. To understand the variation in genomic patterning of DNA methylation across flowering plant species, we compared single base resolution DNA methylomes of 34 diverse angiosperm species.
By analyzing whole-genome bisulfite sequencing data in a phylogenetic context, it becomes clear that there is extensive variation throughout angiosperms in gene body DNA methylation, euchromatic silencing of transposons and repeats, as well as silencing of heterochromatic transposons. The Brassicaceae have reduced CHG methylation levels and also a reduction or loss of CG gene body methylation. The Poaceae are characterized by a lack or reduction of heterochromatic CHH methylation and enrichment of CHH methylation in genic regions. Furthermore, low levels of CHH methylation are observed in a number of species, especially in clonally propagated species.
These results reveal the extent of variation in DNA methylation in angiosperms and show that DNA methylation patterns are broadly a reflection of the evolutionary and life histories of plant species.
Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence based on short-reads. Assemblies derived from accurate long reads excel in most metrics, but the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even five-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives.
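The downsampling analysis mentioned above can be illustrated with a minimal Python sketch of the underlying arithmetic: given the total number of sequenced bases and a genome size, it returns the fraction of reads to keep to reach a target fold-coverage. The function name, parameter names, and example numbers are assumptions for illustration; the abstract does not describe how the authors performed their downsampling.

```python
def sampling_fraction(total_read_bases, genome_size_bp, target_coverage):
    """Fraction of reads to retain so that expected coverage
    drops to target_coverage (capped at 1.0, i.e. keep everything)."""
    if total_read_bases <= 0 or genome_size_bp <= 0 or target_coverage <= 0:
        raise ValueError("all inputs must be positive")
    # Fold-coverage is total sequenced bases divided by genome size.
    current_coverage = total_read_bases / genome_size_bp
    return min(1.0, target_coverage / current_coverage)


if __name__ == "__main__":
    # Hypothetical toy values: 400 bases of reads over a 10 bp genome
    # is 40-fold coverage, so half the reads give 20-fold.
    print(sampling_fraction(400, 10, 20))
```

The computed fraction could then be passed to whichever read-subsampling tool is in use; that step is left out here because it depends on the tool.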
Sugarcane (Saccharum spp.) is a major crop for sugar and bioenergy production. Its highly polyploid, aneuploid, heterozygous, and interspecific genome poses major challenges for producing a reference sequence. We exploited colinearity with sorghum to produce a BAC-based monoploid genome sequence of sugarcane. A minimum tiling path of 4660 sugarcane BAC that best covers the gene-rich part of the sorghum genome was selected based on whole-genome profiling, sequenced, and assembled in a 382-Mb single tiling path of a high-quality sequence. A total of 25,316 protein-coding gene models are predicted, 17% of which display no colinearity with their sorghum orthologs. We show that the two species, S. officinarum and S. spontaneum, involved in modern cultivars differ by their transposable elements and by a few large chromosomal rearrangements, explaining their distinct genome size and distinct basic chromosome numbers while also suggesting that polyploidization arose in both lineages after their divergence.
The Southern Ocean houses a diverse and productive community of organisms. Unicellular eukaryotic diatoms are the main primary producers in this environment, where photosynthesis is limited by low concentrations of dissolved iron and large seasonal fluctuations in light, temperature and the extent of sea ice. How diatoms have adapted to this extreme environment is largely unknown. Here we present insights into the genome evolution of a cold-adapted diatom from the Southern Ocean, Fragilariopsis cylindrus, based on a comparison with temperate diatoms. We find that approximately 24.7 per cent of the diploid F. cylindrus genome consists of genetic loci with alleles that are highly divergent (15.1 megabases of the total genome size of 61.1 megabases). These divergent alleles were differentially expressed across environmental conditions, including darkness, low iron, freezing, elevated temperature and increased CO2. Alleles with the largest ratio of non-synonymous to synonymous nucleotide substitutions also show the most pronounced condition-dependent expression, suggesting a correlation between diversifying selection and allelic differentiation. Divergent alleles may be involved in adaptation to environmental fluctuations in the Southern Ocean.
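The proportion quoted above can be checked with a short Python sketch that uses only the two figures given in the abstract (15.1 megabases of divergent loci out of a 61.1 megabase genome). The helper name is an assumption for illustration.

```python
def percent_of_total(part, total):
    """Express part as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


if __name__ == "__main__":
    # Figures from the abstract: divergent loci and total genome size, in Mb.
    share = percent_of_total(15.1, 61.1)
    print(f"{share:.1f}%")  # prints "24.7%"
```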
Basidiomycota (basidiomycetes) make up 32% of the described fungi and include most wood-decaying species, as well as pathogens and mutualistic symbionts. Wood-decaying basidiomycetes have typically been classified as either white rot or brown rot, based on the ability (in white rot only) to degrade lignin along with cellulose and hemicellulose. Prior genomic comparisons suggested that the two decay modes can be distinguished based on the presence or absence of ligninolytic class II peroxidases (PODs), as well as the abundance of enzymes acting directly on crystalline cellulose (reduced in brown rot). To assess the generality of the white-rot/brown-rot classification paradigm, we compared the genomes of 33 basidiomycetes, including four newly sequenced wood decayers, and performed phylogenetically informed principal-components analysis (PCA) of a broad range of gene families encoding plant biomass-degrading enzymes. The newly sequenced Botryobasidium botryosum and Jaapia argillacea genomes lack PODs but possess diverse enzymes acting on crystalline cellulose, and they group close to the model white-rot species Phanerochaete chrysosporium in the PCA. Furthermore, laboratory assays showed that both B. botryosum and J. argillacea can degrade all polymeric components of woody plant cell walls, a characteristic of white rot. We also found expansions in reducing polyketide synthase genes specific to the brown-rot fungi. Our results suggest a continuum rather than a dichotomy between the white-rot and brown-rot modes of wood decay. A more nuanced categorization of rot types is needed, based on an improved understanding of the genomics and biochemistry of wood decay.