Five versions of the Chlamydomonas reinhardtii reference genome have been produced over the last two decades. Here we present version 6, bringing significant advances in assembly quality and structural annotations. PacBio-based chromosome-level assemblies for two laboratory strains, CC-503 and CC-4532, provide resources for the plus and minus mating-type alleles. We corrected major misassemblies in previous versions and validated our assemblies via linkage analyses. Contiguity increased over ten-fold and >80% of filled gaps are within genes. We used Iso-Seq and deep RNA-seq datasets to improve structural annotations, and updated gene symbols and textual annotation of functionally characterized genes via extensive manual curation. We discovered that the cell wall-less classical reference strain CC-503 exhibits genomic instability potentially caused by deletion of the helicase RECQ3, with major structural mutations identified that affect >100 genes. We therefore present the CC-4532 assembly as the primary reference, although this strain also carries unique structural mutations and is experiencing rapid proliferation of a Gypsy retrotransposon. We expect all laboratory strains to harbor gene-disrupting mutations, which should be considered when interpreting and comparing experimental results. Collectively, the resources presented here herald a new era of Chlamydomonas genomics and will provide the foundation for continued research in this important reference organism.
American chestnut was once a foundation species of eastern North American forests, but was rendered functionally extinct in the early 20th century by an exotic fungal blight (Cryphonectria parasitica). Over the past 30 years, the American Chestnut Foundation (TACF) has pursued backcross breeding to generate hybrids that combine the timber‐type form of American chestnut with the blight resistance of Chinese chestnut based on a hypothesis of major gene resistance. To accelerate selection within two backcross populations that descended from two Chinese chestnuts, we developed genomic prediction models for five presence/absence blight phenotypes of 1,230 BC3F2 selection candidates and average canker severity of their BC3F3 progeny. We also genotyped pure Chinese and American chestnut reference panels to estimate the proportion of BC3F2 genomes inherited from parent species. We found that genomic prediction from a method that assumes an infinitesimal model of inheritance (HBLUP) has similar accuracy to a method that tends to perform well for traits controlled by major genes (Bayes C). Furthermore, the proportion of BC3F2 trees' genomes inherited from American chestnut was negatively correlated with the blight resistance of these trees and their progeny. On average, selected BC3F2 trees inherited 83% of their genome from American chestnut and have blight resistance that is intermediate between F1 hybrids and American chestnut. Results suggest polygenic inheritance of blight resistance. The blight resistance of restoration populations will be enhanced through recurrent selection, by advancing additional sources of resistance through fewer backcross generations, and potentially by breeding with transgenic blight‐tolerant trees.
Sex chromosomes have arisen independently in a wide variety of species, yet they share common characteristics, including the presence of suppressed recombination surrounding sex determination loci. Mammalian sex chromosomes contain multiple palindromic repeats across the non-recombining region that show sequence conservation through gene conversion and contain genes that are crucial for sexual reproduction. In plants, it is not clear if palindromic repeats play a role in maintaining sequence conservation in the absence of homologous recombination.
Here we present the first evidence of large palindromic structures in a plant sex chromosome, based on a highly contiguous assembly of the W chromosome of the dioecious shrub Salix purpurea. The W chromosome has an expanded number of genes due to transpositions from autosomes. It also contains two consecutive palindromes that span a region of 200 kb, with conspicuous 20-kb stretches of highly conserved sequences among the four arms that show evidence of gene conversion. Four genes in the palindrome are homologous to genes in the sex determination regions of the closely related genus Populus, which are located on a different chromosome. These genes show distinct, floral-biased expression patterns compared to paralogous copies on autosomes.
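To make the notion of conserved palindrome arms concrete: in a DNA palindrome, one arm reads as the reverse complement of the other, so arm conservation can be expressed as the identity between one arm and the reverse complement of its partner. The following is only a minimal sketch under simplifying assumptions (equal-length arms, no gaps, toy sequences); the function names `reverse_complement` and `arm_identity` are hypothetical and do not represent the method used in the study, which would rely on a proper alignment.

```python
# Illustrative sketch only: ungapped identity between two palindrome arms.
# Function names and toy sequences are assumptions, not taken from the study.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def arm_identity(left_arm: str, right_arm: str) -> float:
    """Fraction of positions at which the left arm matches the reverse
    complement of the right arm. Arms must be non-empty and equal in length."""
    right_rc = reverse_complement(right_arm)
    if not left_arm or len(left_arm) != len(right_rc):
        raise ValueError("arms must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(left_arm.upper(), right_rc.upper()))
    return matches / len(left_arm)


if __name__ == "__main__":
    left = "ACCGTTA"
    perfect_right = reverse_complement(left)  # a perfect palindrome partner
    print(arm_identity(left, perfect_right))
    print(arm_identity(left, "TAACGGA"))      # one substitution in the partner arm
```

Real arms of tens of kilobases contain indels, so in practice the identity would be computed from a pairwise alignment, not from position-by-position comparison.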
The presence of palindromes in sex chromosomes of mammals and plants highlights the intrinsic importance of these features in adaptive evolution in the absence of recombination. Convergent evolution is driving both the independent establishment of sex chromosomes as well as their fine-scale sequence structure.
Genetic diversity is key to crop improvement. Owing to pervasive genomic structural variation, a single reference genome assembly cannot capture the full complement of sequence diversity of a crop species (known as the 'pan-genome'). Multiple high-quality sequence assemblies are an indispensable component of a pan-genome infrastructure. Barley (Hordeum vulgare L.) is an important cereal crop with a long history of cultivation that is adapted to a wide range of agro-climatic conditions. Here we report the construction of chromosome-scale sequence assemblies for the genotypes of 20 varieties of barley (comprising landraces, cultivars and a wild barley) that were selected as representatives of global barley diversity. We catalogued genomic presence/absence variants and explored the use of structural variants for quantitative genetic analysis through whole-genome shotgun sequencing of 300 gene bank accessions. We discovered abundant large inversion polymorphisms and analysed in detail two inversions that are frequently found in current elite barley germplasm; one is probably the product of mutation breeding and the other is tightly linked to a locus that is involved in the expansion of geographical range. This first-generation barley pan-genome makes previously hidden genetic variation accessible to genetic studies and breeding.
Summary
Sorghum bicolor is a drought tolerant C4 grass used for the production of grain, forage, sugar, and lignocellulosic biomass and a genetic model for C4 grasses due to its relatively small genome (approximately 800 Mbp), diploid genetics, diverse germplasm, and colinearity with other C4 grass genomes. In this study, deep sequencing, genetic linkage analysis, and transcriptome data were used to produce and annotate a high‐quality reference genome sequence. Reference genome sequence order was improved, 29.6 Mbp of additional sequence was incorporated, the number of genes annotated increased 24% to 34 211, average gene length and N50 increased, and error frequency was reduced 10‐fold to 1 per 100 kbp. Subtelomeric repeats with characteristics of Tandem Repeats in Miniature (TRIM) elements were identified at the termini of most chromosomes. Nucleosome occupancy predictions identified nucleosomes positioned immediately downstream of transcription start sites and at different densities across chromosomes. Alignment of more than 50 resequenced genomes from diverse sorghum genotypes to the reference genome identified approximately 7.4 M single nucleotide polymorphisms (SNPs) and 1.9 M indels. Large‐scale variant features in euchromatin were identified with periodicities of approximately 25 kbp. A transcriptome atlas of gene expression was constructed from 47 RNA‐seq profiles of growing and developed tissues of the major plant organs (roots, leaves, stems, panicles, and seed) collected during the juvenile, vegetative and reproductive phases. Analysis of the transcriptome data indicated that tissue type and protein kinase expression had large influences on transcriptional profile clustering. The updated assembly, annotation, and transcriptome data represent a resource for C4 grass research and crop improvement.
Significance Statement
An improved reference genome assembly, genome annotation, and transcriptome atlas provide fundamental resources for basic and applied research in the agriculturally important plant Sorghum bicolor. These resources enabled the identification of subtelomeric tandem repeats specific to sorghum, revealed patterns of genetic variation accumulation in the genome, and identified a set of kinases putatively involved in regulating tissue identity.
The availability of the peach genome sequence has fostered relevant research in peach and related Prunus species enabling the identification of genes underlying important horticultural traits as well as the development of advanced tools for genetic and genomic analyses. The first release of the peach genome (Peach v1.0) represented a high-quality WGS (Whole Genome Shotgun) chromosome-scale assembly with high contiguity (contig L50 214.2 kb), large portions of mapped sequences (96%) and high base accuracy (99.96%). The aim of this work was to improve the quality of the first assembly by increasing the portion of mapped and oriented sequences, correcting misassemblies and improving the contiguity and base accuracy using high-throughput linkage mapping and deep resequencing approaches.
Four linkage maps with 3,576 molecular markers were used to improve the portion of mapped and oriented sequences (from 96.0% and 85.6% of Peach v1.0 to 99.2% and 98.2% of v2.0, respectively) and enabled a more detailed identification of discernible misassemblies (10.4 Mb in total). The deep resequencing approach fixed 859 homozygous SNPs (Single Nucleotide Polymorphisms) and 1347 homozygous indels. Moreover, the assembled NGS contigs enabled the closing of 212 gaps with an improvement in the contig L50 of 19.2%.
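The contiguity figure quoted here is a length (reported in kb), which is the statistic many tools call N50: the contig length at which contigs of that size or larger contain at least half of the assembled bases. As a rough illustration of how such a value and its relative improvement can be computed, here is a minimal sketch; the function names `n50` and `percent_gain` and the toy contig lengths are assumptions for illustration and are not taken from the peach assemblies.

```python
# Illustrative sketch only: length-based contiguity statistic (N50; reported
# as "contig L50" in some assembly summaries) and its relative improvement.
# Function names and toy data are assumptions.

from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def percent_gain(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    old_contigs = [2, 3, 4, 5, 6]   # toy lengths
    new_contigs = [2, 3, 4, 11]     # the 5 and 6 joined by closing a gap
    print(n50(old_contigs), n50(new_contigs))
    print(percent_gain(n50(old_contigs), n50(new_contigs)))
```

Closing a gap merges two contigs into one longer contig, which is why gap closure alone can raise this statistic without adding much sequence.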
The improved high quality peach genome assembly (Peach v2.0) represents a valuable tool for the analysis of the genetic diversity, domestication, and as a vehicle for genetic improvement of peach and related Prunus species. Moreover, the important phylogenetic position of peach and the absence of recent whole genome duplication (WGD) events make peach a pivotal species for comparative genomics studies aiming at elucidating plant speciation and diversification processes.
The shift from outcrossing to selfing is common in flowering plants, but the genomic consequences and the speed at which they emerge remain poorly understood. An excellent model for understanding the evolution of self-fertilization is provided by Capsella rubella, which became self-compatible <200,000 years ago. We report a C. rubella reference genome sequence and compare RNA expression and polymorphism patterns between C. rubella and its outcrossing progenitor Capsella grandiflora. We found a clear shift in the expression of genes associated with flowering phenotypes, similar to that seen in Arabidopsis, in which self-fertilization evolved about 1 million years ago. Comparisons of the two Capsella species showed evidence of rapid genome-wide relaxation of purifying selection in C. rubella without a concomitant change in transposable element abundance. Overall we document that the transition to selfing may be typified by parallel shifts in gene expression, along with a measurable reduction of purifying selection.
Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence based on short reads. Assemblies derived from accurate long reads excel in most metrics, but the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even five-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives.
Sugarcane (Saccharum spp.) is a major crop for sugar and bioenergy production. Its highly polyploid, aneuploid, heterozygous, and interspecific genome poses major challenges for producing a reference sequence. We exploited colinearity with sorghum to produce a BAC-based monoploid genome sequence of sugarcane. A minimum tiling path of 4660 sugarcane BACs that best covers the gene-rich part of the sorghum genome was selected based on whole-genome profiling, sequenced, and assembled in a 382-Mb single tiling path of high-quality sequence. A total of 25,316 protein-coding gene models are predicted, 17% of which display no colinearity with their sorghum orthologs. We show that the two species, S. officinarum and S. spontaneum, involved in modern cultivars differ by their transposable elements and by a few large chromosomal rearrangements, explaining their distinct genome size and distinct basic chromosome numbers while also suggesting that polyploidization arose in both lineages after their divergence.
Seagrasses colonized the sea on at least three independent occasions to form the basis of one of the most productive and widespread coastal ecosystems on the planet. Here we report the genome of Zostera marina (L.), the first, to our knowledge, marine angiosperm to be fully sequenced. This reveals unique insights into the genomic losses and gains involved in achieving the structural and physiological adaptations required for its marine lifestyle, arguably the most severe habitat shift ever accomplished by flowering plants. Key angiosperm innovations that were lost include the entire repertoire of stomatal genes, genes involved in the synthesis of terpenoids and ethylene signalling, and genes for ultraviolet protection and phytochromes for far-red sensing. Seagrasses have also regained functions enabling them to adjust to full salinity. Their cell walls contain all of the polysaccharides typical of land plants, but also contain polyanionic, low-methylated pectins and sulfated galactans, a feature shared with the cell walls of all macroalgae and that is important for ion homoeostasis, nutrient uptake and O2/CO2 exchange through leaf epidermal cells. The Z. marina genome resource will markedly advance a wide range of functional ecological studies from adaptation of marine ecosystems under climate warming, to unravelling the mechanisms of osmoregulation under high salinities that may further inform our understanding of the evolution of salt tolerance in crop plants.