The development of multiple chromosome-scale reference genome sequences in many taxonomic groups has yielded a high-resolution view of the patterns and processes of molecular evolution. Nonetheless, leveraging information across multiple genomes remains a significant challenge in nearly all eukaryotic systems. These challenges range from studying the evolution of chromosome structure, to finding candidate genes for quantitative trait loci, to testing hypotheses about speciation and adaptation. Here, we present GENESPACE, which addresses these challenges by integrating conserved gene order and orthology to define the expected physical position of all genes across multiple genomes. We demonstrate this utility by dissecting presence–absence, copy-number, and structural variation at three levels of biological organization: spanning 300 million years of vertebrate sex chromosome evolution, across the diversity of the Poaceae (grass) plant family, and among 26 maize cultivars. The methods to build and visualize syntenic orthology in the GENESPACE R package offer a significant addition to existing gene family and synteny programs, especially in polyploid, outbred, and other complex genomes.
The genome is the complete DNA sequence of an individual. It is a crucial foundation for many studies in medicine, agriculture, and conservation biology. Advances in genetics have made it possible to rapidly sequence, or read out, the genome of many organisms. For closely related species, scientists can then do detailed comparisons, revealing similar genes with a shared past or a common role, but comparing more distantly related organisms remains difficult.
One major challenge is that genes are often lost or duplicated over evolutionary time. One way to be more confident is to look at ‘synteny’, or how genes are organized or ordered within the genome. In some groups of species, synteny persists across millions of years of evolution. Combining sequence similarity with gene order could make comparisons between distantly related species more robust.
To do this, Lovell et al. developed GENESPACE, software that links similarities between DNA sequences to the order of genes in a genome. This allows researchers to visualize and explore related DNA sequences and determine whether genes have been lost or duplicated. To demonstrate the value of GENESPACE, Lovell et al. explored evolution in vertebrates and flowering plants. The software was able to highlight the shared sequences between unique sex chromosomes in birds and mammals, and it was able to track the positions of genes important in the evolution of grass crops including maize, wheat, and rice.
Exploring the genetic code in this way could lead to a better understanding of the evolution of important sections of the genome. It might also allow scientists to find target genes for applications like crop improvement. Lovell et al. have designed the GENESPACE software to be easy for other scientists to use, allowing them to make graphics and perform analyses with minimal programming skills.
Wild and weedy relatives of domesticated crops harbor genetic variants that can advance agricultural biotechnology. Here we provide a genome resource for the wild plant green millet (Setaria viridis), a model species for studies of C4 grasses, and use the resource to probe domestication genes in the close crop relative foxtail millet (Setaria italica). We produced a platinum-quality genome assembly of S. viridis and de novo assemblies for 598 wild accessions and exploited these assemblies to identify loci underlying three traits: response to climate, a 'loss of shattering' trait that permits mechanical harvest, and leaf angle, a predictor of yield in many grass crops. With CRISPR-Cas9 genome editing, we validated Less Shattering1 (SvLes1) as a gene whose product controls seed shattering. In S. italica, this gene was rendered nonfunctional by a retrotransposon insertion in the domesticated loss-of-shattering allele SiLes1-TE (transposable element). This resource will enhance the utility of S. viridis for dissection of complex traits and biotechnological improvement of panicoid crops.
Summary
Sorghum bicolor is a drought tolerant C4 grass used for the production of grain, forage, sugar, and lignocellulosic biomass and a genetic model for C4 grasses due to its relatively small genome (approximately 800 Mbp), diploid genetics, diverse germplasm, and colinearity with other C4 grass genomes. In this study, deep sequencing, genetic linkage analysis, and transcriptome data were used to produce and annotate a high‐quality reference genome sequence. Reference genome sequence order was improved, 29.6 Mbp of additional sequence was incorporated, the number of genes annotated increased 24% to 34 211, average gene length and N50 increased, and error frequency was reduced 10‐fold to 1 per 100 kbp. Subtelomeric repeats with characteristics of Tandem Repeats in Miniature (TRIM) elements were identified at the termini of most chromosomes. Nucleosome occupancy predictions identified nucleosomes positioned immediately downstream of transcription start sites and at different densities across chromosomes. Alignment of more than 50 resequenced genomes from diverse sorghum genotypes to the reference genome identified approximately 7.4 M single nucleotide polymorphisms (SNPs) and 1.9 M indels. Large‐scale variant features in euchromatin were identified with periodicities of approximately 25 kbp. A transcriptome atlas of gene expression was constructed from 47 RNA‐seq profiles of growing and developed tissues of the major plant organs (roots, leaves, stems, panicles, and seed) collected during the juvenile, vegetative and reproductive phases. Analysis of the transcriptome data indicated that tissue type and protein kinase expression had large influences on transcriptional profile clustering. The updated assembly, annotation, and transcriptome data represent a resource for C4 grass research and crop improvement.
Significance Statement
An improved reference genome assembly, genome annotation, and transcriptome atlas provide fundamental resources for basic and applied research in the agriculturally important plant Sorghum bicolor. These resources enabled the identification of subtelomeric tandem repeats specific to sorghum, revealed patterns of genetic variation accumulation in the genome, and identified a set of kinases putatively involved in regulating tissue identity.
Sugarcane (Saccharum spp.) is a major crop for sugar and bioenergy production. Its highly polyploid, aneuploid, heterozygous, and interspecific genome poses major challenges for producing a reference sequence. We exploited colinearity with sorghum to produce a BAC-based monoploid genome sequence of sugarcane. A minimum tiling path of 4660 sugarcane BACs that best covers the gene-rich part of the sorghum genome was selected based on whole-genome profiling, sequenced, and assembled into a 382-Mb single tiling path of high-quality sequence. A total of 25,316 protein-coding gene models are predicted, 17% of which display no colinearity with their sorghum orthologs. We show that the two species, S. officinarum and S. spontaneum, involved in modern cultivars differ by their transposable elements and by a few large chromosomal rearrangements, explaining their distinct genome size and distinct basic chromosome numbers while also suggesting that polyploidization arose in both lineages after their divergence.
Tepary bean (Phaseolus acutifolius A. Gray), native to the Sonoran Desert, is highly adapted to heat and drought. It is a sister species of common bean (Phaseolus vulgaris L.), the most important legume protein source for direct human consumption, and whose production is threatened by climate change. Here, we report on the tepary genome including exploration of possible mechanisms for resilience to moderate heat stress and a reduced disease resistance gene repertoire, consistent with adaptation to arid and hot environments. Extensive collinearity and shared gene content among these Phaseolus species will facilitate engineering climate adaptation in common bean, a key food security crop, and accelerate tepary bean improvement.
Plants can transmit somatic mutations and epimutations to offspring, which in turn can affect fitness. Knowledge of the rate at which these variations arise is necessary to understand how plant development contributes to local adaptation in an ecoevolutionary context, particularly in long-lived perennials.
Here, we generate a new high-quality reference genome from the oldest branch of a wild Populus trichocarpa tree with two dominant stems which have been evolving independently for 330 years. By sampling multiple, age-estimated branches of this tree, we use a multi-omics approach to quantify age-related somatic changes at the genetic, epigenetic, and transcriptional level. We show that the per-year somatic mutation and epimutation rates are lower than in annuals and that transcriptional variation is mainly independent of age divergence and cytosine methylation. Furthermore, a detailed analysis of the somatic epimutation spectrum indicates that transgenerationally heritable epimutations originate mainly from DNA methylation maintenance errors during mitotic rather than during meiotic cell divisions.
Taken together, our study provides unprecedented insights into the origin of nucleotide and functional variation in a long-lived perennial plant.
Genome-enabled biotechnologies have the potential to accelerate breeding efforts in long-lived perennial crop species. Despite the transformative potential of molecular tools in pecan and other outcrossing tree species, highly heterozygous genomes, significant presence-absence gene content variation, and histories of interspecific hybridization have constrained breeding efforts. To overcome these challenges, here, we present diploid genome assemblies and annotations of four outbred pecan genotypes, including a PacBio HiFi chromosome-scale assembly of both haplotypes of the 'Pawnee' cultivar. Comparative analysis and pan-genome integration reveal substantial and likely adaptive interspecific genomic introgressions, including an over-retained haplotype introgressed from bitternut hickory into pecan breeding pedigrees. Further, by leveraging our pan-genome presence-absence and functional annotation database among genomes and within the two outbred haplotypes of the 'Lakota' genome, we identify candidate genes for pest and pathogen resistance. Combined, these analyses and resources highlight significant progress towards functional and quantitative genomics in highly diverse and outbred crops.
Environmental stress is a major driver of ecological community dynamics and agricultural productivity. This is especially true for soil water availability, because drought is the greatest abiotic inhibitor of worldwide crop yields. Here, we test the genetic basis of drought responses in the genetic model for C4 perennial grasses, Panicum hallii, through population genomics, field-scale gene-expression (eQTL) analysis, and comparison of two complete genomes. While gene expression networks are dominated by local cis-regulatory elements, we observe three genomic hotspots of unlinked trans-regulatory loci. These regulatory hubs are four times more drought responsive than the genome-wide average. Additionally, cis- and trans-regulatory networks are more likely to have opposing effects than expected under neutral evolution, supporting a strong influence of compensatory evolution and stabilizing selection. These results implicate trans-regulatory evolution as a driver of drought responses and demonstrate the potential for crop improvement in drought-prone regions through modification of gene regulatory networks.
Summary
High‐throughput RNA sequencing (RNA‐seq) has recently become the method of choice to define and analyze transcriptomes. For the model moss Physcomitrella patens, although this method has been used to help analyze specific perturbations, no overall reference dataset has yet been established. In the framework of the Gene Atlas project, the Joint Genome Institute selected P. patens as a flagship genome, opening the way to generate the first comprehensive transcriptome dataset for this moss. The first round of sequencing described here is composed of 99 independent libraries spanning 34 different developmental stages and conditions. Upon dataset quality control and processing through read mapping, 28 509 of the 34 361 v3.3 gene models (83%) were detected to be expressed across the samples. Differentially expressed genes (DEGs) were calculated across the dataset to permit perturbation comparisons between conditions. The analysis of the three most distinct and abundant P. patens growth stages – protonema, gametophore and sporophyte – allowed us to define both general transcriptional patterns and stage‐specific transcripts. As an example of variation of physico‐chemical growth conditions, we detail here the impact of ammonium supplementation under standard growth conditions on the protonemal transcriptome. Finally, the cooperative nature of this project allowed us to analyze inter‐laboratory variation, as 13 different laboratories around the world provided samples. We compare differences in the replication of experiments in a single laboratory and between different laboratories.
Significance statement
This work presents the first large scale RNA‐seq dataset for the moss model Physcomitrella patens. The sampling covers 34 experiments, ensuring a good coverage of the organism’s life cycle under standard laboratory growing conditions as well as a panel of treatments. The detection of differentially expressed genes allows a clear contrast between these experiments and permits insight into the biological mechanisms underlying the observed differences.
Leaf fungal microbiomes can be fundamental drivers of host plant success, as they contain pathogens that devastate crop plants and taxa that enhance nutrient uptake, discourage herbivory, and antagonize pathogens. We measured leaf fungal diversity with amplicon sequencing across an entire growing season in a diversity panel of switchgrass (Panicum virgatum). We also sampled a replicated subset of genotypes across 3 additional sites to compare the importance of time, space, ecology, and genetics. We found a strong successional pattern in the microbiome shaped both by host genetics and environmental factors. Further, we used genome-wide association (GWA) mapping and RNA sequencing to show that 3 cysteine-rich receptor-like kinases (crRLKs) were linked to a genetic locus associated with microbiome structure. We confirmed GWAS results in an independent set of genotypes for both the internal transcribed spacer (ITS) and large subunit (LSU) ribosomal DNA markers. Fungal pathogens were central to microbial covariance networks, and genotypes susceptible to pathogens differed in their expression of the 3 crRLKs, suggesting that host immune genes are a principal means of controlling the entire leaf microbiome.