Abstract
Transposable elements (TEs) are mobile repetitive DNA sequences shown to be major drivers of genome evolution. As the first plant to have its genome sequenced and analyzed at the genomic scale, Arabidopsis thaliana has contributed greatly to our TE knowledge. The present report describes 20 years of accumulated TE knowledge gained through the study of the Arabidopsis genome and covers the known TE families, their relative abundance, and their genomic distribution. It presents our knowledge of the activity, mobility, population dynamics and long-term evolutionary dynamics of the different TE families. Finally, the role of TEs as substrates for new genes and their impact on gene expression are illustrated through a few selected demonstrative cases. Promising future directions for TE studies in this species conclude the review.
Transposable elements (TEs) are mobile, repetitive DNA sequences that are almost ubiquitous in prokaryotic and eukaryotic genomes. They have a large impact on genome structure, function and evolution. With the recent development of high-throughput sequencing methods, many genome sequences have become available, making possible comparative studies of TE dynamics at an unprecedented scale. Several methods have been proposed for the de novo identification of TEs in sequenced genomes. Most begin with the detection of genomic repeats, but the subsequent steps for defining TE families differ. High-quality TE annotations are available for the Drosophila melanogaster and Arabidopsis thaliana genome sequences, providing a solid basis for the benchmarking of such methods. We compared the performance of specific algorithms for the clustering of interspersed repeats and found that only a particular combination of algorithms detected TE families with good recovery of the reference sequences. We then applied a new procedure for reconciling the different clustering results and classifying TE sequences. The whole approach was implemented in a pipeline using the REPET package. Finally, we show that our combined approach highlights the dynamics of well-defined TE families by making it possible to identify structural variations among their copies. This approach makes it possible to annotate TE families and to study their diversification in a single analysis, improving our understanding of TE dynamics at the whole-genome scale and for diverse species.
High-quality reference genomes are crucial to understanding genome function, structure and evolution. The availability of reference genomes has allowed us to start inferring the role of genetic variation in biology, disease, and biodiversity conservation. However, analyses across organisms demonstrate that a single reference genome is not enough to capture the global genetic diversity present in populations. In this work, we generate 32 high-quality reference genomes for the well-known model species D. melanogaster and focus on the identification and analysis of transposable element variation, as transposable elements are the most common type of structural variant. We show that integrating the genetic variation across natural populations from five climatic regions increases the number of detected insertions by 58%. Moreover, 26% to 57% of the insertions identified using long reads were missed by short-read methods. We also identify hundreds of transposable elements associated with gene expression variation and new TE variants likely to contribute to adaptive evolution in this species. Our results highlight the importance of incorporating the genetic variation present in natural populations into genomic studies, which is essential if we are to understand how genomes function and evolve.
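The comparison between long-read and short-read insertion calls reported above can be illustrated with a minimal sketch. It assumes each insertion is keyed by an exact (chromosome, position) pair, which is a simplification: real comparisons typically match calls within a positional tolerance. The function name and the toy call sets are hypothetical and are not taken from the study.

```python
def missed_fraction(long_read_calls, short_read_calls):
    """Fraction of long-read insertions that are absent from the short-read calls.

    Assumption: insertions are compared by exact key equality.
    """
    long_set = set(long_read_calls)
    if not long_set:
        return 0.0
    missed = long_set - set(short_read_calls)
    return len(missed) / len(long_set)


# Hypothetical toy data: (chromosome, position) keys, not real calls.
long_calls = [("2L", 100), ("2L", 500), ("3R", 42), ("X", 7)]
short_calls = [("2L", 100), ("X", 7), ("3R", 900)]

print(missed_fraction(long_calls, short_calls))
```

In this toy case two of the four long-read insertions have no short-read counterpart. The extra short-read call does not affect the result, because the fraction is computed relative to the long-read set only.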
Little is known about the evolution of repeated sequences over long periods of time. Using two independent approaches, we show that the majority of the repeats found in the Arabidopsis thaliana genome are ancient and likely to derive from the retention of fragments deposited during ancestral bursts that occurred early in Brassicaceae evolution. We determine that the majority of young repeats are found in pericentromeric domains, while older copies are frequent in the gene-rich regions. Our results further suggest that the DNA methylation of repeats through small RNA-mediated pathways can last over prolonged periods of time. We also illustrate the way repeated sequences are composted by mutations towards genomic dark matter over time, probably driven by the deamination of methylcytosines, which also have an impact on epigenomic landscapes. Overall, we show that the ancient proliferation of repeat families has long-term consequences on A. thaliana biology and genome composition.
Eukaryotic genomes contain highly variable amounts of DNA with no apparent function. This so-called junk DNA is composed of two components: repeated and repeat-derived sequences (together referred to as the repeatome), and non-annotated sequences also known as genomic dark matter. Because of their high duplication rates compared to other genomic features, transposable elements are predominant contributors to the repeatome, and the products of their decay are thought to be a major source of genomic dark matter. Determining the origin and composition of junk DNA is thus important for understanding genome evolution as well as host biology. In this study, we used a combination of tools to show that the repeatome of the small and reducing A. thaliana genome is significantly larger than previously thought. Furthermore, we present the concepts and results from a series of innovative approaches suggesting that a significant amount of the A. thaliana dark matter is of repetitive origin. As a tentative standard for the community, we propose a deep compendium annotation of the A. thaliana repeatome that may help further address genome evolution as well as transcriptional and epigenetic regulation in this model plant.
The classification of transposable elements (TEs) is a key step towards deciphering their potential impact on the genome. However, this process is often based on manual sequence inspection by TE experts. With the wealth of genomic sequences now available, this task requires automation, making it accessible to most scientists. We propose a new tool, PASTEC, which classifies TEs by searching for structural features and similarities. This tool outperforms currently available software for TE classification. The main innovation of PASTEC is the search for HMM profiles, which is useful for inferring the classification of unknown TEs on the basis of conserved functional domains of the proteins. In addition, PASTEC is the only tool providing an exhaustive spectrum of possible classifications to the order level of the Wicker hierarchical TE classification system. It can also automatically classify other repeated elements, such as SSRs (Simple Sequence Repeats), rDNA or potential repeated host genes. Finally, the output of this new tool is designed to facilitate manual curation by providing biologists with all the evidence accumulated for each TE consensus.
PASTEC is available as a REPET module or as standalone software (http://urgi.versailles.inra.fr/download/repet/REPET_linux-x64-2.2.tar.gz). It requires a Unix-like system. There are two standalone versions: one is parallelized (requiring Sun Grid Engine or Torque) and the other is not.
The availability of the whole-genome sequence of the wheat pest Mayetiola destructor offers the opportunity to investigate its transposable element (TE) content and the relationship of TEs with the genes involved in insect virulence. In this study, de novo annotation carried out using the REPET pipeline showed that TEs occupy approximately 16% of the genome and are represented by 1038 lineages. Class II elements were the most frequent, and most TEs were inactive due to the deletions they have accumulated. The analysis of TE ages revealed a first burst at 20% divergence from the present that mobilized many TE families, mostly from the Tc1/mariner and Gypsy superfamilies, and a second burst at 2% divergence, which involved mainly class II elements, suggesting new TE invasions. Additionally, 86 TE insertions involving recently transposed elements were identified. Among them, several MITEs and Gypsy retrotransposons were inserted in the vicinity of SSGP and chemosensory genes. These findings represent a valuable resource for more in-depth investigation of the impact of TEs on the M. destructor genome and their possible influence on the expression of the virulence and chemosensory genes and, consequently, the behavior of this pest towards its host plants.
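Bursts such as those described above are commonly read off a histogram of copy-to-consensus divergence. The following is a minimal sketch of that idea, not the procedure used in the study: it bins per-copy divergence percentages and reports bins that are local maxima above a count threshold. The function names, the 1% bin width, the threshold and the toy values are all assumptions made for illustration.

```python
from collections import Counter


def divergence_histogram(divergences, bin_width=1.0):
    """Count TE copies per divergence bin (bin i covers [i*w, (i+1)*w) percent)."""
    counts = Counter(int(d // bin_width) for d in divergences)
    n_bins = max(counts) + 1 if counts else 0
    return [counts.get(i, 0) for i in range(n_bins)]


def local_peaks(hist, min_count=1):
    """Indices of bins strictly higher than both neighbours and >= min_count."""
    peaks = []
    for i, count in enumerate(hist):
        left = hist[i - 1] if i > 0 else 0
        right = hist[i + 1] if i + 1 < len(hist) else 0
        if count >= min_count and count > left and count > right:
            peaks.append(i)
    return peaks


# Hypothetical divergence values (percent), not data from the study.
toy = [2.1, 2.4, 2.8, 2.5, 5.0, 12.3, 19.5, 20.2, 20.7, 20.1, 20.9, 21.5]
print(local_peaks(divergence_histogram(toy), min_count=3))
```

With a 1% bin width, the reported indices correspond directly to divergence percentages. The `min_count` threshold is a crude guard against isolated copies being reported as bursts; a real analysis would more likely smooth the distribution first.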
The Wheat@URGI portal has been developed to provide the international community of researchers and breeders with access to the bread wheat reference genome sequence produced by the International Wheat Genome Sequencing Consortium. Genome browsers, BLAST, and InterMine tools have been established for in-depth exploration of the genome sequence together with additional linked datasets including physical maps, sequence variations, gene expression, and genetic and phenomic data from other international collaborative projects already stored in the GnpIS information system. The portal provides enhanced search and browser features that will facilitate the deployment of the latest genomics resources in wheat improvement.
In bacteria, genes with related functions often are grouped together in operons and are cotranscribed as a single polycistronic mRNA. In eukaryotes, functionally related genes generally are scattered across the genome. Notable exceptions include gene clusters for catabolic pathways in yeast, synthesis of secondary metabolites in filamentous fungi, and the major histocompatibility complex in animals. Until quite recently it was thought that gene clusters in plants were restricted to tandem duplicates (for example, arrays of leucine-rich repeat disease-resistance genes). However, operon-like clusters of coregulated nonhomologous genes are an emerging theme in plant biology, where they may be involved in the synthesis of certain defense compounds. These clusters are unlikely to have arisen by horizontal gene transfer, and the mechanisms behind their formation are poorly understood. Previously in thale cress (Arabidopsis thaliana) we identified an operon-like gene cluster that is required for the synthesis and modification of the triterpene thalianol. Here we characterize a second operon-like triterpene cluster (the marneral cluster) from A. thaliana, compare the features of these two clusters, and investigate the evolutionary events that have led to cluster formation. We conclude that common mechanisms are likely to underlie the assembly and control of operon-like gene clusters in plants.
The origin of bread wheat (Triticum aestivum; AABBDD) has been a subject of controversy and of intense debate in the scientific community over the last few decades. In 2015, three articles published in New Phytologist discussed the origin of hexaploid bread wheat (AABBDD) from the diploid progenitors Triticum urartu (AA), a relative of Aegilops speltoides (BB) and Triticum tauschii (DD).
Access to new genomic resources since 2013 has offered the opportunity to gain novel insights into the paleohistory of modern bread wheat, allowing characterization of its origin from its diploid progenitors at unprecedented resolution.
We propose a reconciled evolutionary scenario for the modern bread wheat genome based on the complementary investigation of transposable element and mutation dynamics between diploid, tetraploid and hexaploid wheat.
In this scenario, the structural asymmetry observed between the A, B and D subgenomes in hexaploid bread wheat derives from the cumulative effect of diploid progenitor divergence, the hybrid origin of the D subgenome, and subgenome partitioning following the polyploidization events.