The discovery and characterization of transposable element (TE) families are crucial tasks in the process of genome annotation. Careful curation of TE libraries for each organism is necessary as each has been exposed to a unique and often complex set of TE families. De novo methods have been developed; however, a fully automated and accurate approach to the development of complete libraries remains elusive. In this review, we cover established methods and recent developments in de novo TE analysis. We also present various methodologies used to assess these tools and discuss opportunities for further advancement of the field.
In plants, RNA-directed DNA methylation (RdDM) employs small RNAs to target enzymes that methylate cytosine residues. Cytosine methylation and dimethylation of histone 3 lysine 9 (H3K9me2) are often linked. Together they condition an epigenetic defense that results in chromatin compaction and transcriptional silencing of transposons and viral chromatin. Canonical RdDM (Pol IV-RdDM), involving RNA polymerases IV and V (Pol IV and Pol V), was believed to be necessary to establish cytosine methylation, which in turn could recruit H3K9 methyltransferases. However, recent studies have revealed that a pathway involving Pol II and RNA-dependent RNA polymerase 6 (RDR6) (RDR6-RdDM) is likely responsible for establishing cytosine methylation at naive loci, while Pol IV-RdDM acts to reinforce and maintain it. We used the geminivirus Beet curly top virus (BCTV) as a model to examine the roles of Pol IV and Pol V in establishing repressive viral chromatin methylation. As geminivirus chromatin is formed de novo in infected cells, these viruses are unique models for processes involved in the establishment of epigenetic marks. We confirm that Pol IV and Pol V are not needed to establish viral DNA methylation but are essential for its amplification. Remarkably, however, both Pol IV and Pol V are required for deposition of H3K9me2 on viral chromatin. Our findings suggest that cytosine methylation alone is not sufficient to trigger de novo deposition of H3K9me2 and further that Pol IV-RdDM is responsible for recruiting H3K9 methyltransferases to viral chromatin.
In plants, RNA-directed DNA methylation (RdDM) uses small RNAs to target cytosine methylation, which is often linked to H3K9me2. These epigenetic marks silence transposable elements and DNA virus genomes, but how they are established is not well understood. Canonical RdDM, involving Pol IV and Pol V, was thought to establish cytosine methylation that in turn could recruit H3K9 methyltransferases, but recent studies compel a reevaluation of this view. We used BCTV to investigate the roles of Pol IV and Pol V in chromatin methylation. We found that both are needed to amplify, but not to establish, DNA methylation. However, both are required for deposition of H3K9me2. Our findings suggest that cytosine methylation is not sufficient to recruit H3K9 methyltransferases to naive viral chromatin and further that Pol IV-RdDM is responsible.
The history of Alu retroposons has been choreographed by the systematic accumulation of inherited diagnostic nucleotide substitutions to form discrete subfamilies, each having a distinct nucleotide consensus sequence. The oldest subfamily, AluJ, gave rise to AluS after the split between Strepsirrhini and what would become Catarrhini and Platyrrhini. The AluS lineage gave rise to AluY in catarrhines and to AluTa in platyrrhines. Platyrrhine Alu subfamilies Ta7, Ta10, and Ta15 were assigned names based on a standardized nomenclature. However, with the subsequent intensification of whole genome sequencing (WGS), large-scale analyses to characterize Alu subfamilies using the program COSEG identified entire lineages of subfamilies simultaneously. The first platyrrhine genome with WGS, the common marmoset (Callithrix jacchus; caljac3), resulted in Alu subfamily names sf0 to sf94 in an arbitrary order. Although easily resolved by alignment of the consensus sequences, this naming convention can become increasingly confusing as more genomes are independently analyzed. In this study, we reported Alu subfamily characterization for the platyrrhine three-family clade of Cebidae, Callithrichidae, and Aotidae. We investigated one species/genome from each recognized family of Callithrichidae and Aotidae and of both subfamilies (Cebinae and Saimiriinae) of the family Cebidae. Furthermore, we constructed a comprehensive network of Alu subfamily evolution within the three-family clade of platyrrhines to provide a working framework for future research. Alu expansion in the three-family clade has been dominated by AluTa15 and its derivatives.
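The idea that subfamilies are distinguished by diagnostic substitutions in their consensus sequences can be illustrated with a minimal sketch. It assumes two consensus sequences that are already aligned to equal length; the function name and the toy sequences are hypothetical and are not taken from COSEG or from the study.

```python
def diagnostic_substitutions(parent, child):
    """List (position, parent_base, child_base) for each mismatch between
    two pre-aligned consensus sequences. Positions are 1-based."""
    if len(parent) != len(child):
        raise ValueError("consensus sequences must be aligned to equal length")
    return [
        (pos, p, c)
        for pos, (p, c) in enumerate(zip(parent.upper(), child.upper()), start=1)
        if p != c
    ]


# Hypothetical toy sequences for illustration only (not real consensus data).
parent_consensus = "ACGTACGTAC"
child_consensus = "ACGTTCGTAA"

for pos, p, c in diagnostic_substitutions(parent_consensus, child_consensus):
    print(f"position {pos}: {p} -> {c}")
```

A comparison of this kind is also one simple way to see how arbitrarily numbered subfamilies from independent analyses might be matched up by aligning their consensus sequences.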
Owl monkeys (genus Aotus), or “night monkeys,” are platyrrhine primates in the Aotidae family. Early taxonomy only recognized one species, Aotus trivirgatus, until 1983, when Hershkovitz proposed nine unique species designations, classified into red-necked and gray-necked species groups based predominately on pelage coloration. Recent studies questioned this conventional separation of the genus and proposed designations based on the geographical location of wild populations. Alu retrotransposons are a class of mobile element insertion (MEI) widely used to study primate phylogenetics. A scaffold-level genome assembly for one Aotus species, Aotus nancymaae Anan_2.0, facilitated large-scale ascertainment of nearly 2000 young lineage-specific Alu insertions. This study provides candidate oligonucleotides for locus-specific PCR assays for over 1350 of these elements. For 314 Alu elements across four taxa with multiple specimens, PCR analyses identified 159 insertion polymorphisms, including 21 grouping A. nancymaae and Aotus azarae (red-necked species) as sister taxa, with Aotus vociferans and A. trivirgatus (gray-necked) being more basal. DNA sequencing identified five novel Alu elements from three different taxa. The Alu datasets reported in this study will assist in species identification and provide a valuable resource for Aotus phylogenetics, population genetics and conservation strategies when applied to wild populations.
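As a rough illustration of what calling an insertion polymorphism from PCR results involves, the sketch below flags a locus as polymorphic when the scored individuals are neither all insertion-present nor all insertion-absent. The genotype labels, the handling of failed reactions, and the function name are assumptions made for this example, not the scoring scheme used in the study.

```python
def is_insertion_polymorphism(genotypes):
    """Return True if a locus varies in insertion presence/absence.

    genotypes: per-individual PCR calls, each one of
      "present" (homozygous insertion), "absent" (homozygous empty site),
      "het" (heterozygous), or None (no amplification; ignored).
    These labels are hypothetical conventions for this sketch.
    """
    scored = [g for g in genotypes if g is not None]
    if not scored:
        return False  # nothing amplified, so nothing can be concluded
    fixed_present = all(g == "present" for g in scored)
    fixed_absent = all(g == "absent" for g in scored)
    return not (fixed_present or fixed_absent)


# Hypothetical calls for one locus across five individuals.
print(is_insertion_polymorphism(["present", "absent", "het", None, "present"]))
```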
Platy-1 retroposons are short interspersed elements (SINEs) unique to platyrrhine primates. Discovered in the common marmoset (Callithrix jacchus) genome in 2016, these 100 bp mobile element insertions (MEIs) appeared to be novel drivers of platyrrhine evolution, with over 2200 full-length members across 62 different subfamilies, and strong evidence of ongoing proliferation in C. jacchus. Subsequent characterization of Platy-1 elements in
,
and
genera, suggested that the widespread mobilization detected in marmoset (family Callithrichidae) was perhaps an anomaly. Two additional Callithrichidae genomes are now available: a scaffold-level genome assembly for Saguinus imperator (tamarin; SagImp_v1) and a chromosome-level assembly for Saguinus midas (Midas tamarin; ASM2_v1). Here, we report that each tamarin genome contains over 11,000 full-length Platy-1 insertions: about 1150 are shared by both Saguinus tamarins, 7511 are unique to
, and another 8187 are unique to
. Roughly 325 are shared among the three callithrichids. We identified six new Platy-1 subfamilies derived from Platy-1-8, with the youngest new subfamily, Platy-1-8c_
, being the primary source of the
amplification burst. This constitutes the largest expansion of Platy-1 MEIs reported to date and the most extensive independent SINE amplification between two closely related species.
The role of structurally dynamic genomic regions in speciation is poorly understood due to challenges inherent in diploid genome assembly. Here we reconstructed the evolutionary dynamics of structural variation in five cat species by phasing the genomes of three interspecies F1 hybrids to generate near-gapless single-haplotype assemblies. We discerned that cat genomes have a paucity of segmental duplications relative to great apes, explaining their remarkable karyotypic stability. X chromosomes were hotspots of structural variation, including enrichment with inversions in a large recombination desert with characteristics of a supergene. The X-linked macrosatellite DXZ4 evolves more rapidly than 99.5% of the genome, clarifying its role in felid hybrid incompatibility. Resolved sensory gene repertoires revealed functional copy number changes associated with ecomorphological adaptations, sociality and domestication. This study highlights the value of gapless genomes to reveal structural mechanisms underpinning karyotypic evolution, reproductive isolation and ecological niche adaptation.
The divergence of chimpanzee and bonobo provides one of the few examples of recent hominid speciation. Here we describe a fully annotated, high-quality bonobo genome assembly, which was constructed without guidance from reference genomes by applying a multiplatform genomics approach. We generate a bonobo genome assembly in which more than 98% of genes are completely annotated and 99% of the gaps are closed, including the resolution of about half of the segmental duplications and almost all of the full-length mobile elements. We compare the bonobo genome to those of other great apes and identify more than 5,569 fixed structural variants that specifically distinguish the bonobo and chimpanzee lineages. We focus on genes that have been lost, changed in structure or expanded in the last few million years of bonobo evolution. We produce a high-resolution map of incomplete lineage sorting and estimate that around 5.1% of the human genome is genetically closer to chimpanzee or bonobo and that more than 36.5% of the genome shows incomplete lineage sorting if we consider a deeper phylogeny including gorilla and orangutan. We also show that 26% of the segments of incomplete lineage sorting between human and chimpanzee or human and bonobo are non-randomly distributed and that genes within these clustered segments show significant excess of amino acid replacement compared to the rest of the genome.
Alu elements are powerful phylogenetic markers. The combination of a recently developed computational pipeline, polyDetect, with high copy number Alu insertions has previously been utilized to help resolve the Papio baboon phylogeny with high statistical support. Here, the polyDetect method was applied to the highly contentious Cebidae phylogeny within New World monkeys (NWM). The polyDetect method relies on conserved homology/identity of short read sequence data among the species being compared to accurately map predicted shared Alu insertions to each unique flanking sequence. The results of this comprehensive assessment indicate that there were insufficient sequence homology/identity stretches in non-repeated DNA sequences among the four Cebidae genera analyzed in this study to make this strategy phylogenetically viable. The ~20 million years of evolutionary divergence of the Cebidae genera has resulted in random sequence decay within the short read data, obscuring potentially orthologous elements in the species tested. These analyses suggest that the polyDetect pipeline is best suited to resolving phylogenies of more recently diverged lineages when high-quality assembled genomes are not available for the taxa of interest.
• The Alu consensus sequence did not influence polyDetect output.
• Different NWM reference genomes influenced phylogenetic tree topology.
• Insufficient read homology could not precisely predict phylogeny of diverged NWM.
Baboons (genus Papio) are a morphologically and behaviorally diverse clade of catarrhine monkeys that have experienced hybridization between phenotypically and genetically distinct phylogenetic species. We used high-coverage whole-genome sequences from 225 wild baboons representing 19 geographic localities to investigate population genomics and interspecies gene flow. Our analyses provide an expanded picture of evolutionary reticulation among species and reveal patterns of population structure within and among species, including differential admixture among conspecific populations. We describe the first example of a baboon population with a genetic composition that is derived from three distinct lineages. The results reveal processes, both ancient and recent, that produced the observed mismatch between phylogenetic relationships based on matrilineal, patrilineal, and biparental inheritance. We also identified several candidate genes that may contribute to species-specific phenotypes.
Phylogenetic relationships among Cebidae species of platyrrhine primates are presently under debate. Studies prior to whole genome sequence (WGS) availability utilizing unidirectional Alu repeats linked
...and
as sister taxa, based on a limited number of genetic markers and specimens, while the relative positions of
,
and
remained controversial. Multiple WGS allowed computational detection of Alu-genome junctions; however, random mutation and evolutionary decay of these short-read segments prevented phylogenetic resolution. In this study, WGS for four Cebidae genomes of marmoset, squirrel monkey, owl monkey and capuchin were analyzed for full-length Alu elements and each locus was compared to the other three genomes in all possible combinations using orthologous region sequence alignments. Over 2000 candidates were aligned and subjected to visual inspection. Approximately 34% passed inspection and were considered shared in their respective category, 48% failed due to the target being present in all four genomes, having N's in the sequence or other sequence quality anomalies, and 18% were determined to represent near parallel insertions (NP). Wet bench locus-specific PCR confirmed the presence of shared Alu insertions in all phylogenetically informative categories, providing evidence of extensive incomplete lineage sorting (ILS) and an abundance of Alu proliferation during the complex radiation of Cebidae taxa.
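The comparison of each locus against the other genomes "in all possible combinations" can be sketched as a simple enumeration. The example below is an illustration under stated assumptions: the category labels and function names are hypothetical, presence/absence calls are assumed to be already available per genome, and near parallel insertions and sequence-quality failures are not modeled.

```python
from itertools import combinations

# Common names of the four genomes compared in the abstract.
GENOMES = ("marmoset", "squirrel monkey", "owl monkey", "capuchin")


def informative_categories(genomes=GENOMES):
    """Every group of two or three genomes that could share an insertion."""
    return [combo for size in (2, 3) for combo in combinations(genomes, size)]


def classify_locus(presence):
    """Classify one locus from a genome -> bool map of insertion presence.

    Genomes missing from the map are treated as lacking the insertion
    (an assumption of this sketch).
    """
    carriers = tuple(g for g in GENOMES if presence.get(g, False))
    if len(carriers) == len(GENOMES):
        return "present in all four (uninformative)"
    if len(carriers) < 2:
        return "not shared"
    return "shared by " + " + ".join(carriers)


print(len(informative_categories()))
print(classify_locus({"marmoset": True, "owl monkey": True}))
```

With four genomes there are six pairs and four triplets, so ten sharing categories in total; insertions recovered in conflicting categories are the pattern that the abstract interprets as evidence of incomplete lineage sorting.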