Heterogeneity among life traits in mammals has resulted in considerable phylogenetic conflict, particularly concerning the position of the placental root. Layered upon this are gene- and ...lineage-specific variation in amino acid substitution rates and compositional biases. Life trait variations that may impact upon mutational rates are longevity, metabolic rate, body size, and germ line generation time. Over the past 12 years, three main conflicting hypotheses have emerged for the placement of the placental root. These hypotheses place the Atlantogenata (common ancestor of Xenarthra plus Afrotheria), the Afrotheria, or the Xenarthra as the sister group to all other placental mammals. Model adequacy is critical for accurate tree reconstruction and by failing to account for these compositional and character exchange heterogeneities across the tree and data set, previous studies have not provided a strongly supported hypothesis for the placental root. For the first time, models that accommodate both tree and data set heterogeneity have been applied to mammal data. Here, we show the impact of accurate model assignment and the importance of data sets in accommodating model parameters while maintaining the power to reject competing hypotheses. Through these sophisticated methods, we demonstrate the importance of model adequacy, data set power and provide strong support for the Atlantogenata over other competing hypotheses for the position of the placental root.
Eukaryotes evolved from the symbiotic association of at least two prokaryotic partners, and a good deal is known about the timings, mechanisms, and dynamics of these evolutionary steps. Recently, it ...was shown that a new class of nuclear genes, symbiogenetic genes (S-genes), was formed concomitant with endosymbiosis and the subsequent evolution of eukaryotic photosynthetic lineages. Understanding their origins and contributions to eukaryogenesis would provide insights into the ways in which cellular complexity has evolved.
Here, we show that chimeric nuclear genes (S-genes), built from prokaryotic domains, are critical for explaining the leap forward in cellular complexity achieved during eukaryogenesis. A total of 282 S-gene families contributed solutions to many of the challenges faced by early eukaryotes, including enhancing the informational machinery, processing spliceosomal introns, tackling genotoxicity within the cell, and ensuring functional protein interactions in a larger, more compartmentalized cell. For hundreds of S-genes, we confirmed the origins of their components (bacterial, archaeal, or generally prokaryotic) by maximum likelihood phylogenies. Remarkably, Bacteria contributed nine-fold more S-genes than Archaea, including a two-fold greater contribution to informational functions. Therefore, there is an additional, large bacterial contribution to the evolution of eukaryotes, implying that fundamental eukaryotic properties do not strictly follow the traditional informational/operational divide for archaeal/bacterial contributions to eukaryogenesis.
This study demonstrates the extent and process through which prokaryotic fragments from bacterial and archaeal genes inherited during eukaryogenesis underly the creation of novel chimeric genes with important functions.
Increasingly, similarity networks are being used for evolutionary analyses of molecular datasets. These networks are very useful, in particular for the analysis of gene sharing, lateral gene transfer ...and for the detection of distant homologs. Currently, such analyses require some computer programming skills due to the limited availability of user-friendly freely distributed software. Consequently, although appealing, the construction and analyses of these networks remain less familiar to biologists than do phylogenetic approaches.
In order to ease the use of similarity networks in the community of evolutionary biologists, we introduce a software program, EGN, that runs under Linux or MacOSX. EGN automates the reconstruction of gene and genome networks from nucleic and proteic sequences. EGN also implements statistics describing genetic diversity in these samples, for various user-defined thresholds of similarities. In the interest of studying the complexity of evolutionary processes affecting microbial evolution, we applied EGN to a dataset of 571,044 proteic sequences from the three domains of life and from mobile elements. We observed that, in Borrelia, plasmids play a different role than in most other eubacteria. Rather than being genetic couriers involved in lateral gene transfer, Borrelia's plasmids and their genes act as private genetic goods, that contribute to the creation of genetic diversity within their parasitic hosts.
EGN can be used for constructing, analyzing, and mining molecular datasets in evolutionary studies. The program can help increase our knowledge of the processes through which genes from distinct sources and/or from multiple genomes co-evolve in lineages of cellular organisms.
New approaches for unravelling reassortment pathways Svinti, Victoria; Cotton, James A; McInerney, James O
BMC evolutionary biology,
2013-Jan-01, 2013-01-01, 20130101, 2013-Jan-1, Letnik:
13, Številka:
1
Journal Article
Recenzirano
Odprti dostop
Every year the human population encounters epidemic outbreaks of influenza, and history reveals recurring pandemics that have had devastating consequences. The current work focuses on the development ...of a robust algorithm for detecting influenza strains that have a composite genomic architecture. These influenza subtypes can be generated through a reassortment process, whereby a virus can inherit gene segments from two different types of influenza particles during replication. Reassortant strains are often not immediately recognised by the adaptive immune system of the hosts and hence may be the source of pandemic outbreaks. Owing to their importance in public health and their infectious ability, it is essential to identify reassortant influenza strains in order to understand the evolution of this virus and describe reassortment pathways that may be biased towards particular viral segments. Phylogenetic methods have been used traditionally to identify reassortant viruses. In many studies up to now, the assumption has been that if two phylogenetic trees differ, it is because reassortment has caused them to be different. While phylogenetic incongruence may be caused by real differences in evolutionary history, it can also be the result of phylogenetic error. Therefore, we wish to develop a method for distinguishing between topological inconsistency that is due to confounding effects and topological inconsistency that is due to reassortment.
The current work describes the implementation of two approaches for robustly identifying reassortment events. The algorithms rest on the idea of significance of difference between phylogenetic trees or phylogenetic tree sets, and subtree pruning and regrafting operations, which mimic the effect of reassortment on tree topologies. The first method is based on a maximum likelihood (ML) framework (MLreassort) and the second implements a Bayesian approach (Breassort) for reassortment detection. We focus on reassortment events that are found by both methods. We test both methods on a simulated dataset and on a small collection of real viral data isolated in Hong Kong in 1999.
The nature of segmented viral genomes present many challenges with respect to disease. The algorithms developed here can effectively identify reassortment events in small viral datasets and can be applied not only to influenza but also to other segmented viruses. Owing to computational demands of comparing tree topologies, further development in this area is necessary to allow their application to larger datasets.
The formation of new genes by combining parts of existing genes is an important evolutionary process. Remodelled genes, which we call composites, have been investigated in many species, however, ...their distribution across all of life is still unknown. We set out to examine the extent to which genomes from cells and mobile genetic elements contain composite genes. We identify composite genes as those that show partial homology to at least two unrelated component genes. In order to identify composite and component genes, we constructed sequence similarity networks (SSNs) of more than one million genes from all three domains of life, as well as viruses and plasmids. We identified non-transitive triplets of nodes in this network and explored the homology relationships in these triplets to see if the middle nodes were indeed composite genes. In total, we identified 221,043 (18.57%) composites genes, which were distributed across all genomic and functional categories. In particular, the presence of composite genes is statistically more likely in eukaryotes than prokaryotes.
The role of public goods in planetary evolution McInerney, James O.; Erwin, Douglas H.
Philosophical transactions of the Royal Society of London. Series A: Mathematical, physical, and engineering sciences,
12/2017, Letnik:
375, Številka:
2109
Journal Article
Recenzirano
Odprti dostop
Biological public goods are broadly shared within an ecosystem and readily available. They appear to be widespread and may have played important roles in the history of life on Earth. Of particular ...importance to events in the early history of life are the roles of public goods in the merging of genomes, protein domains and even cells. We suggest that public goods facilitated the origin of the eukaryotic cell, a classic major evolutionary transition. The recognition of genomic public goods challenges advocates of a direct graph view of phylogeny, and those who deny that any useful phylogenetic signal persists in modern genomes. Ecological spillovers generate public goods that provide new ecological opportunities.
This article is part of the themed issue ‘Reconceptualizing the origins of life’.
Eukaryotic genomes are mosaics of genes acquired from their prokaryotic ancestors, the eubacterial endosymbiont that gave rise to the mitochondrion and its archaebacterial host. Genomic footprints of ...the prokaryotic merger at the origin of eukaryotes are still discernable in eukaryotic genomes, where gene expression and function correlate with their prokaryotic ancestry. Molecular chaperones are essential in all domains of life as they assist the functional folding of their substrate proteins and protect the cell against the cytotoxic effects of protein misfolding. Eubacteria and archaebacteria code for slightly different chaperones, comprising distinct protein folding pathways. Here we study the evolution of the eukaryotic protein folding pathways following the endosymbiosis event. A phylogenetic analysis of all 64 chaperones encoded in the Saccharomyces cerevisiae genome revealed 25 chaperones of eubacterial ancestry, 11 of archaebacterial ancestry, 10 of ambiguous prokaryotic ancestry, and 18 that may represent eukaryotic innovations. Several chaperone families (e.g., Hsp90 and Prefoldin) trace their ancestry to only one prokaryote group, while others, such as Hsp40 and Hsp70, are of mixed ancestry, with members contributed from both prokaryotic ancestors. Analysis of the yeast chaperone-substrate interaction network revealed no preference for interaction between chaperones and substrates of the same origin. Our results suggest that the archaebacterial and eubacterial protein folding pathways have been reorganized and integrated into the present eukaryotic pathway. The highly integrated chaperone system of yeast is a manifestation of the central role of chaperone-mediated folding in maintaining cellular fitness. Most likely, both archaebacterial and eubacterial chaperone systems were essential at the very early stages of eukaryogenesis, and the retention of both may have offered new opportunities for expanding the scope of chaperone-mediated folding.
The
Escherichia coli
species contains a diverse set of sequence types and there remain important questions regarding differences in genetic content within this population that need to be addressed. ...Pangenomes are useful vehicles for studying gene content within sequence types. Here, we analyse 21
E. coli
sequence type pangenomes using comparative pangenomics to identify variance in both pangenome structure and content. We present functional breakdowns of sequence type core genomes and identify sequence types that are enriched in metabolism, transcription and cell membrane biogenesis genes. We also uncover metabolism genes that have variable core classification, depending on which allele is present. Our comparative pangenomics approach allows for detailed exploration of sequence type pangenomes within the context of the species. We show that ongoing gene gain and loss in the
E. coli
pangenome is sequence type-specific, which may be a consequence of distinct sequence type-specific evolutionary drivers.