The gene tree delusion Springer, Mark S.; Gatesy, John
Molecular phylogenetics and evolution,
January 2016, Volume:
94, Issue:
Pt A
Journal Article
Peer reviewed
•Empirical data suggest coalescence-genes are tiny owing to the recombination ratchet.
•Coalescence methods have not solved difficult problems in mammalian phylogeny.
•Recent simulation ...studies that favor coalescence over concatenation are flawed.
Higher-level relationships among placental mammals are mostly resolved, but several polytomies remain contentious. Song et al. (2012) claimed to have resolved three of these using shortcut coalescence methods (MP-EST, STAR) and further concluded that these methods, which assume no within-locus recombination, are required to unravel deep-level phylogenetic problems that have stymied concatenation. Here, we reanalyze Song et al.’s (2012) data and leverage these re-analyses to explore key issues in systematics including the recombination ratchet, gene tree stoichiometry, the proportion of gene tree incongruence that results from deep coalescence versus other factors, and simulations that compare the performance of coalescence and concatenation methods in species tree estimation. Song et al. (2012) reported an average locus length of 3.1kb for the 447 protein-coding genes in their phylogenomic dataset, but the true mean length of these loci (start codon to stop codon) is 139.6kb. Empirical estimates of recombination breakpoints in primates, coupled with consideration of the recombination ratchet, suggest that individual coalescence genes (c-genes) approach ∼12bp or less for Song et al.’s (2012) dataset, three to four orders of magnitude shorter than the c-genes reported by these authors. This result has general implications for the application of coalescence methods in species tree estimation. We contend that it is illogical to apply coalescence methods to complete protein-coding sequences. Such analyses amalgamate c-genes with different evolutionary histories (i.e., exons separated by >100,000bp), distort true gene tree stoichiometry that is required for accurate species tree inference, and contradict the central rationale for applying coalescence methods to difficult phylogenetic problems. 
In addition, Song et al.’s (2012) dataset of 447 genes includes 21 loci with switched taxonomic names, eight duplicated loci, 26 loci with non-homologous sequences that are grossly misaligned, and numerous loci with >50% missing data for taxa that are misplaced in their gene trees. These problems were compounded by inadequate tree searches with nearest neighbor interchange branch swapping and inadvertent application of substitution models that did not account for among-site rate heterogeneity. Sixty-six gene trees imply unrealistic deep coalescences that exceed 100 million years (MY). Gene trees that were obtained with better justified models and search parameters show large increases in both likelihood scores and congruence. Coalescence analyses based on a curated set of 413 improved gene trees and a superior coalescence method (ASTRAL) support a Scandentia (treeshrews)+Glires (rabbits, rodents) clade, contradicting one of the three primary systematic conclusions of Song et al. (2012). Robust support for a Perissodactyla+Carnivora clade within Laurasiatheria is also lost, contradicting a second major conclusion of this study. Song et al.’s (2012) MP-EST species tree provided the basis for circular simulations that led these authors to conclude that the multispecies coalescent accounts for 77% of the gene tree conflicts in their dataset, but many internal branches of their MP-EST tree are stunted by an order of magnitude or more due to wholesale gene tree reconstruction errors. An independent assessment of branch lengths suggests the multispecies coalescent accounts for ⩽15% of the conflicts among Song et al.’s (2012) 447 gene trees. Unfortunately, Song et al.’s (2012) flawed phylogenomic dataset has been used as a model for additional simulation work that suggests the superiority of shortcut coalescence methods relative to concatenation. 
Investigator error was passed on to the subsequent simulation studies, which also incorporated further logical errors that should be avoided in future simulation studies. Illegitimate branch length switches in the simulation routines unfairly protected coalescence methods from their Achilles’ heel, high gene tree reconstruction error at short internodes. These simulations therefore provide no evidence that shortcut coalescence methods out-compete concatenation at deep timescales. In summary, the long c-genes that are required for accurate reconstruction of species trees using shortcut coalescence methods do not exist and are a delusion. Coalescence approaches based on SNPs that are widely spaced in the genome avoid problems with the recombination ratchet and merit further pursuit in both empirical systematic research and simulations.
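As a rough arithmetic illustration of the scale gap described in this abstract (a sketch, not a calculation taken from the paper), the snippet below compares the 139.6 kb mean locus span with a c-gene of about 12 bp. Both figures are quoted from the abstract; the variable and function names are hypothetical. Under these figures the ratio works out to roughly four orders of magnitude.

```python
import math

# Figures quoted in the abstract above; names are illustrative only.
MEAN_LOCUS_SPAN_BP = 139_600  # mean start-codon-to-stop-codon length (139.6 kb)
C_GENE_BP = 12                # approximate c-gene size (~12 bp or less)


def orders_of_magnitude(larger: float, smaller: float) -> float:
    """Return log10 of the ratio between two positive lengths."""
    return math.log10(larger / smaller)


# How many ~12 bp segments would fit in one locus of mean span.
ratio = MEAN_LOCUS_SPAN_BP / C_GENE_BP
gap = orders_of_magnitude(MEAN_LOCUS_SPAN_BP, C_GENE_BP)

print(f"locus span / c-gene size: {ratio:,.0f}x")
print(f"orders of magnitude: {gap:.1f}")
```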
•Published coalescence analyses of mammalian phylogeny strongly conflict with each other.
•Concatenation has outperformed shortcut coalescence methods in simulations at deep ...timescales.
•Hidden support in shortcut coalescence analyses of mammals is likely artifactual.
•Coalescence gene size ratchets downward as taxon sampling is increased.
•Inaccurate gene trees may be more problematic for shortcut coalescence than incomplete lineage sorting is for concatenation.
Large datasets are required to solve difficult phylogenetic problems that are deep in the Tree of Life. Currently, two divergent systematic methods are commonly applied to such datasets: the traditional supermatrix approach (= concatenation) and “shortcut” coalescence (= coalescence methods wherein gene trees and the species tree are not co-estimated). When applied to ancient clades, these contrasting frameworks often produce congruent results, but in recent phylogenetic analyses of Placentalia (placental mammals), this is not the case. A recent series of papers has alternately disputed and defended the utility of shortcut coalescence methods at deep phylogenetic scales. Here, we examine this exchange in the context of published phylogenomic data from Mammalia; in particular, we explore two critical issues: the delimitation of data partitions (“genes”) in coalescence analysis and hidden support that emerges with the combination of such partitions in phylogenetic studies. Hidden support, defined as increased support for a clade in combined analysis of all data partitions relative to the support evident in separate analyses of the various data partitions, is a hallmark of the supermatrix approach and a primary rationale for concatenating all characters into a single matrix. In the most extreme cases of hidden support, relationships that are contradicted by all gene trees are supported when all of the genes are analyzed together. A valid fear is that shortcut coalescence methods might bypass or distort character support that is hidden in individual loci because small gene fragments are analyzed in isolation. Given the extensive systematic database for Mammalia, the assumptions and applicability of shortcut coalescence methods can be assessed with rigor to complement a small but growing body of simulation work that has directly compared these methods to concatenation.
We document several remarkable cases of hidden support in both supermatrix and coalescence paradigms and argue that in most instances, the emergent support in the shortcut coalescence analyses is an artifact. By referencing rigorous molecular clock studies of Mammalia, we suggest that inaccurate gene trees that imply unrealistically deep coalescences debilitate shortcut coalescence analyses of the placental dataset. We document contradictory coalescence results for Placentalia, and outline a critical conundrum that challenges the general utility of shortcut coalescence methods at deep phylogenetic scales. In particular, the basic unit of analysis in coalescence analysis, the coalescence-gene, is expected to shrink in size as more taxa are analyzed, but as the amount of data for reconstruction of a gene tree ratchets downward, the number of nodes in the gene tree that need to be resolved ratchets upward. Some advocates of shortcut coalescence methods have attempted to address problems with inaccurate gene trees by concatenating multiple coalescence-genes to yield "gene trees" that better match the species tree. However, this hybrid concatenation/coalescence approach, “concatalescence,” contradicts the most basic biological rationale for performing a coalescence analysis in the first place. We discuss this reality in the context of recent simulation work that also suggests inaccurate reconstruction of gene trees is more problematic for shortcut coalescence methods than deep coalescence of independently segregating loci is for concatenation methods.
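The notion of hidden support defined in this abstract can be sketched numerically. The example below assumes one common way of quantifying it: support for a clade in the combined analysis minus the summed support for that clade across separate partition analyses. The abstract does not state a formula, so this choice, the function name, and all scores are hypothetical illustrations rather than values from the study.

```python
from typing import Sequence


def hidden_support(combined: float, per_partition: Sequence[float]) -> float:
    """Combined-analysis support minus the summed support from separate analyses.

    Assumed definition for illustration only; negative per-partition scores
    stand for partitions that contradict the clade.
    """
    return combined - sum(per_partition)


# Hypothetical support scores for one clade across four data partitions.
mixed_partitions = [-1.0, 2.0, -3.0, 0.0]
mixed_result = hidden_support(6.0, mixed_partitions)

# Extreme case from the abstract: every partition contradicts the clade,
# yet the combined analysis supports it.
all_conflicting = [-1.0, -2.0, -1.0]
extreme_result = hidden_support(3.0, all_conflicting)

print(mixed_result, extreme_result)
```

A positive value indicates support that emerges only when partitions are analyzed together; the second call mirrors the extreme case in which every separate analysis contradicts a clade that the combined analysis recovers.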
Afrotheria Springer, Mark S.
CB/Current biology,
03/2022, Volume:
32, Issue:
5
Journal Article
Peer reviewed
Open access
Elephants and sea cows and tenrecs; hyraxes and aardvarks and sengis and golden moles. What do these very divergent and different looking mammals have in common? They are each other’s closest living ...relatives, and all belong to the placental mammal clade Afrotheria (‘African beasts’), which is one of the four major clades of placental mammals along with Xenarthra (anteaters, sloths, armadillos), Euarchontoglires (e.g. rodents, rabbits, primates), and Laurasiatheria (e.g. bats, carnivorans, odd-toed and even-toed ungulates) (Figure 1). Unlike many animal groups that were recognized and named long ago based on anatomical features, the Afrotheria emerged as a natural clade only in the 1990s when molecular techniques were applied to the problem of placental mammal classification. The recognition of Afrotheria represents a triumph of molecular phylogenetics and brings together a fantastically diverse assemblage of placental mammals with widely disparate ecological and morphological adaptations. Although Afrotheria was not previously proposed based on studies of anatomical characters, additional support for the monophyly of this clade comes from geography and the fossil record. Specifically, the six extant orders in Afrotheria share with each other early fossil representatives that are known from Africa or along the margins of the ancient Tethys Sea, hence Afrotheria.
Mark Springer introduces the Afrotheria, a large clade of placental mammals that was only recognized in light of molecular phylogenies.
Phylogenetic relationships, divergence times, and patterns of biogeographic descent among primate species are both complex and contentious. Here, we generate a robust molecular phylogeny for 70 ...primate genera and 367 primate species based on a concatenation of 69 nuclear gene segments and ten mitochondrial gene sequences, most of which were extracted from GenBank. Relaxed clock analyses of divergence times with 14 fossil-calibrated nodes suggest that living Primates last shared a common ancestor 71-63 Ma, and that divergences within both Strepsirrhini and Haplorhini are entirely post-Cretaceous. These results are consistent with the hypothesis that the Cretaceous-Paleogene mass extinction of non-avian dinosaurs played an important role in the diversification of placental mammals. Previous queries into primate historical biogeography have suggested Africa, Asia, Europe, or North America as the ancestral area of crown primates, but were based on methods that were coopted from phylogeny reconstruction. By contrast, we analyzed our molecular phylogeny with two methods that were developed explicitly for ancestral area reconstruction, and find support for the hypothesis that the most recent common ancestor of living Primates resided in Asia. Analyses of primate macroevolutionary dynamics provide support for a diversification rate increase in the late Miocene, possibly in response to elevated global mean temperatures, and are consistent with the fossil record. By contrast, diversification analyses failed to detect evidence for rate-shift changes near the Eocene-Oligocene boundary even though the fossil record provides clear evidence for a major turnover event ("Grande Coupure") at this time. Our results highlight the power and limitations of inferring diversification dynamics from molecular phylogenies, as well as the sensitivity of diversification analyses to different species concepts.
Mammal madness: is the mammal tree of life not yet resolved? Foley, Nicole M.; Springer, Mark S.; Teeling, Emma C.
Philosophical transactions of the Royal Society of London. Series B. Biological sciences,
07/2016, Volume:
371, Issue:
1699
Journal Article
Peer reviewed
Open access
Most molecular phylogenetic studies place all placental mammals into four superordinal groups, Laurasiatheria (e.g. dogs, bats, whales), Euarchontoglires (e.g. humans, rodents, colugos), Xenarthra ...(e.g. armadillos, anteaters) and Afrotheria (e.g. elephants, sea cows, tenrecs), and estimate that these clades last shared a common ancestor 90–110 million years ago. This phylogeny has provided a framework for numerous functional and comparative studies. Despite the high level of congruence among most molecular studies, questions still remain regarding the position and divergence time of the root of placental mammals, and certain ‘hard nodes’ such as the Laurasiatheria polytomy and Paenungulata that seem impossible to resolve. Here, we explore recent consensus and conflict among mammalian phylogenetic studies and explore the reasons for the remaining conflicts. The question of whether the mammal tree of life is or can be ever resolved is also addressed.
This article is part of the themed issue ‘Dating species divergences using rocks and clocks’.
Cetaceans have a long history of commitment to a fully aquatic lifestyle that extends back to the Eocene. Extant species have evolved a spectacular array of adaptations in conjunction with their ...deployment into a diverse array of aquatic habitats. Sensory systems are among those that have experienced radical transformations in the evolutionary history of this clade. In the case of vision, previous studies have demonstrated important changes in the genes encoding rod opsin (RH1), short-wavelength sensitive opsin 1 (SWS1), and long-wavelength sensitive opsin (LWS) in selected cetaceans, but have not examined the full complement of opsin genes across the complete range of cetacean families. Here, we report protein-coding sequences for RH1 and both color opsin genes (SWS1, LWS) from representatives of all extant cetacean families. We examine competing hypotheses pertaining to the timing of blue shifts in RH1 relative to SWS1 inactivation in the early history of Cetacea, and we test the hypothesis that some cetaceans are rod monochromats. Molecular evolutionary analyses contradict the "coastal" hypothesis, wherein SWS1 was pseudogenized in the common ancestor of Cetacea, and instead suggest that RH1 was blue-shifted in the common ancestor of Cetacea before SWS1 was independently knocked out in baleen whales (Mysticeti) and in toothed whales (Odontoceti). Further, molecular evidence implies that LWS was inactivated convergently on at least five occasions in Cetacea: (1) Balaenidae (bowhead and right whales), (2) Balaenopteroidea (rorquals plus gray whale), (3) Mesoplodon bidens (Sowerby's beaked whale), (4) Physeter macrocephalus (giant sperm whale), and (5) Kogia breviceps (pygmy sperm whale). All of these cetaceans are known to dive to depths of at least 100 m where the underwater light field is dim and dominated by blue light.
The knockout of both SWS1 and LWS in multiple cetacean lineages renders these taxa rod monochromats, a condition previously unknown among mammalian species.
Phylogenetic analyses on four new bat genomes provide convincing support for the placement of bats relative to other placental mammals, suggest that microbats are an unnatural group, and have ...important implications for understanding the evolution of echolocation.
•Summary coalescent methods are not robust to gene-tree misrooting errors.
•Summary coalescent methods are not robust to homology errors.
•Summary coalescent methods are not robust to ...differential sampling of taxa.
•Additional conflicting retroelement insertions are revealed.
•20,850 loci and 4345 retroelements do not robustly resolve palaeognath phylogeny.
Phylogenomic analyses of ancient rapid radiations can produce conflicting results that are driven by differential sampling of taxa and characters as well as the limitations of alternative analytical methods. We re-examine basal relationships of palaeognath birds (ratites and tinamous) using recently published datasets of nucleotide characters from 20,850 loci as well as 4301 retroelement insertions. The original studies attributed conflicting resolutions of rheas in their inferred coalescent and concatenation trees to concatenation failing in the anomaly zone. By contrast, we find that the coalescent-based resolution of rheas is premised upon extensive gene-tree estimation errors. Furthermore, retroelement insertions contain much more conflict than originally reported and multiple insertion loci support the basal position of rheas found in concatenation trees, while none were reported in the original publication. We demonstrate how even remarkable congruence in phylogenomic studies may be driven by long-branch misplacement of a divergent outgroup, highly incongruent gene trees, differential taxon sampling that can result in gene-tree misrooting errors that bias species-tree inference, and gross homology errors. What was previously interpreted as broad, robustly supported corroboration for a single resolution in coalescent analyses may instead indicate a common bias that taints phylogenomic results across multiple genome-scale datasets. The updated retroelement dataset now supports a species tree with branch lengths that suggest an ancient anomaly zone, and both concatenation and coalescent analyses of the huge nucleotide datasets fail to yield coherent, reliable results in this challenging phylogenetic context.