Here, we present a major advance in the OrthoFinder method. It extends OrthoFinder's high-accuracy orthogroup inference to provide phylogenetic inference of orthologs, rooted gene trees, gene duplication events, the rooted species tree, and comparative genomics statistics. Each output is benchmarked on appropriate real or simulated datasets, and where comparable methods exist, OrthoFinder performs as well as or better than these methods. Furthermore, OrthoFinder is the most accurate ortholog inference method on the Quest for Orthologs benchmark test. Finally, OrthoFinder's comprehensive phylogenetic analysis is achieved with speed and scalability equivalent to those of the fastest score-based heuristic methods. OrthoFinder is available at https://github.com/davidemms/OrthoFinder.
Identifying homology relationships between sequences is fundamental to biological research. Here we present a novel orthogroup inference algorithm called OrthoFinder that solves a previously undetected gene length bias in orthogroup inference, resulting in significant improvements in accuracy. Using real benchmark datasets, we demonstrate that OrthoFinder is between 8% and 33% more accurate than other orthogroup inference methods. Furthermore, we demonstrate the utility of OrthoFinder by providing a complete classification of transcription factor gene families in plants, revealing 6.9 million previously unobserved relationships.
The correct interpretation of any phylogenetic tree is dependent on that tree being correctly rooted. We present STRIDE, a fast, effective, and outgroup-free method for identification of gene duplication events and species tree root inference in large-scale molecular phylogenetic analyses. STRIDE identifies sets of well-supported in-group gene duplication events from a set of unrooted gene trees, and analyses these events to infer a probability distribution over an unrooted species tree for the location of its root. We show that STRIDE correctly identifies the root of the species tree in multiple large-scale molecular phylogenetic data sets spanning a wide range of timescales and taxonomic groups. We demonstrate that the novel probability model implemented in STRIDE can accurately represent the ambiguity in species tree root assignment for data sets where information is limited. Furthermore, application of STRIDE to outgroup-free inference of the origin of the eukaryotic tree resulted in a root probability distribution that provides additional support for leading hypotheses for the origin of the eukaryotes.
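The intuition behind duplication-based rooting can be illustrated with a simplified vote-counting sketch. This is not STRIDE's probability model; it only encodes the idea that a duplication inherited by a set of species implies those species form a clade in the rooted species tree, so the root cannot lie on a branch inside that clade. The representation (each branch of the unrooted species tree given as the set of species on one side of it, each duplication given as the set of species that inherited both copies) and the function name `root_support` are hypothetical, and duplications are assumed to map cleanly onto species-tree clades.

```python
from typing import Dict, FrozenSet, List

Split = FrozenSet[str]


def root_support(
    species: FrozenSet[str],
    branches: Dict[str, Split],
    duplications: List[FrozenSet[str]],
) -> Dict[str, int]:
    """Count, for each candidate root branch, how many duplications it is compatible with.

    Hypothetical simplification: each branch is described by the species on one
    side of it (the other side is the complement). Rooting on a branch is
    incompatible with a duplication if either side of that branch is a proper
    subset of the duplication's species set, because the root would then fall
    inside the clade that the duplication implies.
    """
    support: Dict[str, int] = {}
    for name, side in branches.items():
        other = species - side
        support[name] = sum(
            1 for dup in duplications if not (side < dup or other < dup)
        )
    return support


if __name__ == "__main__":
    species = frozenset("abcd")
    # Unrooted tree ((a,b),(c,d)): four terminal branches plus one internal branch.
    branches = {
        "a": frozenset("a"),
        "b": frozenset("b"),
        "c": frozenset("c"),
        "d": frozenset("d"),
        "ab|cd": frozenset("ab"),
    }
    duplications = [frozenset("ab"), frozenset("cd")]
    print(root_support(species, branches, duplications))
    # {'a': 1, 'b': 1, 'c': 1, 'd': 1, 'ab|cd': 2}
```

In this toy example the internal branch is compatible with both duplications and so receives the most support. STRIDE itself goes further and turns the evidence from many duplications into a probability distribution over root locations, as described above.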
Abstract
Organelle biogenesis and function are dependent on the concerted action of both organellar-encoded (if present) and nuclear-encoded proteins. Differences between homologous organelles across the Plant Kingdom arise, in part, as a result of differences in the cohort of nuclear-encoded proteins that are targeted to them. However, neither the rate at which differences in protein targeting accumulate nor the evolutionary consequences of these changes are known. Using phylogenomic approaches coupled to ancestral state estimation, we show that the plant organellar proteome has diversified in proportion with molecular sequence evolution such that the proteomes of plant chloroplasts and mitochondria lose or gain on average 3.6 proteins per million years. We further demonstrate that changes in organellar protein targeting are associated with an increase in the rate of molecular sequence evolution and that such changes predominantly occur in genes with regulatory rather than metabolic functions. Finally, we show that gain and loss of protein targeting signals occur at a higher rate following gene duplication, revealing that gene and genome duplication are key facilitators of plant organelle evolution.
Rubisco is the primary entry point for carbon into the biosphere. However, rubisco is widely regarded as inefficient, leading many to question whether the enzyme can adapt to become a better catalyst. Through a phylogenetic investigation of the molecular and kinetic evolution of Form I rubisco, we uncover the trajectory of rubisco kinetic evolution in angiosperms. We show that rbcL is among the 1% of slowest-evolving genes and enzymes on Earth, accumulating one nucleotide substitution every 0.9 My and one amino acid mutation every 7.2 My. Despite this, rubisco catalysis has been continually evolving toward improved CO2/O2 specificity, carboxylase turnover, and carboxylation efficiency. Consistent with this kinetic adaptation, increased rubisco evolution has led to a concomitant improvement in leaf-level CO2 assimilation. Thus, rubisco has been slowly but continually evolving toward improved catalytic efficiency and CO2 assimilation in plants.
Determining the evolutionary relationships between genes is fundamental to comparative biological research. Here, we present SHOOT. SHOOT searches a user query sequence against a database of phylogenetic trees and returns a tree with the query sequence correctly placed within it. We show that SHOOT performs this analysis with speed comparable to that of a BLAST search. We demonstrate that SHOOT phylogenetic placements are as accurate as those from conventional tree inference and that SHOOT can identify orthologs with high accuracy. In summary, SHOOT is a fast and accurate tool for phylogenetic analyses of novel query sequences. It is available online at www.shoot.bio.
The colonization of the land by streptophytes and their subsequent radiation is a major event in Earth history. We report a stepwise increase in the number of transcription factor (TF) families and subfamilies in Archaeplastida before the colonization of the land. The subsequent increase in TF number on land was through duplication within existing TF families and subfamilies. Almost all subfamilies of the Homeodomain (HD) and basic Helix-Loop-Helix (bHLH) had evolved before the radiation of extant land plant lineages from a common ancestor. We demonstrate that the evolution of these TF families independently followed similar trends in both plants and metazoans; almost all extant HD and bHLH subfamilies were present in the first land plants and in the last common ancestor of bilaterians. These findings reveal that the majority of innovation in plant and metazoan TF families occurred in the Precambrian before the Phanerozoic radiation of land plants and metazoans.
C4 photosynthesis is considered one of the most remarkable examples of evolutionary convergence in eukaryotes. However, it is unknown whether the evolution of C4 photosynthesis required the evolution of new genes. Genome-wide gene-tree species-tree reconciliation of seven monocot species that span two origins of C4 photosynthesis revealed that there was significant parallelism in the duplication and retention of genes coincident with the evolution of C4 photosynthesis in these lineages. Specifically, 21 orthologous genes were duplicated and retained independently in parallel at both C4 origins. Analysis of this gene cohort revealed that the set of parallel duplicated and retained genes is enriched for genes that are preferentially expressed in bundle sheath cells, the cell type in which photosynthesis was activated during C4 evolution. Furthermore, functional analysis of the cohort of parallel duplicated genes identified SWEET-13 as a potential key transporter in the evolution of C4 photosynthesis in grasses, and provides new insight into the mechanism of phloem loading in these C4 species.
C4 photosynthesis, gene duplication, gene families, parallel evolution.
Orthobench is the standard benchmark for assessing the accuracy of orthogroup inference methods. It contains 70 expert-curated reference orthogroups (RefOGs) that span the Bilateria and cover a range of different challenges for orthogroup inference. Here, we leveraged improvements in tree inference algorithms and computational resources to re-examine these RefOGs and carry out an extensive phylogenetic delineation of their composition. This phylogenetic revision altered the membership of 31 of the 70 RefOGs: 24 required extensive revision and 7 required minor changes. We then used these revised RefOGs to assess the orthogroup inference accuracy of widely used methods. Finally, we provide an open-source benchmarking suite to support the future development and use of the Orthobench benchmark.
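One generic way to score an orthogroup inference method against reference orthogroups is pairwise precision and recall over gene pairs placed in the same group. The sketch below assumes this metric and uses the hypothetical helper names `gene_pairs` and `pairwise_scores`; it is not necessarily the scoring used by the Orthobench benchmarking suite. It also assumes that genes absent from every RefOG cannot be assessed and so drops them from the predicted groups.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def gene_pairs(groups: Iterable[Set[str]]) -> Set[Pair]:
    """Return every unordered pair of genes that share a group (each pair stored sorted)."""
    pairs: Set[Pair] = set()
    for group in groups:
        for a, b in combinations(sorted(group), 2):
            pairs.add((a, b))
    return pairs


def pairwise_scores(
    reference: List[Set[str]], predicted: List[Set[str]]
) -> Tuple[float, float, float]:
    """Pairwise precision, recall and F1 of predicted orthogroups against reference ones."""
    ref_genes = set().union(*reference)
    ref_pairs = gene_pairs(reference)
    # Assumption: genes outside every reference orthogroup cannot be assessed,
    # so they are removed from the predicted groups before scoring.
    pred_pairs = gene_pairs(group & ref_genes for group in predicted)
    true_pos = len(ref_pairs & pred_pairs)
    precision = true_pos / len(pred_pairs) if pred_pairs else 0.0
    recall = true_pos / len(ref_pairs) if ref_pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy data: two RefOGs; the prediction splits one and merges another.
    reference = [{"h1", "m1", "f1"}, {"h2", "m2"}]
    predicted = [{"h1", "m1"}, {"f1", "h2", "m2", "x9"}]
    print(pairwise_scores(reference, predicted))  # (0.5, 0.5, 0.5)
```

Scoring at the level of gene pairs penalizes both split orthogroups (lost true pairs lower recall) and merged orthogroups (spurious pairs lower precision), which is why it is a common choice for this kind of comparison.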