The molecular clock presents a means of estimating evolutionary rates and timescales using genetic data. These estimates can lead to important insights into evolutionary processes and mechanisms, as well as providing a framework for further biological analyses. To deal with rate variation among genes and among lineages, a diverse range of molecular‐clock methods have been developed. These methods have been implemented in various software packages and differ in their statistical properties, ability to handle different models of rate variation, capacity to incorporate various forms of calibrating information and tractability for analysing large data sets. Choosing a suitable molecular‐clock model can be a challenging exercise, but a number of model‐selection techniques are available. In this review, we describe the different forms of evolutionary rate heterogeneity and explain how they can be accommodated in molecular‐clock analyses. We provide an outline of the various clock methods and models that are available, including the strict clock, local clocks, discrete clocks and relaxed clocks. Techniques for calibration and clock‐model selection are also described, along with methods for handling multilocus data sets. We conclude our review with some comments about the future of molecular clocks.
The cross-species transmission of viruses from one host species to another is responsible for the majority of emerging infections. However, it is unclear whether some virus families have a greater propensity to jump host species than others. If related viruses have an evolutionary history of co-divergence with their hosts, there should be evidence of topological similarities between the virus and host phylogenetic trees, whereas host jumping generates incongruent tree topologies. By analyzing co-phylogenetic processes in 19 virus families and their eukaryotic hosts, we provide a quantitative and comparative estimate of the relative frequency of virus-host co-divergence versus cross-species transmission among virus families. Notably, our analysis reveals that cross-species transmission is a near universal feature of the viruses analyzed here, with virus-host co-divergence occurring less frequently and always on a subset of viruses. Despite the overall high topological incongruence among virus and host phylogenies, the Hepadnaviridae, Polyomaviridae, Poxviridae, Papillomaviridae and Adenoviridae, all of which possess double-stranded DNA genomes, exhibited more frequent co-divergence than the other virus families studied here. At the other extreme, the virus and host trees for all the RNA viruses studied here, particularly the Rhabdoviridae and the Picornaviridae, displayed high levels of topological incongruence, indicative of frequent host switching. Overall, we show that cross-species transmission plays a major role in virus evolution, with all the virus families studied here having the potential to jump host species, and that increased sampling will likely reveal more instances of host jumping.
New Zealand, a geographically remote Pacific island with easily sealable borders, implemented a nationwide 'lockdown' of all non-essential services to curb the spread of COVID-19. Here, we generate 649 SARS-CoV-2 genome sequences from infected patients in New Zealand with samples collected during the 'first wave', representing 56% of all confirmed cases in this time period. Despite its remoteness, the viruses imported into New Zealand represented nearly all of the genomic diversity sequenced from the global virus population. These data helped to quantify the effectiveness of public health interventions. For example, the effective reproductive number, Re, of New Zealand's largest cluster decreased from 7 to 0.2 within the first week of lockdown. Similarly, only 19% of virus introductions into New Zealand resulted in ongoing transmission of more than one additional case. Overall, these results demonstrate the utility of genomic pathogen surveillance to inform public health and disease mitigation.
Evolutionary timescales can be estimated from genetic data using phylogenetic methods based on the molecular clock. To account for molecular rate variation among lineages, a number of relaxed‐clock models have been developed. Some of these models assume that rates vary among lineages in an autocorrelated manner, so that closely related species share similar rates. In contrast, uncorrelated relaxed clocks allow all of the branch‐specific rates to be drawn from a single distribution, without assuming any correlation between rates along neighbouring branches. There is uncertainty about which of these two classes of relaxed‐clock models is more appropriate for biological data. We present an R package, NELSI, that allows the evolution of DNA sequences to be simulated according to a range of clock models. Using data generated by this package, we assessed the ability of two Bayesian phylogenetic methods to distinguish among different relaxed‐clock models and to quantify rate variation among lineages. The results of our analyses show that rate autocorrelation is typically difficult to detect, even when there is complete taxon sampling. This provides a potential explanation for past failures to detect rate autocorrelation in a range of data sets.
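The difference between the two classes of relaxed clock described above can be illustrated with a minimal Python sketch. It is not the NELSI implementation (NELSI is an R package and its internals are not described here). It assumes a single lineage of consecutive branches rather than a full tree, lognormal rate distributions, and hypothetical function names and parameter values chosen only for illustration.

```python
import math
import random


def uncorrelated_rates(n_branches, mean_rate, sigma, rng):
    """Draw every branch rate independently from one lognormal distribution."""
    # Shift mu so that the lognormal distribution has the requested mean.
    mu = math.log(mean_rate) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]


def autocorrelated_rates(n_branches, root_rate, sigma, rng):
    """Draw each branch rate from a lognormal centred on its parent's rate."""
    rates = []
    parent = root_rate
    for _ in range(n_branches):
        # The median of the child rate equals the parent rate (assumption).
        child = rng.lognormvariate(math.log(parent), sigma)
        rates.append(child)
        parent = child
    return rates


def lag1_log_correlation(rates):
    """Pearson correlation between log rates of successive branches."""
    logs = [math.log(r) for r in rates]
    x, y = logs[:-1], logs[1:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    unc = uncorrelated_rates(2000, mean_rate=0.01, sigma=0.3, rng=rng)
    auto = autocorrelated_rates(2000, root_rate=0.01, sigma=0.05, rng=rng)
    print("uncorrelated lag-1 correlation:", lag1_log_correlation(unc))
    print("autocorrelated lag-1 correlation:", lag1_log_correlation(auto))
```

With many branches along one lineage the two models are easy to tell apart. On a real tree, each parent has only two descendant branches and rates must be inferred from sequence data, which is consistent with the abstract's finding that autocorrelation is difficult to detect.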
• Molecular clocks can be calibrated with probability distributions of node ages.
• We use simulations to investigate the effect of the age and number of calibrations.
• Molecular clock model misspecification is an important source of estimation error.
• The best strategy is to include multiple calibrations and to prefer those at deep nodes.
• Effective calibrations minimise estimation error due to clock model misspecification.
Phylogenetic estimates of evolutionary timescales can be obtained from nucleotide sequence data using the molecular clock. These estimates are important for our understanding of evolutionary processes across all taxonomic levels. The molecular clock needs to be calibrated with an independent source of information, such as fossil evidence, to allow absolute ages to be inferred. Calibration typically involves fixing or constraining the age of at least one node in the phylogeny, enabling the ages of the remaining nodes to be estimated. We conducted an extensive simulation study to investigate the effects of the position and number of calibrations on the resulting estimate of the timescale. Our analyses focused on Bayesian estimates obtained using relaxed molecular clocks. Our findings suggest that an effective strategy is to include multiple calibrations and to prefer those that are close to the root of the phylogeny. Under these conditions, we found that evolutionary timescales could be estimated accurately even when the relaxed-clock model was misspecified and when the sequence data were relatively uninformative. We tested these findings in a case study of simian foamy virus, where we found that shallow calibrations caused the overall timescale to be underestimated by up to three orders of magnitude. Finally, we provide some recommendations for improving the practice of molecular-clock calibration.
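The basic arithmetic of calibration can be sketched in Python as follows. This is a deliberately simplified illustration: it assumes a strict clock and a single fixed calibration, whereas the study summarised above used Bayesian relaxed clocks with calibrations expressed as probability distributions. The function names, node names, and numerical values are hypothetical.

```python
def strict_clock_rate(calibration_height, calibration_age):
    """Rate in substitutions per site per unit time.

    calibration_height: distance from the calibrated node to the tips,
        in substitutions per site (ultrametric tree assumed).
    calibration_age: independently known age of that node, e.g. from a fossil.
    """
    if calibration_age <= 0:
        raise ValueError("calibration_age must be positive")
    return calibration_height / calibration_age


def node_ages(node_heights, rate):
    """Convert node heights (substitutions per site) into absolute ages."""
    return {name: height / rate for name, height in node_heights.items()}


if __name__ == "__main__":
    # Hypothetical calibration: a node of height 0.05 subs/site dated to 10 Myr.
    rate = strict_clock_rate(calibration_height=0.05, calibration_age=10.0)
    # Heights of the remaining (uncalibrated) nodes, also hypothetical.
    ages = node_ages({"shallow_node": 0.02, "root": 0.10}, rate)
    for name, age in ages.items():
        print(f"{name}: about {age:.1f} Myr")
```

Because every age is obtained by dividing by the same rate, any error in the calibrated rate is carried over to all the other node ages in the tree.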
Dating the emergence of human pathogens. Ho, Simon Y W; Duchêne, Sebastián
Science (American Association for the Advancement of Science), 06/2020, Volume 368, Issue 6497
Journal Article, Peer reviewed
Ancient genomes can narrow the search for the sources of zoonotic transmissions
Understanding the emergence and evolution of human pathogens plays a pivotal role in epidemiology and in predicting the trajectories of outbreaks. The application of phylogenetic methods to pathogen genomes has provided a range of insights into their evolutionary dynamics (1). In many cases, phylogenetic methods can use the sampling dates of the genomes to reconstruct the evolutionary time scales of viruses, bacteria, and other pathogens. Ancient genomes can increase the power of these approaches by narrowing the estimated time window of pathogen emergence and by augmenting the evolutionary temporal signal in the genetic data. On page 1367 of this issue, Düx et al. (2) show how a century-old genome of Measles morbillivirus, extracted from human lung tissue, can help efforts to pinpoint the time of emergence of measles.
Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data into a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.
Relaxed molecular clocks allow the phylogenetic estimation of evolutionary timescales even when substitution rates vary among branches. In analyses of large multigene datasets, it is often appropriate to use multiple relaxed-clock models to accommodate differing patterns of rate variation among genes. We present ClockstaR, a method for selecting the number of relaxed clocks for multigene datasets.
ClockstaR is freely available for download at http://sydney.edu.au/science/biology/meep/software/.
Group A Streptococcus (GAS; Streptococcus pyogenes) is a bacterial pathogen for which a commercial vaccine for humans is not available. Employing the advantages of high-throughput DNA sequencing technology to vaccine design, we have analyzed 2,083 globally sampled GAS genomes. The global GAS population structure reveals extensive genomic heterogeneity driven by homologous recombination and overlaid with high levels of accessory gene plasticity. We identified the existence of more than 290 clinically associated genomic phylogroups across 22 countries, highlighting challenges in designing vaccines of global utility. To determine vaccine candidate coverage, we investigated all of the previously described GAS candidate antigens for gene carriage and gene sequence heterogeneity. Only 15 of 28 vaccine antigen candidates were found to have both low naturally occurring sequence variation and high (>99%) coverage across this diverse GAS population. This technological platform for vaccine coverage determination is equally applicable to prospective GAS vaccine antigens identified in future studies.