Recent years have seen progress in the development of statistically rigorous frameworks to infer outbreak transmission trees ("who infected whom") from epidemiological and genetic data. Making use of pathogen genome sequences in such analyses remains a challenge, however, with a variety of heuristic approaches having been explored to date. We introduce a statistical method exploiting both pathogen sequences and collection dates to unravel the dynamics of densely sampled outbreaks. Our approach identifies likely transmission events and infers dates of infections, unobserved cases and separate introductions of the disease. It also proves useful for inferring numbers of secondary infections and identifying heterogeneous infectivity and super-spreaders. After testing our approach using simulations, we illustrate the method with the analysis of the beginning of the 2003 Singaporean outbreak of Severe Acute Respiratory Syndrome (SARS), providing new insights into the early stage of this epidemic. Our approach is the first tool for disease outbreak reconstruction from genetic data widely available as free software, the R package outbreaker. It is applicable to various densely sampled epidemics, and improves previous approaches by detecting unobserved and imported cases, as well as allowing multiple introductions of the pathogen. Because of its generality, we believe this method will become a tool of choice for the analysis of densely sampled disease outbreaks, and will form a rigorous framework for subsequent methodological developments.
Whole-genome sequencing of pathogens from host samples is becoming increasingly routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications make simplifying assumptions that often break up the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC, for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package, phybreak. The method performs well in tests of both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings.
Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times place more confidence in the inferred transmission trees.
Bacteria and archaea reproduce clonally, but sporadically import DNA into their chromosomes from other organisms. In many of these events, the imported DNA replaces a homologous segment in the recipient genome. Here we present a new method to reconstruct the history of recombination events that affected a given sample of bacterial genomes. We introduce a mathematical model that represents both the donor and the recipient of each DNA import as an ancestor of the genomes in the sample. The model represents a simplification of the previously described coalescent with gene conversion. We implement a Markov chain Monte Carlo algorithm to perform inference under this model from sequence data alignments and show that inference is feasible for whole-genome alignments through parallelization. Using simulated data, we demonstrate accurate and reliable identification of individual recombination events and global recombination rate parameters. We applied our approach to an alignment of 13 whole genomes from the Bacillus cereus group. We find, as expected from laboratory experiments, that the recombination rate is higher between closely related organisms and also that the genome contains several broad regions of elevated levels of recombination. Application of the method to the genomic data sets that are becoming available should reveal the evolutionary history and private lives of populations of bacteria and archaea. The methods described in this article have been implemented in a computer software package, ClonalOrigin, which is freely available from http://code.google.com/p/clonalorigin/.
Reconstructing individual transmission events in an infectious disease outbreak can provide valuable information and help inform infection control policy. Recent years have seen considerable progress in the development of methodologies for reconstructing transmission chains using both epidemiological and genetic data. However, only a few of these methods have been implemented in software packages, and with little consideration for customisability and interoperability. Users are therefore limited to a small number of alternative, incompatible tools with fixed functionality, or are forced to develop their own algorithms at considerable personal effort.
Here we present outbreaker2, a flexible framework for outbreak reconstruction. This R package re-implements and extends the original model introduced with outbreaker, but most importantly also provides a modular platform allowing users to specify custom models within an optimised inferential framework. As a proof of concept, we implement the within-host evolutionary model introduced with TransPhylo, which is very distinct from the original genetic model in outbreaker, and demonstrate how even complex model results can be successfully included with minimal effort.
outbreaker2 provides a valuable starting point for future outbreak reconstruction tools, and represents a unifying platform that promotes customisability and interoperability. Implemented in the R software, outbreaker2 joins a growing body of tools for outbreak analysis.
Genetic exchange plays a defining role in the evolution of many bacteria. The recent accumulation of nucleotide sequence data from multiple members of diverse bacterial genera has facilitated comparative studies that have revealed many features of this process. Here we focus on genetic exchange that has involved homologous recombination and illustrate how nucleotide sequence data have furthered our understanding of: (i) the frequency of recombination; (ii) the impact of recombination in different parts of the genome; and (iii) patterns of gene flow within bacterial populations. Summarizing the results obtained for a range of bacteria, we survey evidence indicating that the extent and nature of recombination vary widely among microbiological species and often among lineages assigned to the same microbiological species. These results have important implications in studies ranging from epidemiological investigations to examination of the bacterial species problem.
Clostridioides difficile infection (CDI) remains an urgent global One Health threat. The genetic heterogeneity seen across C. difficile underscores its wide ecological versatility and has driven the significant changes in CDI epidemiology seen in the last 20 years. We analysed an international collection of over 12,000 C. difficile genomes spanning the eight currently defined phylogenetic clades. Through whole-genome average nucleotide identity, and pangenomic and Bayesian analyses, we identified major taxonomic incoherence with clear species boundaries for each of the recently described cryptic clades CI–III. The emergence of these three novel genomospecies predates clades C1–5 by millions of years, rewriting the global population structure of C. difficile specifically and taxonomy of the Peptostreptococcaceae in general. These genomospecies all show unique and highly divergent toxin gene architecture, advancing our understanding of the evolution of C. difficile and close relatives. Beyond the taxonomic ramifications, this work may impact the diagnosis of CDI.
The distribution of a phenotype on a phylogenetic tree is often a quantity of interest. Many phenotypes have imperfect heritability, so that a measurement of the phenotype for an individual can be thought of as a single realization from the phenotype distribution of that individual. If all individuals in a phylogeny had the same phenotype distribution, measured phenotypes would be randomly distributed on the tree leaves. This is, however, often not the case, implying that the phenotype distribution evolves over time. Here we propose a new model based on this principle of evolving phenotype distribution on the branches of a phylogeny, which is different from ancestral state reconstruction where the phenotype itself is assumed to evolve. We develop an efficient Bayesian inference method to estimate the parameters of our model and to test the evidence for changes in the phenotype distribution. We use multiple simulated data sets to show that our algorithm has good sensitivity and specificity properties. Since our method identifies branches on the tree on which the phenotype distribution has changed, it is able to break down a tree into components for which this distribution is unique and constant. We present two applications of our method, one investigating the association between HIV genetic variation and human leukocyte antigen and the other studying host range distribution in a lineage of Salmonella enterica, and we discuss many other potential applications.
By decomposing genome sequences into k-mers, it is possible to estimate genome differences without alignment. Techniques such as k-mer minimisers, for example MinHash, have been developed and are often accurate approximations of distances based on full k-mer sets. These and other alignment-free methods avoid the large temporal and computational expense of alignment. However, these k-mer set comparisons are not entirely accurate within-species and can be completely inaccurate within-lineage. This is due, in part, to their inability to distinguish core polymorphism from accessory differences. Here we present a new approach, KmerAperture, which uses information on the k-mer relative genomic positions to determine the type of polymorphism causing differences in k-mer presence and absence between pairs of genomes. Single SNPs are expected to result in k unique contiguous k-mers per genome. On the other hand, a contiguous series of S > k unique k-mers may be caused by an accessory difference of length S-k+1, when the start and end of the sequence are contiguous with homologous sequence. Alternatively, such a series may be caused by multiple SNPs within k bp of each other, and KmerAperture can determine whether that is the case. To demonstrate use cases, KmerAperture was benchmarked using datasets including a very low diversity simulated population with accessory content independent from the number of SNPs, a simulated population where SNPs are spatially dense, a moderately diverse real cluster of genomes (Escherichia coli ST1193) with a large accessory genome and a low diversity real genome cluster (Salmonella Typhimurium ST34). We show that KmerAperture can accurately distinguish both core and accessory sequence diversity without alignment, outperforming other k-mer based tools.
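The expected run lengths described in this abstract can be checked on a toy example. The sketch below is not the KmerAperture implementation: it only illustrates the counting argument on a randomly generated sequence. The helper names (`kmer_set`, `unique_runs`, `substitute`), the choice k = 15, the sequence length and the SNP positions are all assumptions made for illustration.

```python
import random

K = 15  # k-mer length; an illustrative choice, not a KmerAperture default


def kmer_set(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_runs(seq, other, k):
    """Lengths of contiguous runs of k-mers in seq that are absent from other."""
    other_kmers = kmer_set(other, k)
    runs, current = [], 0
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in other_kmers:
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)
    return runs


def substitute(seq, pos):
    """Introduce a SNP at pos by swapping in a different base."""
    bases = "ACGT"
    new_base = bases[(bases.index(seq[pos]) + 1) % 4]
    return seq[:pos] + new_base + seq[pos + 1:]


# Hypothetical toy genome: 200 random bases with a fixed seed.
rng = random.Random(1)
reference = "".join(rng.choice("ACGT") for _ in range(200))

one_snp = substitute(reference, 100)   # a single, isolated SNP
two_snps = substitute(one_snp, 105)    # a second SNP 5 bp away (< k)

print(unique_runs(one_snp, reference, K))   # one run of k k-mers expected
print(unique_runs(two_snps, reference, K))  # one run of k + 5 k-mers expected
```

An isolated SNP away from the sequence ends is covered by exactly k overlapping k-mers, so it should give a single run of length k in each genome. Two SNPs that are d < k bp apart share some of those k-mers and merge into one run of length k + d, which is why a run longer than k is ambiguous on its own.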
Gonorrhoea is one of the most common bacterial sexually transmitted infections in England. Over 41,000 cases were recorded in 2015, more than half of which occurred in men who have sex with men (MSM). As the bacterium has developed resistance to each first-line antibiotic in turn, we need an improved understanding of fitness benefits and costs of antibiotic resistance to inform control policy and planning. Cefixime was recommended as a single-dose treatment for gonorrhoea from 2005 to 2010, during which time resistance increased, and subsequently declined.
We developed a stochastic compartmental model representing the natural history and transmission of cefixime-sensitive and cefixime-resistant strains of Neisseria gonorrhoeae in MSM in England, which was applied to data on diagnoses and prescriptions between 2008 and 2015. We estimated that asymptomatic carriers play a crucial role in overall transmission dynamics, with 37% (95% credible interval [CrI] 24%-52%) of infections remaining asymptomatic and untreated, accounting for 89% (95% CrI 82%-93%) of onward transmission. The fitness cost of cefixime resistance in the absence of cefixime usage was estimated to be such that the number of secondary infections caused by resistant strains is only about half that caused by susceptible strains, which is insufficient to maintain persistence. However, we estimated that treatment of cefixime-resistant strains with cefixime was unsuccessful in 83% (95% CrI 53%-99%) of cases, representing a fitness benefit of resistance. This benefit was large enough to counterbalance the fitness cost when 31% (95% CrI 26%-36%) of cases were treated with cefixime, and when more than 55% (95% CrI 44%-66%) of cases were treated with cefixime, the resistant strain had a net fitness advantage over the susceptible strain. Limitations include sparse data leading to large intervals on key model parameters and necessary assumptions in the modelling of a complex epidemiological process.
Our study provides, to our knowledge, the first estimates of the fitness cost and benefit associated with resistance of the gonococcus to a clinically relevant antibiotic. Our findings have important implications for antibiotic stewardship and public health policies and, in particular, suggest that a previously abandoned antibiotic could be used again to treat a minority of gonorrhoea cases without raising resistance levels.
Escherichia coli is an important species of bacteria that can live as a harmless inhabitant of the guts of many animals, as a pathogen causing life-threatening conditions or freely in the non-host environment. This diversity of lifestyles has made it a particular focus of interest for studies of genetic variation, mainly with the aim of understanding how a commensal can become a deadly pathogen. Many whole genomes of E. coli have been fully sequenced in the past few years, providing data that help to explain how this important species evolved.
We compared 27 whole genomes encompassing four phylogroups of Escherichia coli (A, B1, B2 and E). From the core-genome we established the clonal relationships between the isolates as well as the role played by homologous recombination during their evolution from a common ancestor. We found strong evidence for sexual isolation between three lineages (A+B1, B2, E), which could be explained by the ecological structuring of E. coli and may represent on-going speciation. We identified three hotspots of homologous recombination, one of which had not been previously described and contains the aroC gene, involved in the essential shikimate metabolic pathway. We also described the role played by non-homologous recombination in the pan-genome, and showed that this process was highly heterogeneous. Our analyses revealed in particular that the genomes of three enterohaemorrhagic (EHEC) strains within phylogroup B1 have converged from originally separate backgrounds as a result of both homologous and non-homologous recombination.
Recombination is an important force shaping the genomic evolution and diversification of E. coli, both by replacing fragments of genes with a homologous sequence and by introducing new genes. In this study, several non-random patterns of these events were identified which correlated with important changes in the lifestyle of the bacteria, and therefore provide additional evidence to explain the relationship between genomic variation and ecological adaptation.