The use of whole-genome phylogenetic analysis has revolutionized our understanding of the evolution and spread of many important bacterial pathogens due to the high-resolution view it provides. However, the majority of such analyses do not consider the potential role of accessory genes when inferring evolutionary trajectories. Moreover, the recently discovered importance of the switching of gene regulatory elements suggests that an exhaustive analysis, combining information from core and accessory genes with regulatory elements, could provide unparalleled detail of the evolution of a bacterial population. Here we demonstrate this principle by applying it to a worldwide multi-host sample of the important pathogenic E. coli lineage ST131. Our approach reveals the existence of multiple circulating subtypes of the major drug-resistant clade of ST131 and provides the first population-level evidence of core genome substitutions in gene regulatory regions associated with the acquisition and maintenance of different accessory genome elements.
Some of the most common infectious diseases are caused by bacteria that naturally colonise humans asymptomatically. Combating these opportunistic pathogens requires an understanding of the traits that differentiate infecting strains from harmless relatives. Staphylococcus epidermidis is carried asymptomatically on the skin and mucous membranes of virtually all humans but is a major cause of nosocomial infection associated with invasive procedures. Here we address the underlying evolutionary mechanisms of opportunistic pathogenicity by combining pangenome-wide association studies and laboratory microbiology to compare S. epidermidis from bloodstream and wound infections and asymptomatic carriage. We identify 61 genes containing infection-associated genetic elements (k-mers) that correlate with in vitro variation in known pathogenicity traits (biofilm formation, cell toxicity, interleukin-8 production, methicillin resistance). Horizontal gene transfer spreads these elements, allowing divergent clones to cause infection. Finally, a Random Forest model prediction of disease status (carriage vs. infection) identifies pathogenicity elements in 415 S. epidermidis isolates with 80% accuracy, demonstrating the potential for identifying risk genotypes pre-operatively.
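The "genetic elements (k-mers)" above are short DNA words of a fixed length k. As a minimal sketch, assuming plain DNA strings, a toy value of k, and invented sequences (none taken from the study), the Python below enumerates each isolate's k-mers and keeps those present in every infection isolate and absent from every carriage isolate. This crude presence/absence filter is only an illustration of the idea; a pangenome-wide association study instead tests each k-mer statistically and accounts for population structure.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def infection_only_kmers(infection: list[str], carriage: list[str], k: int) -> set[str]:
    """Return k-mers found in every infection isolate and in no carriage isolate.

    Illustrative presence/absence filter only: it performs no statistical
    test and ignores population structure.
    """
    in_all_infection = set.intersection(*(kmers(s, k) for s in infection))
    in_any_carriage = set().union(*(kmers(s, k) for s in carriage))
    return in_all_infection - in_any_carriage


# Toy sequences, invented for illustration.
print(infection_only_kmers(["AAAAC", "GAAAAC"], ["AAAAG"], k=4))  # {'AAAC'}
```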
Salmonella is a prevalent zoonotic foodborne pathogen. Swine and pork are implicated as important sources of salmonellosis in humans. In Chiang Mai and Lamphun Provinces in northern Thailand, there has been a high prevalence of Salmonella persistence for over a decade. Infection is usually with dominant Salmonella serotypes, including serotypes Rissen and 1,4,5,12:i:-. However, other serotypes also contribute to disease but are less well characterized. Whole-genome sequencing data from 43 Salmonella serotypes isolated from the pork production chain during 2011-2014 were used to evaluate genetic diversity and to ascertain the possible sources of Salmonella contamination using a core genome multilocus sequence typing (cgMLST) approach. The Salmonella serotypes recovered from farms and slaughterhouses were re-circulating through swine environmental contamination. Conversely, Salmonella contamination in the retail market represents cross-contamination from multiple sources, including contaminated foodstuffs. The Salmonella contaminating the pork production chain has the capacity for host cell adhesion, host cell invasion, and intracellular survival, which is sufficient for the pathogenicity of salmonellosis. In addition, all of these isolates were multi-drug resistant Salmonella carrying at least 10 antimicrobial resistance genes. This result indicates that these Salmonella serotypes also pose a significant public health risk. Our findings support the need for appropriate surveillance of food-animal products going to market to reduce public exposure to highly pathogenic, multi-drug resistant Salmonella. Acquiring this information would motivate all stakeholders to reinforce sanitation standards throughout the pork production chain in order to eradicate Salmonella contamination and reduce the risk of salmonellosis in humans.
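cgMLST assigns an allele number to each core-genome locus, and isolates are commonly compared by counting the loci at which their alleles differ. The Python below is a minimal sketch of that allelic distance, assuming each profile is a mapping from a hypothetical locus name to an allele number and that loci missing from either profile are skipped (one common convention; typing tools differ in how they treat missing loci). Small distances between, for example, a farm isolate and a slaughterhouse isolate are consistent with a shared source; all isolate and locus names below are invented.

```python
from itertools import combinations


def allelic_distance(a: dict[str, int], b: dict[str, int]) -> int:
    """Count core loci typed in both profiles whose allele numbers differ."""
    shared = a.keys() & b.keys()  # loci missing from either profile are skipped
    return sum(1 for locus in shared if a[locus] != b[locus])


def pairwise_distances(profiles: dict[str, dict[str, int]]) -> dict[tuple[str, str], int]:
    """Allelic distance for every unordered pair of isolates (names sorted)."""
    return {
        (x, y): allelic_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Hypothetical isolates and loci, invented for illustration.
profiles = {
    "farm_1": {"locus_001": 1, "locus_002": 4, "locus_003": 7},
    "slaughter_1": {"locus_001": 1, "locus_002": 4, "locus_003": 8},
    "retail_1": {"locus_001": 2, "locus_002": 5},
}
print(pairwise_distances(profiles))
```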
Chickens are the most common birds on Earth and colibacillosis is among the most common diseases affecting them. This major threat to animal welfare and safe sustainable food production is difficult to combat because the etiological agent, avian pathogenic Escherichia coli (APEC), emerges from ubiquitous commensal gut bacteria, with no single virulence gene present in all disease-causing isolates. Here, we address the underlying evolutionary mechanisms of extraintestinal spread and systemic infection in poultry. Combining population scale comparative genomics and pangenome-wide association studies, we compare E. coli from commensal carriage and systemic infections. We identify phylogroup-specific and species-wide genetic elements that are enriched in APEC, including pathogenicity-associated variation in 143 genes that have diverse functions, including genes involved in metabolism, lipopolysaccharide synthesis, heat shock response, antimicrobial resistance and toxicity. We find that horizontal gene transfer spreads pathogenicity elements, allowing divergent clones to cause infection. Finally, a Random Forest model prediction of disease status (carriage vs. disease) identifies pathogenic strains in the emergent ST-117 poultry-associated lineage with 73% accuracy, demonstrating the potential for early identification of emergent APEC in healthy flocks.
Campylobacter jejuni and Campylobacter coli are the biggest causes of bacterial gastroenteritis in the developed world, with human infections typically arising from zoonotic transmission associated with infected meat. Because Campylobacter is not thought to survive well outside the gut, host-associated populations are genetically isolated to varying degrees. Therefore, the likely origin of most strains can be determined by host-associated variation in the genome. This is instructive for characterizing the source of human infection. However, some common strains, notably isolates belonging to the ST-21, ST-45 and ST-828 clonal complexes, appear to have broad host ranges, hindering source attribution. Here whole-genome sequencing has the potential to reveal fine-scale genetic structure associated with host specificity. We found that rates of zoonotic transmission among animal host species in these clonal complexes were so high that the signal of host association is all but obliterated, estimating one zoonotic transmission event every 1.6, 1.8 and 12 years in the ST-21, ST-45 and ST-828 complexes, respectively. We attributed 89% of clinical cases to a chicken source, 10% to cattle and 1% to pig. Our results reveal that common strains of C. jejuni and C. coli infectious to humans are adapted to a generalist lifestyle, permitting rapid transmission between different hosts. Furthermore, they show that the weak signal of host association within these complexes presents a challenge for pinpointing the source of clinical infections, underlining the view that whole-genome sequencing, powerful though it is, cannot substitute for intensive sampling of suspected transmission reservoirs.
The increasing availability of hundreds of whole bacterial genomes provides opportunities for enhanced understanding of the genes and alleles responsible for clinically important phenotypes and how they evolved. However, it is a significant challenge to develop easy-to-use and scalable methods for characterizing these large and complex data and relating them to disease epidemiology. Existing approaches typically focus on either homologous sequence variation in genes that are shared by all isolates, or non-homologous sequence variation, focusing on genes that are differentially present in the population. Here we present a comparative genomics approach that simultaneously approximates core and accessory genome variation in pathogen populations and apply it to pathogenic species in the genus Campylobacter. A total of 7 published Campylobacter jejuni and Campylobacter coli genomes were selected to represent diversity across these species, and a list of all loci that were present at least once was compiled. After filtering duplicates, a 7-isolate reference pan-genome of 3,933 loci was defined. A core genome of 1,035 genes was ubiquitous in the sample, accounting for 59% of the genes in each isolate (average genome size of 1.68 Mb). The accessory genome contained 2,792 genes. A Campylobacter population sample of 192 genomes was screened for the presence of reference pan-genome loci, with gene presence defined as a BLAST match of ≥ 70% identity over ≥ 50% of the locus length; sequences were aligned using MUSCLE on a gene-by-gene basis. A total of 21 genes were present only in C. coli and 27 only in C. jejuni, providing information about functional differences associated with species and novel epidemiological markers for population genomic analyses. Homologs of these genes were found in several of the genomes used to define the pan-genome and, therefore, would not have been identified using a single reference strain approach.
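The presence rule and the core/accessory split described above can be sketched in Python. This assumes each BLAST result has already been reduced to a percent identity, an aligned length and the reference locus length (a hypothetical `Hit` record, not BLAST's own output format), and that coverage means aligned length divided by locus length; the default thresholds mirror the ≥ 70% identity and ≥ 50% locus-length criteria in the text. Core loci are present in every genome; accessory loci are present in some but not all. Gene-by-gene alignment with MUSCLE is not modelled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    identity: float      # percent identity of the best match
    aligned_length: int  # aligned length in nucleotides
    locus_length: int    # length of the reference pan-genome locus


def is_present(hit: Optional[Hit], min_identity: float = 70.0, min_coverage: float = 0.5) -> bool:
    """Apply the identity and locus-length coverage thresholds to one best match."""
    if hit is None:  # no BLAST match at all
        return False
    coverage = hit.aligned_length / hit.locus_length
    return hit.identity >= min_identity and coverage >= min_coverage


def split_core_accessory(presence: dict[str, dict[str, bool]]) -> tuple[set[str], set[str]]:
    """presence[locus][genome] -> bool; return (core, accessory) locus sets."""
    core = {locus for locus, row in presence.items() if all(row.values())}
    accessory = {
        locus for locus, row in presence.items() if any(row.values()) and not all(row.values())
    }
    return core, accessory


# Invented example: two genomes, two loci.
hits = {
    "locus_A": {"g1": Hit(98.0, 900, 900), "g2": Hit(85.0, 600, 900)},
    "locus_B": {"g1": Hit(92.0, 800, 800), "g2": Hit(72.0, 300, 800)},
}
presence = {locus: {g: is_present(h) for g, h in row.items()} for locus, row in hits.items()}
print(split_core_accessory(presence))  # ({'locus_A'}, {'locus_B'})
```

In this toy case `locus_B` passes the identity threshold in `g2` but covers only 37.5% of the locus, so it is scored absent there and lands in the accessory set.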
Helicobacter pylori are stomach-dwelling bacteria that are present in about 50% of the global population. Infection is asymptomatic in most cases, but it has been associated with gastritis, gastric ulcers and gastric cancer. Epidemiological evidence shows that progression to cancer depends upon host and pathogen factors, but questions remain about why cancer phenotypes develop in a minority of infected people. Here, we use comparative genomics approaches to understand how genetic variation amongst bacterial strains influences disease progression.
We performed a genome-wide association study (GWAS) on 173 H. pylori isolates from the European population (hpEurope) with known disease aetiology, including 49 from individuals with gastric cancer. We identified SNPs and genes that differed in frequency between isolates from patients with gastric cancer and those with gastritis. The gastric cancer phenotype was associated with the presence of babA and genes in the cag pathogenicity island, one of the major virulence determinants of H. pylori, as well as non-synonymous variations in several less well-studied genes. We devised a simple risk score based on the risk level of associated elements present, which has the potential to identify strains that are likely to cause cancer but will require refinement and validation.
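The abstract does not state how risk levels were assigned or combined. As a purely hypothetical sketch, one simple form of such a score sums a weight for each associated element a strain carries and compares the total with a cut-off. The element names, weights and threshold below are placeholders, not values from the study.

```python
def risk_score(elements_present: set[str], weights: dict[str, float]) -> float:
    """Sum the weights of the associated elements a strain carries.

    Elements with no assigned weight contribute nothing.
    """
    return sum(weights.get(element, 0.0) for element in elements_present)


def flag_high_risk(elements_present: set[str], weights: dict[str, float], threshold: float) -> bool:
    """Flag a strain whose score reaches a (hypothetical) cut-off."""
    return risk_score(elements_present, weights) >= threshold


# Placeholder weights, invented for illustration only.
weights = {"element_A": 2.0, "element_B": 1.0, "element_C": 0.5}
strain = {"element_A", "element_C", "element_X"}
print(risk_score(strain, weights))           # 2.5
print(flag_high_risk(strain, weights, 3.0))  # False
```

An additive score like this is easy to interpret, but it ignores interactions between elements, which is one reason such a score would need the refinement and validation mentioned above.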
There are a number of challenges to applying GWAS to bacterial infections, including the difficulty of obtaining matched controls, multiple strain colonization and the possibility that causative strains may not be present when disease is detected. Our results demonstrate that bacterial factors have a sufficiently strong influence on disease progression that even a small-scale GWAS can identify them. Therefore, H. pylori GWAS can elucidate mechanistic pathways to disease and guide clinical treatment options, including for asymptomatic carriers.
Pseudomonas aeruginosa (PA) is an opportunistic pathogen that causes diverse human infections including chronic airway infection in patients with cystic fibrosis (CF). Comparing the genomes of CF and non-CF PA isolates has great potential to identify the genetic basis of pathogenicity. To gain a deeper understanding of PA adaptation in CF airways, we performed a genome-wide association study (GWAS) on 1,001 PA genomes. Genetic variations identified among CF isolates were categorized into (i) alterations in protein-coding regions, either large- or small-scale, and (ii) polymorphic variation in intergenic regions. We introduced each CF-associated genetic alteration into the genome of PAO1, a prototype PA strain, and validated the outcomes experimentally. Loci readily mutated among CF isolates included genes encoding a probable sulfatase, a probable TonB-dependent receptor (PA2332~PA2336), an L-cystine transporter (YecS, PA0313), and a probable transcriptional regulator (PA5438). A promoter region of a heme/hemoglobin uptake outer membrane receptor (PhuR, PA4710) was also different between the CF and non-CF isolate groups. Our analysis highlights ways in which the PA genome evolves to survive and persist within the context of chronic CF infection.
Measuring molecular evolution in bacteria typically requires estimation of the rate at which nucleotide changes accumulate in strains sampled at different times that share a common ancestor. This approach has been useful for dating ecological and evolutionary events that coincide with the emergence of important lineages, such as outbreak strains and obligate human pathogens. However, in multi-host (niche) transmission scenarios, where the pathogen is essentially an opportunistic environmental organism, sampling is often sporadic and rarely reflects the overall population, particularly when concentrated on clinical isolates. This means that approaches that assume recent common ancestry are not applicable. Here we present a new approach to estimate the molecular clock rate in Campylobacter that draws on the popular probability conundrum known as the 'birthday problem'. Using large genomic datasets and comparative genomic approaches, we use isolate pairs that share recent common ancestry to estimate the rate of nucleotide change for the population. Identifying synonymous and non-synonymous nucleotide changes, both within and outside of recombined regions of the genome, we quantify clock-like diversification to estimate synonymous rates of nucleotide change for the common pathogenic bacteria Campylobacter coli (2.4 × 10⁻⁶ s/s/y) and Campylobacter jejuni (3.4 × 10⁻⁶ s/s/y). Finally, using estimated total rates of nucleotide change, we infer the number of effective lineages within the sample time frame (analogous to a shared birthday) and assess the rate of turnover of lineages in our sample set over short evolutionary timescales. This provides a generalizable approach to calibrating rates in populations of environmental bacteria and shows that multiple lineages are maintained, implying that large-scale clonal sweeps may take hundreds of years or more in these species.
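For readers unfamiliar with the 'birthday problem', the Python below computes the classic probability that at least two of n draws, made uniformly and independently from N equally likely categories, coincide: 1 − ∏ (N − i)/N for i = 0, …, n − 1. Reading categories as lineages and draws as sampled isolates is only an analogy under the simplifying assumption of equally frequent lineages; it is not the estimator used in the study. A second helper shows standard molecular-clock arithmetic: two lineages that split t years ago each accumulate changes, so the expected number of differences between them is 2 × rate × sites × t. The site count in the example is a placeholder, not a Campylobacter value.

```python
def p_shared(n_draws: int, n_categories: int) -> float:
    """Probability that at least two of n_draws uniform draws share a category."""
    if n_draws > n_categories:
        return 1.0  # pigeonhole principle: a repeat is certain
    p_all_distinct = 1.0
    for i in range(n_draws):
        p_all_distinct *= (n_categories - i) / n_categories
    return 1.0 - p_all_distinct


def min_draws_for(prob: float, n_categories: int) -> int:
    """Smallest number of draws giving a shared category with probability >= prob."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    n = 1
    while p_shared(n, n_categories) < prob:
        n += 1
    return n


def expected_pairwise_changes(rate: float, sites: int, years: float) -> float:
    """Expected changes between two lineages that diverged `years` ago.

    `rate` is per site per year; both branches accumulate changes, hence the 2.
    """
    return 2 * rate * sites * years


print(round(p_shared(23, 365), 4))  # 0.5073
print(min_draws_for(0.5, 365))      # 23
# Synonymous rate from the text for C. coli; 100,000 sites is a placeholder.
print(expected_pairwise_changes(2.4e-6, 100_000, 10))
```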
Multicellular biofilms are an ancient bacterial adaptation that offers a protective environment for survival in hostile habitats. In microaerophilic organisms such as Campylobacter, biofilms play a key role in transmission to humans as the bacteria are exposed to atmospheric oxygen concentrations when leaving the reservoir host gut. Genetic determinants of biofilm formation differ between species, but little is known about how strains of the same species achieve the biofilm phenotype with different genetic backgrounds. Our approach combines genome‐wide association studies with traditional microbiology techniques to investigate the genetic basis of biofilm formation in 102 Campylobacter jejuni isolates. We quantified biofilm formation among the isolates and identified hotspots of genetic variation in homologous sequences that correspond to variation in biofilm phenotypes. Thirteen genes demonstrated a statistically robust association including those involved in adhesion, motility, glycosylation, capsule production and oxidative stress. The genes associated with biofilm formation were different in the host generalist ST‐21 and ST‐45 clonal complexes, which are frequently isolated from multiple host species and clinical samples. This suggests the evolution of enhanced biofilm from different genetic backgrounds and a possible role in colonization of multiple hosts and transmission to humans.