Background
Campylobacter is a leading global cause of bacterial gastroenteritis, motivating research to identify sources of human infection. Population genetic studies have been increasingly applied to this end, mainly using multilocus sequence typing (MLST) data.
Objectives
This review aimed to summarise approaches and findings of these studies and identify best practice lessons for this form of genomic epidemiology.
Methods
We systematically reviewed publications using MLST data to attribute human disease isolates to source. Publications from January 2001 onwards, when this type of approach began, were considered. Searched databases included Scopus, Web of Science and PubMed. Information on samples and isolate datasets used, as well as MLST schemes and attribution algorithms employed, was obtained. Main findings were extracted, along with any validation of results and subsequent correction for identified biases. Meta-analysis is not reported given high levels of heterogeneity.
Results
Of 2,109 studies retrieved worldwide, 25 were included, and poultry, specifically chickens, were identified as the principal source of human infection. Ruminants (cattle or sheep) were consistently implicated in a substantial proportion of cases. Data sampling and analytical approaches varied, with five different attribution algorithms used. Validation such as self-attribution of isolates from known sources was reported in five publications. No publication reported adjustment for biases identified by validation.
Conclusions
Common gaps in validation and adjustment highlight opportunities to generate improved estimates in future genomic attribution studies. The consistency of chicken as the main source of human infection, across high income countries, and despite methodological variations, highlights the public health importance of this source.
Campylobacter jejuni and Campylobacter coli are the biggest causes of bacterial gastroenteritis in the developed world, with human infections typically arising from zoonotic transmission associated with infected meat. Because Campylobacter is not thought to survive well outside the gut, host-associated populations are genetically isolated to varying degrees. Therefore, the likely origin of most strains can be determined by host-associated variation in the genome. This is instructive for characterizing the source of human infection. However, some common strains, notably isolates belonging to the ST-21, ST-45 and ST-828 clonal complexes, appear to have broad host ranges, hindering source attribution. Here whole-genome sequencing has the potential to reveal fine-scale genetic structure associated with host specificity. We found that rates of zoonotic transmission among animal host species in these clonal complexes were so high that the signal of host association is all but obliterated, estimating one zoonotic transmission event every 1.6, 1.8 and 12 years in the ST-21, ST-45 and ST-828 complexes, respectively. We attributed 89% of clinical cases to a chicken source, 10% to cattle and 1% to pig. Our results reveal that common strains of C. jejuni and C. coli infectious to humans are adapted to a generalist lifestyle, permitting rapid transmission between different hosts. Furthermore, they show that the weak signal of host association within these complexes presents a challenge for pinpointing the source of clinical infections, underlining the view that whole-genome sequencing, powerful though it is, cannot substitute for intensive sampling of suspected transmission reservoirs.
Human campylobacteriosis, caused by Campylobacter jejuni and Campylobacter coli, remains a leading cause of bacterial gastroenteritis in many countries, but the epidemiology of campylobacteriosis outbreaks remains poorly defined, largely due to limitations in the resolution and comparability of isolate characterization methods. Whole-genome sequencing (WGS) data enable the improvement of sequence-based typing approaches, such as multilocus sequence typing (MLST), by substantially increasing the number of loci examined. A core genome MLST (cgMLST) scheme defines a comprehensive set of those loci present in most members of a bacterial group, balancing very high resolution with comparability across the diversity of the group. Here we propose a set of 1,343 loci as a human campylobacteriosis cgMLST scheme (v1.0), the allelic profiles of which can be assigned to core genome sequence types. The 1,343 loci chosen were a subset of the 1,643 loci identified in the reannotation of the genome sequence of C. jejuni isolate NCTC 11168, chosen as being present in >95% of draft genomes of 2,472 representative United Kingdom campylobacteriosis isolates, comprising 2,207 (89.3%) C. jejuni isolates and 265 (10.7%) C. coli isolates. Validation of the cgMLST scheme was undertaken with 1,478 further high-quality draft genomes, containing 150 or fewer contiguous sequences, from disease isolate collections: 99.5% of these isolates contained ≥95% of the 1,343 cgMLST loci. In addition to the rapid and effective high-resolution analysis of large numbers of diverse isolates, the cgMLST scheme enabled the efficient identification of very closely related isolates from a well-defined single-source campylobacteriosis outbreak.
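The locus-selection rule described in this abstract (keep candidate loci present in more than 95% of draft genomes, then check that validation genomes contain at least 95% of the chosen loci) can be sketched in a few lines. This is a minimal illustration only, not the authors' pipeline: the function names `select_core_loci` and `passes_validation`, the presence-table layout (genome name mapped to the set of loci detected in it) and the toy data are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set


def select_core_loci(
    presence: Dict[str, Set[str]],
    candidate_loci: Iterable[str],
    threshold: float = 0.95,
) -> List[str]:
    """Return candidate loci detected in MORE than `threshold` of the genomes.

    `presence` maps a genome identifier to the set of loci found in that genome
    (a hypothetical layout chosen for this sketch).
    """
    n_genomes = len(presence)
    if n_genomes == 0:
        return []
    core = []
    for locus in candidate_loci:
        n_present = sum(1 for loci in presence.values() if locus in loci)
        if n_present / n_genomes > threshold:
            core.append(locus)
    return core


def passes_validation(
    genome_loci: Set[str],
    core_loci: List[str],
    min_fraction: float = 0.95,
) -> bool:
    """True if a validation genome contains at least `min_fraction` of the core loci."""
    if not core_loci:
        return False
    found = sum(1 for locus in core_loci if locus in genome_loci)
    return found / len(core_loci) >= min_fraction


# Toy data: 40 genomes; locus A in all 40, locus B in 39, locus C in 20.
toy_presence = {}
for i in range(40):
    loci = {"A"}
    if i < 39:
        loci.add("B")
    if i < 20:
        loci.add("C")
    toy_presence[f"genome_{i}"] = loci

toy_core = select_core_loci(toy_presence, ["A", "B", "C"])
print(toy_core)
```

In this toy table, loci A and B clear the 95% threshold and locus C does not. A real analysis would build the presence table from gene-by-gene annotation of each draft genome rather than from hand-written sets.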
No single genealogical reconstruction or typing method currently encompasses all levels of bacterial diversity, from domain to strain. We propose ribosomal multilocus sequence typing (rMLST), an approach which indexes variation of the 53 genes encoding the bacterial ribosome protein subunits (rps genes), as a means of integrating microbial genealogy and typing. As with multilocus sequence typing (MLST), rMLST employs curated reference sequences to identify gene variants efficiently and rapidly. The rps loci are ideal targets for a universal characterization scheme as they are: (i) present in all bacteria; (ii) distributed around the chromosome; and (iii) encode proteins which are under stabilizing selection for functional conservation. Collectively, the rps loci exhibit variation that resolves bacteria into groups at all taxonomic and most typing levels, providing significantly more resolution than 16S small subunit rRNA gene phylogenies. A web-accessible expandable database, comprising whole-genome data from more than 1900 bacterial isolates, including 28 draft genomes assembled de novo from the European Bioinformatics Institute (EBI) sequence read archive, has been assembled. The rps gene variation catalogued in this database permits rapid and computationally non-intensive identification of the phylogenetic position of any bacterial sequence at the domain, phylum, class, order, family, genus, species and strain levels. The groupings generated with rMLST data are consistent with current nomenclature schemes and independent of the clustering algorithm used. This approach is applicable to the other domains of life, potentially providing a rational and universal approach to the classification of life that is based on one of its fundamental features, the translation mechanism.
Modern agriculture has dramatically changed the distribution of animal species on Earth. Changes to host ecology have a major impact on the microbiota, potentially increasing the risk of zoonotic pathogens being transmitted to humans, but the impact of intensive livestock production on host-associated bacteria has rarely been studied. Here, we use large isolate collections and comparative genomics techniques, linked to phenotype studies, to understand the timescale and genomic adaptations associated with the proliferation of the most common foodborne bacterial pathogen (Campylobacter jejuni) in the most prolific agricultural mammal (cattle). Our findings reveal the emergence of cattle specialist C. jejuni lineages from a background of host generalist strains that coincided with the dramatic rise in cattle numbers in the 20th century. Cattle adaptation was associated with horizontal gene transfer and significant gene gain and loss. This may be related to differences in host diet, anatomy, and physiology, leading to the proliferation of globally disseminated cattle specialists of major public health importance. This work highlights how genomic plasticity can allow important zoonotic pathogens to exploit altered niches in the face of anthropogenic change and provides information for mitigating some of the risks posed by modern agricultural systems.
The bacterial core genome is of intense interest and the volume of whole genome sequence data in the public domain available to investigate it has increased dramatically. The aim of our study was to develop a model to estimate the bacterial core genome from next-generation whole genome sequencing data and use this model to identify novel genes associated with important biological functions. Five bacterial datasets were analysed, comprising 2096 genomes in total. We developed a Bayesian decision model to estimate the number of core genes, calculated pairwise evolutionary distances (p-distances) based on nucleotide sequence diversity, and plotted the median p-distance for each core gene relative to its genome location. We designed visually informative genome diagrams to depict areas of interest in genomes. Case studies demonstrated how the model could identify areas for further study, e.g. 25% of the core genes with higher sequence diversity in the Campylobacter jejuni and Neisseria meningitidis genomes encoded hypothetical proteins. The core gene with the highest p-distance value in C. jejuni was annotated in the reference genome as a putative hydrolase, but further work revealed that it shared sequence homology with beta-lactamase/metallo-beta-lactamases (enzymes that provide resistance to a range of broad-spectrum antibiotics) and thioredoxin reductase genes (which reduce oxidative stress and are essential for DNA replication) in other C. jejuni genomes. Our Bayesian model of estimating the core genome is principled, easy to use and can be applied to large genome datasets. This study also highlighted the lack of knowledge currently available for many core genes in bacterial genomes of significant global public health importance.
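The per-gene summary mentioned in this abstract, the median pairwise p-distance, can be illustrated with a short sketch. A p-distance is the proportion of compared nucleotide sites at which two aligned sequences differ. The code below is a hedged example rather than the authors' implementation, and it does not attempt the Bayesian decision model: the function names `p_distance` and `median_p_distance` are hypothetical, sequences are assumed to be pre-aligned and of equal length, and skipping sites with gaps or ambiguous bases is an assumption of this sketch.

```python
from itertools import combinations
from statistics import median
from typing import Sequence

VALID_BASES = set("ACGT")


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared sites that differ between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped
    (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def median_p_distance(aligned_alleles: Sequence[str]) -> float:
    """Median p-distance over all pairs of aligned alleles of one gene."""
    distances = [p_distance(a, b) for a, b in combinations(aligned_alleles, 2)]
    if not distances:
        raise ValueError("need at least two sequences")
    return median(distances)


# Toy alignment of three alleles of a single hypothetical gene.
toy_alleles = ["AAAA", "AAAT", "AATT"]
print(median_p_distance(toy_alleles))
```

For the toy alignment the three pairwise distances are 0.25, 0.5 and 0.25, so the median is 0.25. Computing this value for every core gene and plotting it against genome position would give the kind of per-gene diversity profile the abstract describes.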