Multilocus sequence typing (MLST) was proposed in 1998 as a portable sequence-based method for identifying clonal relationships among bacteria. Today, in the whole-genome era of microbiology, the need for systematic, standardized descriptions of bacterial genotypic variation remains a priority. Here, to meet this need, we draw on the successes of MLST and 16S rRNA gene sequencing to propose a hierarchical gene-by-gene approach that reflects functional and evolutionary relationships and catalogues bacteria 'from domain to strain'. Our gene-based typing approach using online platforms such as the Bacterial Isolate Genome Sequence Database (BIGSdb) allows the scalable organization and analysis of whole-genome sequence data.
Clostridium difficile infection (CDI) is an important cause of mortality and morbidity in healthcare settings. The major virulence determinants are large clostridial toxins, toxin A (tcdA) and toxin B (tcdB), encoded within the pathogenicity locus (PaLoc). Isolates vary in pathogenicity from hypervirulent PCR-ribotypes 027 and 078 with high mortality, to benign non-toxigenic strains carried asymptomatically. The relative pathogenicity of most toxigenic genotypes is still unclear, but may be influenced by PaLoc genetic variant. This is the largest study of C. difficile molecular epidemiology performed to date, in which a representative collection of recent isolates (n = 1290) from patients with CDI in Oxfordshire, UK, was genotyped by multilocus sequence typing. The population structure was described using NeighborNet and ClonalFrame. Sequence variation within toxin B (tcdB) and its negative regulator (tcdC) was mapped onto the population structure. The 69 Sequence Types (ST) showed evidence for homologous recombination with an effect on genetic diversification four times lower than mutation. Five previously recognised genetic groups or clades persisted, designated 1 to 5, each having a strikingly congruent association with tcdB and tcdC variants. Hypervirulent ST-11 (078) was the only member of clade 5, which was divergent from the other four clades within the MLST loci. However, it was closely related to the other clades within the tcdB and tcdC loci. ST-11 (078) may represent a divergent formerly non-toxigenic strain that acquired the PaLoc (at least) by genetic recombination. This study focused on human clinical isolates collected from a single geographic location, to achieve a uniquely high density of sampling. It sets a baseline of MLST data for future comparative studies investigating genotype virulence potential (using clinical severity data for these isolates), possible reservoirs of human CDI, and the evolutionary origins of hypervirulent strains.
• We used whole genome sequencing to characterise Haemophilus influenzae isolates.
• Isolates were highly variable.
• We describe resistance to beta lactams including third generation cephalosporins.
• The resistance was through beta lactamase acquisition and/or modified ftsI gene.
Invasive infections due to Haemophilus influenzae are infrequent following the implementation of vaccination against H. influenzae of serotype b. However, their changing epidemiology may not be clear due to a lack of appropriate genotyping methods combined with antibiotic susceptibility analyses which do not discriminate invasive and non-invasive isolates. We aimed to describe recent epidemiological trends of invasive H. influenzae infections in France and explore the microbiological characteristics of invasive versus non-invasive isolates.
All culture- and PCR-confirmed cases due to H. influenzae isolated from a sterile site, that were received at the French national reference centre for H. influenzae during the year 2017 (n = 138) were characterized by whole genome sequencing (WGS), serotyping and antibiotic susceptibility testing. We also included 100 isolates that were received from non-invasive infections.
Most of the non-invasive isolates were non-typeable (99%) and this proportion was significantly lower among invasive isolates (75%, p < 0.0001). Serotype f was the most frequently observed but serotypes b and a were also present among invasive isolates. WGS analysis suggested a serotype b to a capsule switching event. Non-typeable isolates showed extensive heterogeneity. Antibiotic susceptibility testing indicated that 24% of the invasive isolates were resistant to ampicillin but this percentage was significantly higher (51%, p < 0.001) among the non-invasive isolates. Moreover, the proportion of beta-lactamase negative ampicillin resistant isolates (BLNAR) was significantly higher among non-invasive isolates compared to that of invasive isolates (24% versus 7%, p < 0.001). BLNAR isolates were linked to modification in the ftsI gene encoding the penicillin binding protein 3 (PBP3). In particular, isolates carrying ftsI alleles that harboured the mutations D350N, S357N, M377I and S385T were resistant to ampicillin and third generation cephalosporins. These isolates were more frequent among non-invasive isolates.
Our data suggest that invasive H. influenzae isolates differed phenotypically and genotypically from non-invasive isolates. The high proportion of ampicillin resistance by mutation in ftsI among non-invasive isolates may suggest a biological cost of these mutations on the function of PBP3 that can lead to lower bacterial invasiveness. WGS should be used routinely for the characterization of H. influenzae isolates in order to reliably follow the emergence, spread and mechanism of antibiotic resistance.
In common with other bacterial taxa, members of the genus Neisseria are classified using a range of phenotypic and biochemical approaches, which are not entirely satisfactory in assigning isolates to species groups. Recently, there has been increasing interest in using nucleotide sequences for bacterial typing and taxonomy, but to date, no broadly accepted alternative to conventional methods is available. Here, the taxonomic relationships of 55 representative members of the genus Neisseria have been analysed using whole-genome sequence data. As genetic material belonging to the accessory genome is widely shared among different taxa but not present in all isolates, this analysis indexed nucleotide sequence variation within sets of genes, specifically protein-coding genes that were present and directly comparable in all isolates. Variation in these genes identified seven species groups, which were robust to the choice of genes and phylogenetic clustering methods used. The groupings were largely, but not completely, congruent with current species designations, with some minor changes in nomenclature and the reassignment of a few isolates necessary. In particular, these data showed that isolates classified as Neisseria polysaccharea are polyphyletic and probably include more than one taxonomically distinct organism. The seven groups could be reliably and rapidly generated with sequence variation within the 53 ribosomal protein subunit (rps) genes, further demonstrating that ribosomal multilocus sequence typing (rMLST) is a practicable and powerful means of characterizing bacteria at all levels, from domain to strain.
Vibrio parahaemolyticus is an important human foodborne pathogen whose transmission is associated with the consumption of contaminated seafood, with a growing number of infections reported over recent years worldwide. A multilocus sequence typing (MLST) database for V. parahaemolyticus was created in 2008, and a large number of clones have been identified, causing severe outbreaks worldwide (sequence type 3 [ST3]), recurrent outbreaks in certain regions (e.g., ST36), or spreading to other regions where they are nonendemic (e.g., ST88 or ST189). The current MLST scheme uses sequences of 7 genes to generate an ST, which results in a powerful tool for inferring the population structure of this pathogen, although with limited resolution, especially compared to pulsed-field gel electrophoresis (PFGE). The application of whole-genome sequencing (WGS) has become routine for trace back investigations, with core genome MLST (cgMLST) analysis as one of the most straightforward ways to explore complex genomic data in an epidemiological context. Therefore, there is a need to generate a new, portable, standardized, and more advanced system that provides higher resolution and discriminatory power among V. parahaemolyticus strains using WGS data. We sequenced 92 V. parahaemolyticus genomes and used the genome of strain RIMD 2210633 as a reference (with a total of 4,832 genes) to determine which genes were suitable for establishing a V. parahaemolyticus cgMLST scheme. This analysis resulted in the identification of 2,254 suitable core genes for use in the cgMLST scheme. To evaluate the performance of this scheme, we performed a cgMLST analysis of 92 newly sequenced genomes, plus an additional 142 strains with genomes available at NCBI. cgMLST analysis was able to distinguish related and unrelated strains, including those with the same ST, clearly showing its enhanced resolution over conventional MLST analysis. It also distinguished outbreak-related from non-outbreak-related strains within the same ST. The sequences obtained from this work were deposited and are available in the public database (http://pubmlst.org/vparahaemolyticus). The application of this cgMLST scheme to the characterization of V. parahaemolyticus strains provided by different laboratories from around the world will reveal the global picture of the epidemiology, spread, and evolution of this pathogen and will become a powerful tool for outbreak investigations, allowing for the unambiguous comparison of strains with global coverage.
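The way a cgMLST comparison separates strains that share an ST can be illustrated with a minimal Python sketch. It assumes each strain is represented as a list of integer allele numbers in a fixed locus order, with None marking a locus that could not be called; the function name and the toy profiles are hypothetical, and the abstract does not specify how missing loci were handled.

```python
from typing import Optional, Sequence


def allelic_distance(
    profile_a: Sequence[Optional[int]],
    profile_b: Sequence[Optional[int]],
) -> int:
    """Count loci at which two cgMLST profiles carry different alleles.

    Loci with a missing call (None) in either profile are ignored,
    which is an assumption made for this illustration.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci in the same order")
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


# Toy profiles over five loci (hypothetical allele numbers).
strain_1 = [1, 4, 2, 7, 3]
strain_2 = [1, 4, 2, 9, None]
print(allelic_distance(strain_1, strain_2))
```

With thousands of loci instead of seven, two strains of the same ST can still differ at many cgMLST loci, which is the source of the higher resolution described above.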
Human campylobacteriosis, caused by Campylobacter jejuni and C. coli, remains a leading cause of bacterial gastroenteritis in many countries, but the epidemiology of campylobacteriosis outbreaks remains poorly defined, largely due to limitations in the resolution and comparability of isolate characterization methods. Whole-genome sequencing (WGS) data enable the improvement of sequence-based typing approaches, such as multilocus sequence typing (MLST), by substantially increasing the number of loci examined. A core genome MLST (cgMLST) scheme defines a comprehensive set of those loci present in most members of a bacterial group, balancing very high resolution with comparability across the diversity of the group. Here we propose a set of 1,343 loci as a human campylobacteriosis cgMLST scheme (v1.0), the allelic profiles of which can be assigned to core genome sequence types. The 1,343 loci chosen were a subset of the 1,643 loci identified in the reannotation of the genome sequence of C. jejuni isolate NCTC 11168, chosen as being present in >95% of draft genomes of 2,472 representative United Kingdom campylobacteriosis isolates, comprising 2,207 (89.3%) C. jejuni isolates and 265 (10.7%) C. coli isolates. Validation of the cgMLST scheme was undertaken with 1,478 further high-quality draft genomes, containing 150 or fewer contiguous sequences, from disease isolate collections: 99.5% of these isolates contained ≥95% of the 1,343 cgMLST loci. In addition to the rapid and effective high-resolution analysis of large numbers of diverse isolates, the cgMLST scheme enabled the efficient identification of very closely related isolates from a well-defined single-source campylobacteriosis outbreak.
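The locus-selection criterion (loci present in >95% of draft genomes) can be sketched as follows. This is only an illustration under stated assumptions: the input is a hypothetical mapping from locus name to the set of genome identifiers in which that locus was found, and the function name, locus names, and genome identifiers are invented for the example rather than taken from the study.

```python
def select_core_loci(
    presence: dict[str, set[str]],
    n_genomes: int,
    threshold: float = 0.95,
) -> list[str]:
    """Return loci found in more than `threshold` of the genomes.

    `presence` maps a locus name to the set of genome identifiers in
    which the locus was detected (a hypothetical input format).
    """
    if n_genomes <= 0:
        raise ValueError("n_genomes must be positive")
    return sorted(
        locus
        for locus, genomes in presence.items()
        if len(genomes) / n_genomes > threshold
    )


# Toy data: 10 genomes, three candidate loci (all names hypothetical).
genomes = {f"g{i}" for i in range(10)}
presence = {
    "locus_A": set(genomes),                      # present in 10/10
    "locus_B": genomes - {"g0"},                  # present in 9/10
    "locus_C": {"g1", "g2", "g3", "g4", "g5"},    # present in 5/10
}
print(select_core_loci(presence, n_genomes=len(genomes)))
```

The same presence counts, viewed per genome rather than per locus, would give the validation figure quoted above (the share of genomes containing ≥95% of the scheme's loci).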
A robust high-throughput multilocus sequence typing (MLST) scheme for Clostridium difficile was developed and validated using a diverse collection of 50 reference isolates representing 45 different PCR ribotypes and 102 isolates from recent clinical samples. A total of 49 PCR ribotypes were represented overall. All isolates were typed by MLST and yielded 40 sequence types (STs). A web-accessible database was set up (http://pubmlst.org/cdifficile/) to facilitate the dissemination and comparison of C. difficile MLST genotyping data among laboratories. MLST and PCR ribotyping were similar in discriminatory abilities, having indices of discrimination of 0.90 and 0.92, respectively. Some STs corresponded to a single PCR ribotype (32/40), other STs corresponded to multiple PCR ribotypes (8/40), and, conversely, the PCR ribotype was not always predictive of the ST. The total number of variable nucleotide sites in the concatenated MLST sequences was 103/3,501 (2.9%). Concatenated MLST sequences were used to construct a neighbor-joining tree which identified four phylogenetic groups of STs and one outlier (ST-11; PCR ribotype 078). These groups apparently correlate with clades identified previously by comparative genomics. The MLST scheme was sufficiently robust to allow direct genotyping of C. difficile in total stool DNA extracts without isolate culture. The direct (nonculture) MLST approach may prove useful as a rapid genotyping method, potentially benefiting individual patients and informing hospital infection control.
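The abstract does not state which formula was used for the index of discrimination. A common choice for typing methods is the Hunter-Gaston form of Simpson's index of diversity, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where N is the number of isolates and n_j the number assigned to type j. The Python sketch below assumes that formula; the function name and the toy type labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def discrimination_index(type_labels: Iterable[Hashable]) -> float:
    """Hunter-Gaston index: probability that two isolates drawn at
    random (without replacement) are assigned to different types."""
    counts = Counter(type_labels)
    n_total = sum(counts.values())
    if n_total < 2:
        raise ValueError("at least two isolates are required")
    same_type_pairs = sum(n * (n - 1) for n in counts.values())
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))


# Toy example: four isolates typed into three types (hypothetical labels).
print(discrimination_index(["ST-1", "ST-1", "ST-3", "ST-11"]))
```

Computing this index on the same isolates once with ST labels and once with PCR-ribotype labels gives directly comparable values for the two methods.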
No single genealogical reconstruction or typing method currently encompasses all levels of bacterial diversity, from domain to strain. We propose ribosomal multilocus sequence typing (rMLST), an approach which indexes variation of the 53 genes encoding the bacterial ribosome protein subunits (rps genes), as a means of integrating microbial genealogy and typing. As with multilocus sequence typing (MLST), rMLST employs curated reference sequences to identify gene variants efficiently and rapidly. The rps loci are ideal targets for a universal characterization scheme as they are: (i) present in all bacteria; (ii) distributed around the chromosome; and (iii) encode proteins which are under stabilizing selection for functional conservation. Collectively, the rps loci exhibit variation that resolves bacteria into groups at all taxonomic and most typing levels, providing significantly more resolution than 16S small subunit rRNA gene phylogenies. A web-accessible expandable database, comprising whole-genome data from more than 1900 bacterial isolates, including 28 draft genomes assembled de novo from the European Bioinformatics Institute (EBI) sequence read archive, has been assembled. The rps gene variation catalogued in this database permits rapid and computationally non-intensive identification of the phylogenetic position of any bacterial sequence at the domain, phylum, class, order, family, genus, species and strain levels. The groupings generated with rMLST data are consistent with current nomenclature schemes and independent of the clustering algorithm used. This approach is applicable to the other domains of life, potentially providing a rational and universal approach to the classification of life that is based on one of its fundamental features, the translation mechanism.
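The idea of using curated reference sequences to identify gene variants can be sketched as an exact-match lookup. The Python example below is a simplified illustration, not the database's actual implementation: it assumes each locus has a table mapping known allele sequences to allele numbers and that the sequence of each locus has already been extracted from the genome. All locus names, sequences, and the function name are hypothetical.

```python
from typing import Optional


def assign_profile(
    genome_loci: dict[str, str],
    reference_alleles: dict[str, dict[str, int]],
) -> dict[str, Optional[int]]:
    """Assign an allele number per locus by exact sequence match.

    Returns None for a locus that is absent from the genome or whose
    sequence matches no curated allele (a candidate new allele).
    """
    profile: dict[str, Optional[int]] = {}
    for locus, known in reference_alleles.items():
        sequence = genome_loci.get(locus)
        profile[locus] = None if sequence is None else known.get(sequence.upper())
    return profile


# Toy reference table with two loci (sequences are made up).
reference = {
    "locus_1": {"ATGGCA": 1, "ATGGCG": 2},
    "locus_2": {"ATGAAA": 1},
}
isolate = {"locus_1": "atggcg", "locus_2": "ATGAAT"}
print(assign_profile(isolate, reference))
```

Because the lookup is a dictionary access per locus, profile assignment stays cheap even for large reference tables, in keeping with the "rapid and computationally non-intensive" identification described above.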
The eubacterial genus Wolbachia comprises one of the most abundant groups of obligate intracellular bacteria, and it has a host range that spans the phyla Arthropoda and Nematoda. Here we developed a multilocus sequence typing (MLST) scheme as a universal genotyping tool for Wolbachia. Internal fragments of five ubiquitous genes (gatB, coxA, hcpA, fbpA, and ftsZ) were chosen, and primers that amplified across the major Wolbachia supergroups found in arthropods, as well as other divergent lineages, were designed. A supplemental typing system using the hypervariable regions of the Wolbachia surface protein (WSP) was also developed. Thirty-seven strains belonging to supergroups A, B, D, and F obtained from singly infected hosts were characterized by using MLST and WSP. The number of alleles per MLST locus ranged from 25 to 31, and the average levels of genetic diversity among alleles were 6.5% to 9.2%. A total of 35 unique allelic profiles were found. The results confirmed that there is a high level of recombination in chromosomal genes. MLST was shown to be effective for detecting diversity among strains within a single host species, as well as for identifying closely related strains found in different arthropod hosts. Identical or similar allelic profiles were obtained for strains harbored by different insect species and causing distinct reproductive phenotypes. Strains with similar WSP sequences can have very different MLST allelic profiles and vice versa, indicating the importance of the MLST approach for strain identification. The MLST system provides a universal and unambiguous tool for strain typing, population genetics, and molecular evolutionary studies. The central database for storing and organizing Wolbachia bacterial and host information can be accessed at http://pubmlst.org/wolbachia/.
Brucellosis poses a significant burden to human and animal health worldwide. Robust and harmonized molecular epidemiological approaches and population studies that include routine disease screening are needed to efficiently track the origin and spread of Brucella strains. Core genome multilocus sequence typing (cgMLST) is a powerful genotyping system commonly used to delineate pathogen transmission routes for disease surveillance and control. Except for Brucella melitensis, cgMLST schemes for Brucella species are currently not established. Here, we describe a novel cgMLST scheme that covers multiple Brucella species. We first determined the phylogenetic breadth of the genus using 612 Brucella genomes. We selected 1,764 genes that were particularly well conserved and typeable in at least 98% of these genomes. We tested the new scheme on 600 genomes and found high agreement with the whole-genome-based single nucleotide polymorphism (SNP) analysis. Next, we applied the scheme to reanalyze the genomes of Brucella strains from epidemiologically linked outbreaks. We demonstrated the applicability of the new scheme for high-resolution typing required in outbreak investigations as previously reported with whole-genome SNP methods. We also used the novel scheme to define the global population structure of the genus using 1,322 Brucella genomes. Finally, we demonstrated the possibility of tracing distribution of Brucella strains by performing cluster analysis of cgMLST profiles and found nearly identical cgMLST profiles in different countries. Our results show that sequencing depth of more than 40-fold is optimal for allele calling with this scheme. In summary, this study describes a novel Brucella-wide cgMLST scheme that is applicable in Brucella molecular epidemiology and helps in accurately tracking and thus controlling the sources of infection.
The scheme is publicly accessible and should represent a valuable resource for laboratories with limited computational resources and bioinformatics expertise.