► Escherichia coli is a diverse group of commensal, pathogenic and environmental bacteria.
► E. coli diversity was initially investigated by serotyping and DNA hybridisation.
► MLEE, single gene sequencing and MLST enabled quantitative molecular studies.
► Whole genome phylogenetic analyses allow construction of a robust E. coli phylogeny.
► Most E. coli pathovars have arisen independently on multiple occasions.
Escherichia coli is familiar to biologists as a classical model system, ubiquitous in molecular biology laboratories around the world. Outside of the laboratory, E. coli strains exist as an almost universal component of the lower-gut flora of humans and animals. Although usually a commensal, E. coli has an alter ego as a pathogen, and is associated with diarrhoeal disease and extra-intestinal infections. The study of E. coli diversity predates the availability of molecular data, with strains initially distinguished by serotyping and metabolic profiling, and genomic diversity illustrated by DNA hybridisation. The quantitative study of E. coli diversity began with the application of multi-locus enzyme electrophoresis (MLEE), and has progressed with the accumulation of nucleotide sequence data, from single genes through multi-locus sequence typing (MLST) to whole genome sequencing. Phylogenetic methods have shed light on the processes of genomic evolution in this extraordinarily diverse species, and revealed the origins of pathogenic E. coli strains, including members of the phylogenetically indistinguishable “genus” Shigella. In May and June 2011, an outbreak of haemolytic uraemic syndrome in Germany was linked to a strain of enterohaemorrhagic E. coli (EHEC) O104:H4. Application of high-throughput sequencing technologies allowed the genome and origins of the outbreak strain to be characterised in real time as the outbreak was in progress.
Clostridioides difficile is responsible for substantial morbidity and mortality in antibiotic-treated, hospitalised, elderly patients, in whom toxin production correlates with diarrhoeal disease. While the function of these toxins has been studied in detail, the contribution of other factors, including the paracrystalline surface layer (S-layer), to disease is less well understood. Here, we highlight the essentiality of the S-layer in vivo by reporting the recovery of S-layer variants following infection with the S-layer-null strain, FM2.5. These variants carry either a correction of the original point mutation or sequence modifications that restored the reading frame and translation of slpA. Selection of these variant clones was rapid in vivo, and independent of toxin production, with up to 90% of the recovered C. difficile population encoding a modified slpA sequence within 24 h post infection. Two variants, subsequently named FM2.5varA and FM2.5varB, were selected for study in greater detail. Structural determination of SlpA from FM2.5varB indicated an alteration in the orientation of protein domains, resulting in a reorganisation of the lattice assembly, and changes in interacting interfaces, which might alter function. Interestingly, variant FM2.5varB displayed an attenuated, FM2.5-like phenotype in vivo compared to FM2.5varA, which caused disease severity more comparable to that of R20291. Comparative RNA sequencing (RNA-Seq) analysis of in vitro grown isolates revealed large changes in gene expression between R20291 and FM2.5. Downregulation of tcdA/tcdB and several genes associated with sporulation and cell wall integrity may account for the reported attenuated phenotype of FM2.5 in vivo.
RNA-Seq data correlated well with disease severity: the more virulent variant, FM2.5varA, showed a similar profile of gene expression to R20291 in vitro, while the attenuated FM2.5varB showed downregulation of many of the same virulence-associated traits as FM2.5. Cumulatively, these data add to a growing body of evidence that the S-layer contributes to C. difficile pathogenesis and disease severity.
The chicken is the most abundant food animal in the world. However, despite its importance, the chicken gut microbiome remains largely undefined. Here, we exploit culture-independent and culture-dependent approaches to reveal extensive taxonomic diversity within this complex microbial community.
We performed metagenomic sequencing of fifty chicken faecal samples from two breeds and analysed these, alongside all (n = 582) relevant publicly available chicken metagenomes, to cluster over 20 million non-redundant genes and to construct over 5,500 metagenome-assembled bacterial genomes. In addition, we recovered nearly 600 bacteriophage genomes. This represents the most comprehensive view of taxonomic diversity within the chicken gut microbiome to date, encompassing hundreds of novel candidate bacterial genera and species. To provide a stable, clear and memorable nomenclature for novel species, we devised a scalable combinatorial system for the creation of hundreds of well-formed Latin binomials. We cultured and genome-sequenced bacterial isolates from chicken faeces, documenting over forty novel species, together with three species from the genus
, including the newly named species
.
Our metagenomic and culture-based analyses provide new insights into the bacterial, archaeal and bacteriophage components of the chicken gut microbiome. The resulting datasets expand its known diversity and provide a key resource for future high-resolution taxonomic and functional studies.
Abstract
Motivation
Probabilistic identification of bacterial essential genes using transposon-directed insertion-site sequencing (TraDIS) data based on Tn5 libraries has received relatively little attention in the literature; most methods are designed for mariner transposon insertions. Analysis of Tn5 transposon-based genomic data is challenging due to the high insertion density and genomic resolution. We present a novel probabilistic Bayesian approach for classifying bacterial essential genes using transposon insertion density derived from transposon insertion sequencing data. We implement a Markov chain Monte Carlo sampling procedure to estimate the posterior probability that any given gene is essential, and a Bayesian decision theory approach to selecting essential genes. We assess the effectiveness of our approach via analysis of both simulated data and three previously published Escherichia coli, Salmonella Typhimurium and Staphylococcus aureus datasets. These three bacteria have relatively well-characterized essential genes, which allows us to test our classification procedure using receiver operating characteristic curves and the area under the curve. We compare the classification performance with that of Bio-Tradis, a standard tool for bacterial gene classification.
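The evaluation step described above can be illustrated with a minimal sketch, assuming each gene has a posterior probability of essentiality and a known essential/non-essential label. The function name `roc_auc` and the example values are hypothetical and are not taken from the paper or its R package; the sketch uses the standard rank-based identity for the area under the ROC curve.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    AUC equals the probability that a randomly chosen essential gene
    scores higher than a randomly chosen non-essential gene, with ties
    counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one gene in each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical posterior probabilities of essentiality for six genes,
# paired with hypothetical reference labels (True = essential).
posterior = [0.95, 0.80, 0.60, 0.40, 0.20, 0.05]
known_essential = [True, True, False, True, False, False]
auc = roc_auc(posterior, known_essential)
```

The pairwise loop is quadratic in the number of genes, which is acceptable for a few thousand genes; a sort-based version would be preferable for larger inputs.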
Results
Our method classifies genes in the three datasets with areas under the curve between 0.967 and 0.983. Our simulated datasets show that the number of insertions and the extent to which insertions are tolerated in the distal regions of essential genes are both important in determining classification accuracy. Importantly, our method gives the user the option of classifying essential genes based on user-supplied costs of false discovery and false non-discovery.
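One way to see how user-supplied costs can drive the classification is the textbook two-action Bayes rule, sketched below. This is an assumption about the general form of such a rule, not the loss function implemented in the authors' package; the function name `classify_essential`, the parameter names and the gene labels are all hypothetical. If calling a non-essential gene essential costs `cost_fd` and missing an essential gene costs `cost_fnd`, then the expected loss is minimised by calling a gene essential when its posterior probability exceeds `cost_fd / (cost_fd + cost_fnd)`.

```python
def classify_essential(posteriors, cost_fd=1.0, cost_fnd=1.0):
    """Call genes essential by minimising expected loss.

    posteriors: dict mapping gene name -> posterior probability p of
    being essential. Calling a gene essential has expected loss
    cost_fd * (1 - p); calling it non-essential has expected loss
    cost_fnd * p. The first is smaller exactly when
    p > cost_fd / (cost_fd + cost_fnd).
    """
    if cost_fd <= 0 or cost_fnd <= 0:
        raise ValueError("costs must be positive")
    threshold = cost_fd / (cost_fd + cost_fnd)
    return {gene: p > threshold for gene, p in posteriors.items()}


# Hypothetical posterior probabilities for three genes.
posteriors = {"geneA": 0.97, "geneB": 0.70, "geneC": 0.10}

# Equal costs give a threshold of 0.5.
equal_costs = classify_essential(posteriors)

# Making false discoveries four times as costly raises the threshold to 0.8.
strict = classify_essential(posteriors, cost_fd=4.0, cost_fnd=1.0)
```

Raising the relative cost of a false discovery shrinks the list of essential calls, while raising the cost of a false non-discovery enlarges it.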
Availability and implementation
An R package that implements the method presented in this paper is available for download from https://github.com/Kevin-walters/insdens.
Supplementary information
Supplementary data are available at Bioinformatics online.
Salmonella Typhimurium sequence type (ST) 313 causes invasive nontyphoidal Salmonella (iNTS) disease in sub-Saharan Africa, targeting susceptible HIV+, malarial, or malnourished individuals. An in-depth genomic comparison between the ST313 isolate D23580 and the well-characterized ST19 isolate 4/74 that causes gastroenteritis across the globe revealed extensive synteny. To understand how the 856 nucleotide variations generated phenotypic differences, we devised a large-scale experimental approach that involved the global gene expression analysis of strains D23580 and 4/74 grown in 16 infection-relevant growth conditions. Comparison of transcriptional patterns identified virulence and metabolic genes that were differentially expressed between D23580 and 4/74, many of which were validated by proteomics. We also uncovered the S. Typhimurium D23580 and 4/74 genes that showed expression differences during infection of murine macrophages. Our comparative transcriptomic data are presented in a new enhanced version of the Salmonella expression compendium, SalComD23580: http://bioinf.gen.tcd.ie/cgi-bin/salcom_v2.pl. We discovered that the ablation of melibiose utilization was caused by three independent SNP mutations in D23580 that are shared across ST313 lineage 2, suggesting that the ability to catabolize this carbon source has been negatively selected during ST313 evolution. The data revealed a novel, to our knowledge, plasmid maintenance system involving a plasmid-encoded CysS cysteinyl-tRNA synthetase, highlighting the power of large-scale comparative multicondition analyses to pinpoint key phenotypic differences between bacterial pathovariants.
Chickens, pigs, and cattle are key reservoirs of Salmonella enterica, a foodborne pathogen of worldwide importance. Though a decade has elapsed since publication of the first Salmonella genome, thousands of genes remain of hypothetical or unknown function, and the basis of colonization of reservoir hosts is ill-defined. Moreover, previous surveys of the role of Salmonella genes in vivo have focused on systemic virulence in murine typhoid models, and the genetic basis of intestinal persistence and thus zoonotic transmission has received little study. We therefore screened pools of random insertion mutants of S. enterica serovar Typhimurium in chickens, pigs, and cattle by transposon-directed insertion-site sequencing (TraDIS). The identity and relative fitness in each host of 7,702 mutants were simultaneously assigned by massively parallel sequencing of transposon-flanking regions. Phenotypes were assigned to 2,715 different genes, providing a phenotype-genotype map of unprecedented resolution. The data are self-consistent in that multiple independent mutations in a given gene or pathway were observed to exert a similar fitness cost. Phenotypes were further validated by screening defined null mutants in chickens. Our data indicate that a core set of genes is required for infection of all three host species, and smaller sets of genes may mediate persistence in specific hosts. By assigning roles to thousands of Salmonella genes in key reservoir hosts, our data facilitate systems approaches to understand pathogenesis and the rational design of novel cross-protective vaccines and inhibitors. Moreover, by simultaneously assigning the genotype and phenotype of over 90% of mutants screened in complex pools, our data establish TraDIS as a powerful tool to apply rich functional annotation to microbial genomes with minimal animal use.
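The abstract does not state how relative fitness was computed from the sequencing reads. As a purely illustrative sketch, one common way to score pooled mutants is the log2 ratio of each mutant's read frequency in the output pool (recovered from the host) to its frequency in the input pool (the inoculum). The function name `fitness_scores`, the pseudocount and the read counts below are all assumptions, not the authors' method or data.

```python
import math


def fitness_scores(input_counts, output_counts, pseudocount=1.0):
    """Log2 ratio of output to input read frequency for each mutant.

    input_counts / output_counts: dicts mapping mutant ID -> read count
    in the inoculum and in the pool recovered from the host. A
    pseudocount keeps the ratio finite when a mutant is lost entirely.
    Negative scores indicate mutants that were depleted in the host.
    """
    in_total = sum(input_counts.values())
    out_total = sum(output_counts.values())
    if in_total <= 0 or out_total <= 0:
        raise ValueError("both pools must contain reads")
    scores = {}
    for mutant, in_reads in input_counts.items():
        out_reads = output_counts.get(mutant, 0)
        in_freq = (in_reads + pseudocount) / in_total
        out_freq = (out_reads + pseudocount) / out_total
        scores[mutant] = math.log2(out_freq / in_freq)
    return scores


# Hypothetical read counts for three mutants in one host.
inoculum = {"mutA": 100, "mutB": 100, "mutC": 100}
recovered = {"mutA": 150, "mutB": 150, "mutC": 0}
scores = fitness_scores(inoculum, recovered)
```

In this toy example the third mutant is absent from the recovered pool and receives a strongly negative score, which is the pattern expected for an insertion in a gene required for colonization.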
Bacterial WxL proteins contain peptidoglycan‐binding WxL domains, which have a dual Trp‐x‐Leu motif and are involved in virulence. It was recently shown that WxL proteins occur in gene clusters, typically containing a small WxL protein (which in the mature protein consists only of a WxL domain), a large WxL protein (which contains a C‐terminal WxL domain with N‐terminal host‐binding domains), and a conserved protein annotated as a Domain of Unknown Function (DUF). Here we analyze this DUF and show that it contains two tandem domains, DUF916 and DUF3324, which both have an IgG‐like fold and together form a single functional unit, connected to a C‐terminal transmembrane helix. DUF3324 is a stable domain, while DUF916 is less stable and is likely to require a stabilizing interaction with WxL. We suggest that the protein has an important role in binding and stabilizing WxL on the peptidoglycan surface, via the DUF916 domain, and in binding to host cells via the DUF3324 domain. AlphaFold2 predicts that a β‐hairpin strand from DUF916 inserts into WxL adjacent to its N‐terminus. We therefore propose to rename the DUF916‐DUF3324 pair as WxL Interacting Protein (WxLIP), with DUF916, DUF3324 and the transmembrane helix forming the first, second and third domains of WxLIP, which we characterize as peptidoglycan binding domain (PGBD), host binding domain (HBD), and transmembrane helix (TMH) respectively.
In recent years there has been an increasing problem with Staphylococcus aureus strains that are resistant to treatment with existing antibiotics. An important starting point for the development of new antimicrobial drugs is the identification of "essential" genes that are important for bacterial survival and growth.
We have developed a robust microarray and PCR-based method, Transposon-Mediated Differential Hybridisation (TMDH), that uses novel bioinformatics to identify transposon inserts in genome-wide libraries. Following a microarray-based screen, genes lacking transposon inserts are re-tested using a PCR and sequencing-based approach. We carried out a TMDH analysis of the S. aureus genome using a large random mariner transposon library of around a million mutants, and identified a total of 351 S. aureus genes important for survival and growth in culture. A comparison with the essential gene list experimentally derived for Bacillus subtilis highlighted interesting differences in both pathways and individual genes.
We have determined the first comprehensive list of S. aureus essential genes. This should act as a useful starting point for the identification of potential targets for novel antimicrobial compounds. The TMDH methodology we have developed is generic and could be applied to identify essential genes in other bacterial pathogens.