The prediction of cellular function from a genotype is a fundamental goal in biology. For metabolism, constraint-based modelling methods systematize biochemical, genetic and genomic knowledge into a mathematical framework that enables a mechanistic description of metabolic physiology. The use of constraint-based approaches has evolved over ~30 years, and an increasing number of studies have recently combined models with high-throughput data sets for prospective experimentation. These studies have led to validation of increasingly important and relevant biological predictions. As reviewed here, these recent successes have tangible implications in the fields of microbial evolution, interaction networks, genetic engineering and drug discovery.
Constraint-based reconstruction and analysis (COBRA) methods at the genome scale have been under development since the first whole-genome sequences appeared in the mid-1990s. A few years ago, this approach began to demonstrate the ability to predict a range of cellular functions, including cellular growth capabilities on various substrates and the effect of gene knockouts at the genome scale. Thus, much interest has developed in understanding and applying these methods to areas such as metabolic engineering, antibiotic design, and organismal and enzyme evolution. This Primer will get you started.
Genome-scale computational reconstructions of organisms have applications for metabolic engineering, antibiotic design, and organismal and enzyme evolution.
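For orientation, the optimization problem at the core of most constraint-based analyses (flux balance analysis) can be written as a linear program. This is the standard textbook formulation, not a statement of any particular study's setup, and the symbol names are conventional choices rather than anything taken from the abstracts above.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S v = 0, \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}
\end{aligned}
```

Here S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c weights the objective (commonly a single biomass reaction), and the bounds encode reaction reversibility and nutrient uptake limits. A gene knockout is typically simulated by setting both bounds to zero for every reaction that requires the deleted gene and then re-solving.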
Current machine learning classifiers have successfully been applied to whole-genome sequencing data to identify genetic determinants of antimicrobial resistance (AMR), but they lack causal interpretation. Here we present a metabolic model-based machine learning classifier, named Metabolic Allele Classifier (MAC), that uses flux balance analysis to estimate the biochemical effects of alleles. We apply the MAC to a dataset of 1595 drug-tested Mycobacterium tuberculosis strains and show that MACs predict AMR phenotypes with accuracy on par with mechanism-agnostic machine learning models (isoniazid AUC = 0.93) while enabling a biochemical interpretation of the genotype-phenotype map. Interpretation of MACs for three antibiotics (pyrazinamide, para-aminosalicylic acid, and isoniazid) recapitulates known AMR mechanisms and suggests a biochemical basis for how the identified alleles cause AMR. Extending flux balance analysis to identify accurate sequence classifiers thus contributes mechanistic insights to GWAS, a field thus far dominated by mechanism-agnostic results.
Shigella species are specialised lineages of Escherichia coli that have converged to become human-adapted and cause dysentery by invading human gut epithelial cells. Most studies of Shigella evolution have been restricted to comparisons of single representatives of each species; and population genomic studies of individual Shigella species have focused on genomic variation caused by single nucleotide variants and ignored the contribution of insertion sequences (IS), which are highly prevalent in Shigella genomes. Here, we investigate the distribution and evolutionary dynamics of IS within populations of Shigella dysenteriae Sd1, Shigella sonnei and Shigella flexneri. We find that five IS (IS1, IS2, IS4, IS600 and IS911) have undergone expansion in all Shigella species, creating substantial strain-to-strain variation within each population and contributing to convergent patterns of functional gene loss within and between species. We find that IS expansion and genome degradation are most advanced in S. dysenteriae and least advanced in S. sonnei; and using genome-scale models of metabolism we show that Shigella species display convergent loss of core E. coli metabolic capabilities, with S. sonnei and S. flexneri following a similar trajectory of metabolic streamlining to that of S. dysenteriae. This study highlights the importance of IS to the evolution of Shigella and provides a framework for the investigation of IS dynamics and metabolic reduction in other bacterial species.
With the exponential growth of publicly available genome sequences, pangenome analyses have provided increasingly complete pictures of genetic diversity for many microbial species. However, relatively few studies have scaled beyond single pangenomes to compare global genetic diversity both within and across different species. We present here several methods for "comparative pangenomics" that can be used to contextualize multi-pangenome scale genetic diversity with gene function for multiple species at multiple resolutions: pangenome shape, genes, sequence variants, and positions within variants.
Applied to 12,676 genomes across 12 microbial pathogenic species, we observed several shared resolution-specific patterns of genetic diversity: First, pangenome openness is associated with species' phylogenetic placement. Second, relationships between gene function and frequency are conserved across species, with core genomes enriched for metabolic and ribosomal genes and accessory genomes for trafficking, secretion, and defense-associated genes. Third, genes in core genomes with the highest sequence diversity are functionally diverse. Finally, certain protein domains are consistently mutation enriched across multiple species, especially among aminoacyl-tRNA synthetases where the extent of a domain's mutation enrichment is strongly function-dependent.
These results illustrate the value of each resolution at uncovering distinct aspects in the relationship between genetic and functional diversity across multiple species. With the continued growth of the number of sequenced genomes, these methods will reveal additional universal patterns of genetic diversity at the pangenome scale.
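The core/accessory distinction used in the abstract above can be illustrated with a minimal sketch. It assumes each genome has already been reduced to a set of gene-family identifiers; the function name, the `core_fraction` threshold, and the toy strains are all hypothetical and are not taken from the study, which may define the core genome differently.

```python
from collections import Counter


def partition_pangenome(genomes, core_fraction=1.0):
    """Split gene families into core and accessory sets.

    genomes: dict mapping a genome id to the set of gene families it carries.
    core_fraction: assumed minimum fraction of genomes in which a gene family
        must occur to be called core (1.0 means present in every genome).
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    # Count, for each gene family, the number of genomes that carry it.
    counts = Counter(gene for genes in genomes.values() for gene in set(genes))
    n_genomes = len(genomes)
    core = {g for g, k in counts.items() if k / n_genomes >= core_fraction}
    accessory = set(counts) - core
    return core, accessory


# Toy presence/absence data (illustrative only).
toy = {
    "strain_A": {"rpsL", "gyrA", "tolC"},
    "strain_B": {"rpsL", "gyrA", "secY"},
    "strain_C": {"rpsL", "gyrA"},
}
core, accessory = partition_pangenome(toy)
print(sorted(core))       # gene families shared by all three toy strains
print(sorted(accessory))  # gene families found in only some strains
```

Lowering `core_fraction` slightly below 1.0 is a common way to tolerate assembly or annotation gaps in large collections, at the cost of a looser definition of "core".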
Mycobacterium tuberculosis is a serious human pathogen threat exhibiting complex evolution of antimicrobial resistance (AMR). Accordingly, the many publicly available datasets describing its AMR characteristics demand disparate data-type analyses. Here, we develop a reference strain-agnostic computational platform that uses machine learning approaches, complemented by both genetic interaction analysis and 3D structural mutation-mapping, to identify signatures of AMR evolution to 13 antibiotics. This platform is applied to 1595 sequenced strains to yield four key results. First, a pan-genome analysis shows that M. tuberculosis is highly conserved with sequenced variation concentrated in PE/PPE/PGRS genes. Second, the platform corroborates 33 genes known to confer resistance and identifies 24 new genetic signatures of AMR. Third, 97 epistatic interactions across 10 resistance classes are revealed. Fourth, detailed structural analysis of these genes yields mechanistic bases for their selection. The platform can be used to study other human pathogens.
Salmonella strains are traditionally classified into serovars based on their surface antigens. While increasing availability of whole-genome sequences has allowed for more detailed subtyping of strains, links between genotype, serovar, and host remain elusive. Here we reconstruct genome-scale metabolic models for 410 Salmonella strains spanning 64 serovars. Model-predicted growth capabilities in over 530 different environments demonstrate that: (1) the Salmonella accessory metabolic network includes alternative carbon metabolism and cell wall biosynthesis; (2) metabolic capabilities correspond to each strain's serovar and isolation host; (3) growth predictions agree with 83.1% of experimental outcomes for 12 strains (690 out of 858); (4) 27 strains are auxotrophic for at least one compound, including L-tryptophan, niacin, L-histidine, L-cysteine, and p-aminobenzoate; and (5) the catabolic pathways that are important for fitness in the gastrointestinal environment are lost amongst extraintestinal serovars. Our results reveal growth differences that may reflect adaptation to particular colonization sites.
Genome-scale models (GEMs) of metabolism were constructed for 55 fully sequenced Escherichia coli and Shigella strains. The GEMs enable a systems approach to characterizing the pan and core metabolic capabilities of the E. coli species. The majority of pan metabolic content was found to consist of alternate catabolic pathways for unique nutrient sources. The GEMs were then used to systematically analyze growth capabilities in more than 650 different growth-supporting environments. The results show that unique strain-specific metabolic capabilities correspond to pathotypes and environmental niches. Twelve of the GEMs were used to predict growth on six differentiating nutrients, and the predictions were found to agree with 80% of experimental outcomes. Additionally, GEMs were used to predict strain-specific auxotrophies. Twelve of the strains modeled were predicted to be auxotrophic for vitamins niacin (vitamin B₃), thiamin (vitamin B₁), or folate (vitamin B₉). Six of the strains modeled have lost biosynthetic pathways for essential amino acids methionine, tryptophan, or leucine. Genome-scale analysis of multiple strains of a species can thus be used to define the metabolic essence of a microbial species and delineate growth differences that shed light on the adaptation process to a particular microenvironment.
The evolution of antimicrobial resistance (AMR) poses a persistent threat to global public health. Sequencing efforts have already yielded genome sequences for thousands of resistant microbial isolates and require robust computational tools to systematically elucidate the genetic basis for AMR. Here, we present a generalizable machine learning workflow for identifying genetic features driving AMR based on constructing reference strain-agnostic pan-genomes and training random subspace ensembles (RSEs). This workflow was applied to the resistance profiles of 14 antimicrobials across three urgent threat pathogens encompassing 288 Staphylococcus aureus, 456 Pseudomonas aeruginosa, and 1588 Escherichia coli genomes. We find that feature selection by RSE detects known AMR associations more reliably than common statistical tests and previous ensemble approaches, identifying a total of 45 known AMR-conferring genes and alleles across the three organisms, as well as 25 candidate associations backed by domain-level annotations. Furthermore, we find that results from the RSE approach are consistent with existing understanding of fluoroquinolone (FQ) resistance due to mutations in the main drug targets, gyrA and parC, in all three organisms, and suggest the mutational landscape of those genes with respect to FQ resistance is simple. As larger datasets become available, we expect this approach to more reliably predict AMR determinants for a wider range of microbial pathogens.
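The general idea behind a random subspace ensemble, training each base learner on a random subset of features and then aggregating over learners, can be sketched in a few lines. The following is a deliberately simplified illustration and not the workflow described in the abstract above: the base learner is assumed to be a single-feature decision stump, feature relevance is assumed to be the rate at which a feature is selected when it is available, and all names and the toy data are hypothetical.

```python
import random


def stump_accuracy(column, labels):
    """Accuracy of the best one-feature rule (the feature or its negation)."""
    agree = sum(1 for x, y in zip(column, labels) if x == y)
    return max(agree, len(labels) - agree) / len(labels)


def random_subspace_selection_rates(X, y, n_learners=200, subspace_size=2, seed=0):
    """For each feature, return how often it was the winning stump
    among the random feature subsets in which it appeared."""
    rng = random.Random(seed)
    n_features = len(X[0])
    # Pre-compute each feature's stump accuracy on the full sample.
    accuracy = [stump_accuracy([row[j] for row in X], y) for j in range(n_features)]
    appearances = [0] * n_features
    wins = [0] * n_features
    for _ in range(n_learners):
        # Each base learner only sees a random subset of the features.
        subset = rng.sample(range(n_features), subspace_size)
        for j in subset:
            appearances[j] += 1
        best = max(subset, key=lambda j: accuracy[j])
        wins[best] += 1
    return [w / a if a else 0.0 for w, a in zip(wins, appearances)]


# Toy allele presence/absence matrix: column 0 tracks the phenotype exactly,
# column 1 is partly informative, columns 2 and 3 are uninformative.
X = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = resistant, 0 = susceptible (illustrative)

rates = random_subspace_selection_rates(X, y)
for j, rate in enumerate(rates):
    print(f"feature {j}: selection rate {rate:.2f}")
```

Because every learner sees only part of the feature space, weaker but still informative features get a chance to be selected when the dominant feature is absent from the subset, which is the property that makes subspace ensembles useful for ranking candidate determinants.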
The existence of discrete phenotypic traits suggests that the complex regulatory processes which produce them are functionally modular. These processes are usually represented by networks. Only modular networks can be partitioned into intelligible subcircuits able to evolve relatively independently. Traditionally, functional modularity is approximated by detection of modularity in network structure. However, the correlation between structure and function is loose. Many regulatory networks exhibit modular behaviour without structural modularity. Here we partition an experimentally tractable regulatory network, the gap gene system of dipteran insects, using an alternative approach. We show that this system, although not structurally modular, is composed of dynamical modules driving different aspects of whole-network behaviour. All these subcircuits share the same regulatory structure, but differ in components and sensitivity to regulatory interactions. Some subcircuits are in a state of criticality, while others are not, which explains the observed differential evolvability of the various expression features in the system.