We recently described FastTree, a tool for inferring phylogenies for alignments with up to hundreds of thousands of sequences. Here, we describe improvements to FastTree that improve its accuracy without sacrificing scalability.
Where FastTree 1 used nearest-neighbor interchanges (NNIs) and the minimum-evolution criterion to improve the tree, FastTree 2 adds minimum-evolution subtree-pruning-regrafting (SPRs) and maximum-likelihood NNIs. FastTree 2 uses heuristics to restrict the search for better trees and estimates a rate of evolution for each site (the "CAT" approximation). Nevertheless, for both simulated and genuine alignments, FastTree 2 is slightly more accurate than a standard implementation of maximum-likelihood NNIs (PhyML 3 with default settings). Although FastTree 2 is not quite as accurate as methods that use maximum-likelihood SPRs, most of the splits that disagree are poorly supported, and for large alignments, FastTree 2 is 100-1,000 times faster. FastTree 2 inferred a topology and likelihood-based local support values for 237,882 distinct 16S ribosomal RNAs on a desktop computer in 22 hours and 5.8 gigabytes of memory.
FastTree 2 allows the inference of maximum-likelihood phylogenies for huge alignments. FastTree 2 is freely available at http://www.microbesonline.org/fasttree.
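As a rough illustration of the nearest-neighbor interchange (NNI) move mentioned above: an internal edge separates four subtrees, and an NNI tries the two alternative ways of pairing them. The sketch below is not FastTree's implementation. The function names and the `score` callable are hypothetical, and "lower is better" (as for tree length under minimum evolution) is an assumption; a likelihood-based search would maximize instead.

```python
from typing import Callable, Hashable, List, Tuple

# A quartet arrangement: two pairs of subtrees on either side of an internal edge.
Quartet = Tuple[Tuple[Hashable, Hashable], Tuple[Hashable, Hashable]]


def nni_alternatives(a, b, c, d) -> List[Quartet]:
    """Return the two NNI rearrangements of the arrangement ((a, b), (c, d))."""
    return [((a, c), (b, d)), ((a, d), (b, c))]


def best_arrangement(a, b, c, d, score: Callable[[Quartet], float]) -> Quartet:
    """Pick the lowest-scoring of the current arrangement and its two NNIs.

    `score` is a hypothetical criterion supplied by the caller
    (for example, total tree length); lower is assumed to be better.
    """
    candidates = [((a, b), (c, d))] + nni_alternatives(a, b, c, d)
    return min(candidates, key=score)


# Toy scores, invented purely for illustration.
toy_scores = {
    (("A", "B"), ("C", "D")): 3.0,
    (("A", "C"), ("B", "D")): 2.5,
    (("A", "D"), ("B", "C")): 4.0,
}
chosen = best_arrangement("A", "B", "C", "D", toy_scores.get)
```

With these toy scores the second arrangement is chosen, because it has the smallest score.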
To discover novel catabolic enzymes and transporters, we combined high-throughput genetic data from 29 bacteria with an automated tool to find gaps in their catabolic pathways. GapMind for carbon sources automatically annotates the uptake and catabolism of 62 compounds in bacterial and archaeal genomes. For the compounds that are utilized by the 29 bacteria, we systematically examined the gaps in GapMind's predicted pathways, and we used the mutant fitness data to find additional genes that were involved in their utilization. We identified novel pathways or enzymes for the utilization of glucosamine, citrulline, myo-inositol, lactose, and phenylacetate, and we annotated 299 diverged enzymes and transporters. We also curated 125 proteins from published reports. For the 29 bacteria with genetic data, GapMind finds high-confidence paths for 85% of utilized carbon sources. In diverse bacteria and archaea, 38% of utilized carbon sources have high-confidence paths, which was improved from 27% by incorporating the fitness-based annotations and our curation. GapMind for carbon sources is available as a web server (http://papers.genomics.lbl.gov/carbon) and takes just 30 seconds for the typical genome.
Unraveling the drivers controlling community assembly is a central issue in ecology. Although it is generally accepted that selection, dispersal, diversification and drift are major community assembly processes, defining their relative importance is very challenging. Here, we present a framework to quantitatively infer community assembly mechanisms by phylogenetic bin-based null model analysis (iCAMP). iCAMP shows high accuracy (0.93-0.99), precision (0.80-0.94), sensitivity (0.82-0.94), and specificity (0.95-0.98) on simulated communities, which are 10-160% higher than those from the entire community-based approach. Application of iCAMP to grassland microbial communities in response to experimental warming reveals dominant roles of homogeneous selection (38%) and 'drift' (59%). Interestingly, warming decreases 'drift' over time, and enhances homogeneous selection which is primarily imposed on Bacillales. In addition, homogeneous selection has higher correlations with drought and plant productivity under warming than control. iCAMP provides an effective and robust tool to quantify microbial assembly processes, and should also be useful for plant and animal ecology.
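For readers unfamiliar with the four performance measures quoted above, the sketch below computes them from confusion-matrix counts using their standard definitions. How iCAMP tallies true and false calls on simulated communities is not described in the abstract, so the function name and the example counts are assumptions made purely for illustration, not values from the study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix metrics.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # fraction of all calls that are correct
        "precision": tp / (tp + fp),      # fraction of positive calls that are correct
        "sensitivity": tp / (tp + fn),    # fraction of real positives that are found
        "specificity": tn / (tn + fp),    # fraction of real negatives that are found
    }


# Hypothetical counts, invented for illustration only.
metrics = classification_metrics(tp=8, fp=2, tn=85, fn=5)
```

Because the negative class usually dominates in such tallies, accuracy and specificity can be high even when precision and sensitivity are lower, which is why all four are reported together.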
Targeted gene regulation on a genome-wide scale is a powerful strategy for interrogating, perturbing, and engineering cellular systems. Here, we develop a method for controlling gene expression based on Cas9, an RNA-guided DNA endonuclease from a type II CRISPR system. We show that a catalytically dead Cas9 lacking endonuclease activity, when coexpressed with a guide RNA, generates a DNA recognition complex that can specifically interfere with transcriptional elongation, RNA polymerase binding, or transcription factor binding. This system, which we call CRISPR interference (CRISPRi), can efficiently repress expression of targeted genes in Escherichia coli, with no detectable off-target effects. CRISPRi can be used to repress multiple target genes simultaneously, and its effects are reversible. We also show evidence that the system can be adapted for gene repression in mammalian cells. This RNA-guided DNA recognition platform provides a simple approach for selectively perturbing gene expression on a genome-wide scale.
► Inactive CRISPR associated 9 protein (dCas9) is repurposed for genome engineering
► dCas9 and a complementary short guide RNA can target specific genomic sites
► CRISPR interference (CRISPRi) can regulate multiple genes without off-target effects
► CRISPRi is compact and can be ported to bacterial and mammalian cells
The authors have developed a CRISPR interference system in which a catalytically dead Cas9 protein can be targeted to a specific genomic site through a complementary small guide RNA, allowing systematic perturbation of gene transcription in bacteria and mammalian cells.
The ecological forces that govern the assembly and stability of the human gut microbiota remain unresolved. We developed a generalizable model‐guided framework to predict higher‐dimensional consortia from time‐resolved measurements of lower‐order assemblages. This method was employed to decipher microbial interactions in a diverse human gut microbiome synthetic community. We show that pairwise interactions are major drivers of multi‐species community dynamics, as opposed to higher‐order interactions. The inferred ecological network exhibits a high proportion of negative and frequent positive interactions. Ecological drivers and responsive recipient species were discovered in the network. Our model demonstrated that a prevalent positive and negative interaction topology enables robust coexistence by implementing a negative feedback loop that balances disparities in monospecies fitness levels. We show that negative interactions could generate history‐dependent responses of initial species proportions that frequently do not originate from bistability. Measurements of extracellular metabolites illuminated the metabolic capabilities of monospecies and potential molecular basis of microbial interactions. In sum, these methods defined the ecological roles of major human‐associated intestinal species and illuminated design principles of microbial communities.
Synopsis
Analysis of microbial interactions in a synthetic human gut microbiome community shows that pairwise microbial interactions are major drivers of multi‐species community dynamics. The study reveals ecological drivers, metabolite hub species and ecologically sensitive organisms in the network.
A data‐driven pipeline is used to construct a predictive dynamic model of a diverse anaerobic human gut microbiome community.
Design principles of stable coexistence and history‐dependence are elucidated.
Ecological roles and metabolite profiles are analyzed for each organism.
The study highlights challenges in using phylogenetic and exo‐metabolomic “signals” to predict microbial interactions and community functions.
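To make the idea of a pairwise dynamic model concrete, here is a minimal sketch of a generalized Lotka-Volterra system integrated with a simple Euler scheme. The abstract does not name the model form, so using generalized Lotka-Volterra is an assumption, as are the function name, the step size, and every parameter value. The two-species example is built so that species 1 promotes species 2 while species 2 inhibits species 1, a toy version of the positive/negative feedback topology described above.

```python
def simulate_glv(x0, growth, interactions, dt=0.01, steps=5000):
    """Euler integration of dx_i/dt = x_i * (r_i + sum_j a_ij * x_j).

    growth[i] is the monospecies growth rate r_i; interactions[i][j] is the
    effect a_ij of species j on species i. All values are hypothetical.
    """
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        dx = [
            x[i] * (growth[i] + sum(interactions[i][j] * x[j] for j in range(n)))
            for i in range(n)
        ]
        # Clamp at zero so abundances never become negative.
        x = [max(xi + dt * dxi, 0.0) for xi, dxi in zip(x, dx)]
    return x


# Invented parameters: species 1 grows faster alone, helps species 2,
# and is inhibited by species 2 (a negative feedback loop).
growth = [1.0, 0.5]
interactions = [
    [-1.0, -0.2],  # effects on species 1: self-limitation, inhibition by species 2
    [0.3, -1.0],   # effects on species 2: promotion by species 1, self-limitation
]
final = simulate_glv([0.1, 0.1], growth, interactions)
```

In this toy system both species persist at a stable equilibrium, and the gap between their abundances is smaller than the two-fold gap in their growth rates, which is the qualitative balancing effect the abstract attributes to the feedback loop.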
Bacteriophages (phages) are critical players in the dynamics and function of microbial communities and drive processes as diverse as global biogeochemical cycles and human health. Phages tend to be predators finely tuned to attack specific hosts, even down to the strain level, which in turn defend themselves using an array of mechanisms. However, to date, efforts to rapidly and comprehensively identify bacterial host factors important in phage infection and resistance have yet to be fully realized. Here, we globally map the host genetic determinants involved in resistance to 14 phylogenetically diverse double-stranded DNA phages using two model Escherichia coli strains (K-12 and BL21) with known sequence divergence to demonstrate strain-specific differences. Using genome-wide loss-of-function and gain-of-function genetic technologies, we are able to confirm previously described phage receptors as well as uncover a number of previously unknown host factors that confer resistance to one or more of these phages. We uncover differences in resistance factors that strongly align with the susceptibility of K-12 and BL21 to specific phage. We also identify both phage-specific mechanisms, such as the unexpected role of cyclic-di-GMP in host sensitivity to phage N4, and more generic defenses, such as the overproduction of colanic acid capsular polysaccharide that defends against a wide array of phages. Our results indicate that host responses to phages can occur via diverse cellular mechanisms. Our systematic and high-throughput genetic workflow to characterize phage-host interaction determinants can be extended to diverse bacteria to generate datasets that allow predictive models of how phage-mediated selection will shape bacterial phenotype and evolution.
The results of this study and future efforts to map the phage resistance landscape will lead to new insights into the coevolution of hosts and their phage, which can ultimately be used to design better phage therapeutic treatments and tools for precision microbiome engineering.
Metagenomics facilitates the study of the genetic information from uncultured microbes and complex microbial communities. Assembling complete genomes from metagenomics data is difficult because most samples have high organismal complexity and strain diversity. Some studies have attempted to extract complete bacterial, archaeal, and viral genomes and often focus on species with circular genomes so they can help confirm completeness with circularity. However, fewer than 100 circularized bacterial and archaeal genomes have been assembled and published from metagenomics data despite the thousands of datasets that are available. Circularized genomes are important for (1) building a reference collection as scaffolds for future assemblies, (2) providing complete gene content of a genome, (3) confirming little or no contamination of a genome, (4) studying the genomic context and synteny of genes, and (5) linking protein coding genes to ribosomal RNA genes to aid metabolic inference in 16S rRNA gene sequencing studies. We developed a semi-automated method called Jorg to help circularize small bacterial, archaeal, and viral genomes using iterative assembly, binning, and read mapping. In addition, this method exposes potential misassemblies from k-mer based assemblies. We chose species of the Candidate Phyla Radiation (CPR) to focus our initial efforts because they have small genomes and are only known to have one ribosomal RNA operon. In addition to 34 circular CPR genomes, we present one circular Margulisbacteria genome, one circular Chloroflexi genome, and two circular megaphage genomes from 19 public and published datasets. We demonstrate findings that would likely be difficult without circularizing genomes, including that ribosomal genes are likely not operonic in the majority of CPR, and that some CPR harbor diverged forms of RNase P RNA.
Code and a tutorial for this method are available at https://github.com/lmlui/Jorg, and the method is also available on the DOE Systems Biology KnowledgeBase as a beta app.
Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement Neighbor-Joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest neighbor interchanges to reduce the length of the tree. For an alignment with $N$ sequences, $L$ sites, and $a$ different characters, a distance matrix requires $O(N^2)$ space and $O(N^2 L)$ time, but FastTree requires just $O(NLa + N\sqrt{N})$ memory and $O(N\sqrt{N}\,\log(N)\,La)$ time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 h and 2.4 GB of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 h and 50 GB of memory. In simulations, FastTree was slightly more accurate than Neighbor-Joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.
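The 50 GB figure for a stored distance matrix can be checked with a short back-of-the-envelope calculation. The sketch below assumes that only the upper triangle of the symmetric matrix is stored and that each distance takes 4 bytes (single precision); both are assumptions, since the abstract does not state the storage layout. It also includes the standard Jukes-Cantor correction for a proportion `p` of differing sites. The function names are hypothetical.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor distance d = -(3/4) * ln(1 - (4/3) * p) for nucleotides,
    where p is the observed fraction of sites that differ (0 <= p < 0.75)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix_bytes(n_sequences: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to store one value per unordered pair of sequences.

    Assumes upper-triangle storage and 4-byte values (both assumptions).
    """
    n_pairs = n_sequences * (n_sequences - 1) // 2
    return n_pairs * bytes_per_value


# 158,022 sequences, as in the 16S example above.
matrix_gb = distance_matrix_bytes(158_022) / 1e9
```

Under these assumptions `matrix_gb` comes out just under 50, consistent with the memory estimate quoted in the abstract and roughly twenty times the 2.4 GB that FastTree used.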
For many bacteria with sequenced genomes, we do not understand how they synthesize some amino acids. This makes it challenging to reconstruct their metabolism, and has led to speculation that bacteria might be cross-feeding amino acids. We studied heterotrophic bacteria from 10 different genera that grow without added amino acids even though an automated tool predicts that the bacteria have gaps in their amino acid synthesis pathways. Across these bacteria, there were 11 gaps in their amino acid biosynthesis pathways that we could not fill using current knowledge. Using genome-wide mutant fitness data, we identified novel enzymes that fill 9 of the 11 gaps and hence explain the biosynthesis of methionine, threonine, serine, or histidine by bacteria from six genera. We also found that the sulfate-reducing bacterium Desulfovibrio vulgaris synthesizes homocysteine (which is a precursor to methionine) by using DUF39, NIL/ferredoxin, and COG2122 proteins, and that homoserine is not an intermediate in this pathway. Our results suggest that most free-living bacteria can likely make all 20 amino acids and illustrate how high-throughput genetics can uncover previously-unknown amino acid biosynthesis genes.
Unraveling the drivers of community structure and succession in response to environmental change is a central goal in ecology. Although the mechanisms shaping community structure have been intensively examined, those controlling ecological succession remain elusive. To understand the relative importance of stochastic and deterministic processes in mediating microbial community succession, a unique framework composed of four different cases was developed for fluidic and nonfluidic ecosystems. The framework was then tested for one fluidic ecosystem: a groundwater system perturbed by adding emulsified vegetable oil (EVO) for uranium immobilization. Our results revealed that groundwater microbial community diverged substantially away from the initial community after EVO amendment and eventually converged to a new community state, which was closely clustered with its initial state. However, their composition and structure were significantly different from each other. Null model analysis indicated that both deterministic and stochastic processes played important roles in controlling the assembly and succession of the groundwater microbial community, but their relative importance was time dependent. Additionally, consistent with the proposed conceptual framework but contradictory to conventional wisdom, the community succession responding to EVO amendment was primarily controlled by stochastic rather than deterministic processes. During the middle phase of the succession, the roles of stochastic processes in controlling community composition increased substantially, ranging from 81.3% to 92.0%. Finally, there are limited successional studies available to support different cases in the conceptual framework, but further well-replicated explicit time-series experiments are needed to understand the relative importance of deterministic and stochastic processes in controlling community succession.