We recently described FastTree, a tool for inferring phylogenies for alignments with up to hundreds of thousands of sequences. Here, we describe improvements to FastTree that improve its accuracy without sacrificing scalability.
Where FastTree 1 used nearest-neighbor interchanges (NNIs) and the minimum-evolution criterion to improve the tree, FastTree 2 adds minimum-evolution subtree-pruning-regrafting (SPRs) and maximum-likelihood NNIs. FastTree 2 uses heuristics to restrict the search for better trees and estimates a rate of evolution for each site (the "CAT" approximation). Nevertheless, for both simulated and genuine alignments, FastTree 2 is slightly more accurate than a standard implementation of maximum-likelihood NNIs (PhyML 3 with default settings). Although FastTree 2 is not quite as accurate as methods that use maximum-likelihood SPRs, most of the splits that disagree are poorly supported, and for large alignments, FastTree 2 is 100-1,000 times faster. FastTree 2 inferred a topology and likelihood-based local support values for 237,882 distinct 16S ribosomal RNAs on a desktop computer in 22 hours, using 5.8 gigabytes of memory.
FastTree 2 allows the inference of maximum-likelihood phylogenies for huge alignments. FastTree 2 is freely available at http://www.microbesonline.org/fasttree.
Unraveling the drivers controlling community assembly is a central issue in ecology. Although it is generally accepted that selection, dispersal, diversification and drift are major community assembly processes, defining their relative importance is very challenging. Here, we present a framework to quantitatively infer community assembly mechanisms by phylogenetic bin-based null model analysis (iCAMP). iCAMP shows high accuracy (0.93-0.99), precision (0.80-0.94), sensitivity (0.82-0.94), and specificity (0.95-0.98) on simulated communities, which are 10-160% higher than those from the entire community-based approach. Application of iCAMP to grassland microbial communities in response to experimental warming reveals dominant roles of homogeneous selection (38%) and 'drift' (59%). Interestingly, warming decreases 'drift' over time, and enhances homogeneous selection which is primarily imposed on Bacillales. In addition, homogeneous selection has higher correlations with drought and plant productivity under warming than control. iCAMP provides an effective and robust tool to quantify microbial assembly processes, and should also be useful for plant and animal ecology.
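For readers less familiar with the four performance measures quoted above, the following minimal sketch shows how they are conventionally computed from a binary confusion matrix. The counts are hypothetical and are not taken from the iCAMP study, and the abstract does not state how the per-process evaluation was set up, so this illustrates only the standard definitions.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a binary classification (hypothetical example values)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def metrics(c: Confusion) -> dict:
    """Return the standard accuracy, precision, sensitivity and specificity."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": c.tp / (c.tp + c.fp),
        "sensitivity": c.tp / (c.tp + c.fn),   # also called recall
        "specificity": c.tn / (c.tn + c.fp),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = Confusion(tp=80, fp=10, tn=190, fn=20)
result = metrics(example)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Note that accuracy can be high even when sensitivity is modest if true negatives dominate, which is one reason all four values are reported together.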
Targeted gene regulation on a genome-wide scale is a powerful strategy for interrogating, perturbing, and engineering cellular systems. Here, we develop a method for controlling gene expression based on Cas9, an RNA-guided DNA endonuclease from a type II CRISPR system. We show that a catalytically dead Cas9 lacking endonuclease activity, when coexpressed with a guide RNA, generates a DNA recognition complex that can specifically interfere with transcriptional elongation, RNA polymerase binding, or transcription factor binding. This system, which we call CRISPR interference (CRISPRi), can efficiently repress expression of targeted genes in Escherichia coli, with no detectable off-target effects. CRISPRi can be used to repress multiple target genes simultaneously, and its effects are reversible. We also show evidence that the system can be adapted for gene repression in mammalian cells. This RNA-guided DNA recognition platform provides a simple approach for selectively perturbing gene expression on a genome-wide scale.
► Inactive CRISPR associated 9 protein (dCas9) is repurposed for genome engineering
► dCas9 and a complementary short guide RNA can target specific genomic sites
► CRISPR interference (CRISPRi) can regulate multiple genes without off-target effects
► CRISPRi is compact and can be ported to bacterial and mammalian cells
The authors have developed a CRISPR interference system in which a catalytically dead Cas9 protein can be targeted to a specific genomic site through a complementary small guide RNA, allowing systematic perturbation of gene transcription in bacteria and mammalian cells.
The ecological forces that govern the assembly and stability of the human gut microbiota remain unresolved. We developed a generalizable model‐guided framework to predict higher‐dimensional consortia from time‐resolved measurements of lower‐order assemblages. This method was employed to decipher microbial interactions in a diverse human gut microbiome synthetic community. We show that pairwise interactions are major drivers of multi‐species community dynamics, as opposed to higher‐order interactions. The inferred ecological network exhibits a high proportion of negative and frequent positive interactions. Ecological drivers and responsive recipient species were discovered in the network. Our model demonstrated that a prevalent positive and negative interaction topology enables robust coexistence by implementing a negative feedback loop that balances disparities in monospecies fitness levels. We show that negative interactions could generate history‐dependent responses of initial species proportions that frequently do not originate from bistability. Measurements of extracellular metabolites illuminated the metabolic capabilities of monospecies and potential molecular basis of microbial interactions. In sum, these methods defined the ecological roles of major human‐associated intestinal species and illuminated design principles of microbial communities.
Synopsis
Analysis of microbial interactions in a synthetic human gut microbiome community shows that pairwise microbial interactions are major drivers of multi‐species community dynamics. The study reveals ecological drivers, metabolite hub species and ecologically sensitive organisms in the network.
A data‐driven pipeline is used to construct a predictive dynamic model of a diverse anaerobic human gut microbiome community.
Design principles of stable coexistence and history‐dependence are elucidated.
Ecological roles and metabolite profiles are analyzed for each organism.
The study highlights challenges in using phylogenetic and exo‐metabolomic “signals” to predict microbial interactions and community functions.
To discover novel catabolic enzymes and transporters, we combined high-throughput genetic data from 29 bacteria with an automated tool to find gaps in their catabolic pathways. GapMind for carbon sources automatically annotates the uptake and catabolism of 62 compounds in bacterial and archaeal genomes. For the compounds that are utilized by the 29 bacteria, we systematically examined the gaps in GapMind's predicted pathways, and we used the mutant fitness data to find additional genes that were involved in their utilization. We identified novel pathways or enzymes for the utilization of glucosamine, citrulline, myo-inositol, lactose, and phenylacetate, and we annotated 299 diverged enzymes and transporters. We also curated 125 proteins from published reports. For the 29 bacteria with genetic data, GapMind finds high-confidence paths for 85% of utilized carbon sources. In diverse bacteria and archaea, 38% of utilized carbon sources have high-confidence paths, which was improved from 27% by incorporating the fitness-based annotations and our curation. GapMind for carbon sources is available as a web server (http://papers.genomics.lbl.gov/carbon) and takes just 30 seconds for the typical genome.
Unraveling the drivers of community structure and succession in response to environmental change is a central goal in ecology. Although the mechanisms shaping community structure have been intensively examined, those controlling ecological succession remain elusive. To understand the relative importance of stochastic and deterministic processes in mediating microbial community succession, a unique framework composed of four different cases was developed for fluidic and nonfluidic ecosystems. The framework was then tested for one fluidic ecosystem: a groundwater system perturbed by adding emulsified vegetable oil (EVO) for uranium immobilization. Our results revealed that groundwater microbial community diverged substantially away from the initial community after EVO amendment and eventually converged to a new community state, which was closely clustered with its initial state. However, their composition and structure were significantly different from each other. Null model analysis indicated that both deterministic and stochastic processes played important roles in controlling the assembly and succession of the groundwater microbial community, but their relative importance was time dependent. Additionally, consistent with the proposed conceptual framework but contradictory to conventional wisdom, the community succession responding to EVO amendment was primarily controlled by stochastic rather than deterministic processes. During the middle phase of the succession, the roles of stochastic processes in controlling community composition increased substantially, ranging from 81.3% to 92.0%. Finally, there are limited successional studies available to support different cases in the conceptual framework, but further well-replicated explicit time-series experiments are needed to understand the relative importance of deterministic and stochastic processes in controlling community succession.
An inability to reliably predict quantitative behaviors for novel combinations of genetic elements limits the rational engineering of biological systems. We developed an expression cassette architecture for genetic elements controlling transcription and translation initiation in Escherichia coli: transcription elements encode a common mRNA start, and translation elements use an overlapping genetic motif found in many natural systems. We engineered libraries of constitutive and repressor-regulated promoters along with translation initiation elements following these definitions. We measured activity distributions for each library and selected elements that collectively resulted in expression across a 1,000-fold observed dynamic range. We studied all combinations of curated elements, demonstrating that arbitrary genes are reliably expressed to within twofold relative target expression windows with ∼93% reliability. We expect the genetic element definitions validated here can be collectively expanded to create collections of public-domain standard biological parts that support reliable forward engineering of gene expression at genome scales.
The inability to predict heterologous gene expression levels precisely hinders our ability to engineer biological systems. Using well-characterized regulatory elements offers a potential solution only if such elements behave predictably when combined. We synthesized 12,563 combinations of common promoters and ribosome binding sites and simultaneously measured DNA, RNA, and protein levels from the entire library. Using a simple model, we found that RNA and protein expression were within twofold of expected levels 80% and 64% of the time, respectively. The large dataset allowed quantitation of global effects, such as translation rate on mRNA stability and mRNA secondary structure on translation rate. However, the worst 5% of constructs deviated from prediction by 13-fold on average, which could hinder large-scale genetic engineering projects. The ease and scale of this approach indicates that rather than relying on prediction or standardization, we can screen synthetic libraries for desired behavior.
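The "within twofold of expected levels" criterion can be made concrete with a short sketch. It assumes, purely for illustration, a multiplicative model in which the expected expression of a construct is the product of a promoter strength and a ribosome binding site strength; the abstract does not specify the form of its "simple model", and every name and number below is hypothetical.

```python
def within_twofold(observed: float, expected: float) -> bool:
    """True if observed lies between half and double the expected value."""
    ratio = observed / expected
    return 0.5 <= ratio <= 2.0


def fraction_within_twofold(pairs) -> float:
    """Fraction of (observed, expected) pairs that satisfy the twofold window."""
    hits = sum(within_twofold(obs, exp) for obs, exp in pairs)
    return hits / len(pairs)


# Hypothetical part strengths in arbitrary units (assumed, not from the study).
promoter = {"pA": 1.0, "pB": 4.0}
rbs = {"r1": 0.5, "r2": 2.0}

# Hypothetical measured expression for each promoter/RBS combination.
observed = {
    ("pA", "r1"): 0.6,
    ("pA", "r2"): 1.8,
    ("pB", "r1"): 0.9,   # expected 2.0, so this one falls outside the window
    ("pB", "r2"): 9.0,
}

# Assumed multiplicative model: expected = promoter strength * RBS strength.
pairs = [(obs, promoter[p] * rbs[r]) for (p, r), obs in observed.items()]
frac = fraction_within_twofold(pairs)
print(f"{frac:.0%} of constructs are within twofold of the model prediction")
```

Applied to a real library, the same fraction would be computed separately for RNA and for protein measurements, which is how two different percentages can arise from one set of constructs.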
Genome sequencing has revealed an incredible diversity of bacteria and archaea, but there are no fast and convenient tools for browsing across these genomes. It is cumbersome to view the prevalence of homologs for a protein of interest, or the gene neighborhoods of those homologs, across the diversity of the prokaryotes. We developed a web-based tool, fast.genomics, that uses two strategies to support fast browsing across the diversity of prokaryotes. First, the database of genomes is split up. The main database contains one representative from each of the 6,377 genera that have a high-quality genome, and additional databases for each taxonomic order contain up to 10 representatives of each species. Second, homologs of proteins of interest are identified quickly by using accelerated searches, usually in a few seconds. Once homologs are identified, fast.genomics can quickly show their prevalence across taxa, view their neighboring genes, or compare the prevalence of two different proteins. Fast.genomics is available at https://fast.genomics.lbl.gov.
Natural lipids can be used to make biodiesel and many other value-added compounds. In this work, we explored a number of different metabolic engineering strategies for increasing lipid production in the oleaginous yeast Rhodosporidium toruloides IFO0880. These included increasing the expression of enzymes involved in different aspects of lipid biosynthesis—malic enzyme (ME), pyruvate carboxylase (PYC1), glycerol-3-P dehydrogenase (GPD), and stearoyl-CoA desaturase (SCD)—and deleting the gene PEX10, required for peroxisome biogenesis. Only malic enzyme and stearoyl-CoA desaturase, when overexpressed, were found to significantly increase lipid production. Only stearoyl-CoA desaturase, when overexpressed, further increased lipid production in a strain previously engineered to overexpress acetyl-CoA carboxylase (ACC1) and diacylglycerol acyltransferase (DGA1). Our best strain produced 27.4 g/L lipid with an average productivity of 0.31 g/L/h during batch growth on glucose and 89.4 g/L lipid with an average productivity of 0.61 g/L/h during fed-batch growth on glucose. These results further establish R. toruloides as a platform organism for the production of lipids and potentially other lipid-derived compounds from sugars.
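As a rough reading aid for the titer and productivity figures above, the sketch below back-calculates the implied cultivation time. It assumes that average productivity is defined as final lipid titer divided by elapsed time. The abstract does not state the cultivation times, so the resulting durations are derived estimates and not reported values.

```python
def implied_duration_hours(titer_g_per_l: float, productivity_g_per_l_h: float) -> float:
    """Elapsed time implied by titer / average productivity (assumed definition)."""
    return titer_g_per_l / productivity_g_per_l_h


# Titer and productivity values as quoted in the abstract.
batch_hours = implied_duration_hours(27.4, 0.31)
fed_batch_hours = implied_duration_hours(89.4, 0.61)

print(f"batch: about {batch_hours:.0f} h")
print(f"fed-batch: about {fed_batch_hours:.0f} h")
```

Because the quoted productivities are rounded to two decimal places, the implied durations are only accurate to within a few hours.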