Bacteria can sense their environment, distinguish between cell types, and deliver proteins to eukaryotic cells. Here, we engineer the interaction between bacteria and cancer cells to depend on heterologous environmental signals. We have characterized invasin from Yersinia pseudotuberculosis as an output module that enables Escherichia coli to invade cancer-derived cells, including HeLa, HepG2, and U2OS lines. To environmentally restrict invasion, we placed this module under the control of heterologous sensors. With the Vibrio fischeri lux quorum sensing circuit, the hypoxia-responsive fdhF promoter, or the arabinose-inducible araBAD promoter, the bacteria invade cells at densities greater than 10^8 bacteria/ml, after growth in an anaerobic growth chamber or in the presence of 0.02% arabinose, respectively. In the process, we developed a technique to tune the linkage between a sensor and output gene using ribosome binding site libraries and genetic selection. This approach could be used to engineer bacteria to sense the microenvironment of a tumor and respond by invading cancerous cells and releasing a cytotoxic agent.
Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST's database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/.
With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins' functions.
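The two steps described above (indexing identifier mentions in article text, then looking up similar proteins) can be sketched roughly as follows. This is a toy illustration only, not PaperBLAST's code: the identifiers, sequences, and article text are invented, and the `identity` function is a crude stand-in for the BLAST search the abstract mentions.

```python
import re
from collections import defaultdict

# Hypothetical toy data: identifiers, sequences, and text are invented.
PROTEINS = {"geneA1": "MKTAYIAKQR", "geneB2": "MKTAYLAKQR"}
ARTICLES = {
    "article-1": "We show that geneA1 is required for growth on lactate. Other text.",
    "article-2": "Deletion of geneB2 had no effect. In contrast, geneA1 mutants grew slowly.",
}


def index_mentions(articles, identifiers):
    """Map each identifier to the (article id, sentence) pairs that mention it."""
    index = defaultdict(list)
    for article_id, text in articles.items():
        # Split on whitespace that follows sentence-ending punctuation.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            for ident in identifiers:
                if re.search(rf"\b{re.escape(ident)}\b", sentence):
                    index[ident].append((article_id, sentence))
    return index


def identity(a, b):
    """Fraction of identical aligned positions (assumption: no gaps).

    A real tool would run BLAST here; this is only a placeholder.
    """
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def find_papers(query, proteins, index, min_identity=0.8):
    """Return (score, identifier, snippets) for proteins similar to the query."""
    hits = []
    for ident, seq in proteins.items():
        score = identity(query, seq)
        if score >= min_identity:
            hits.append((score, ident, index.get(ident, [])))
    # Best match first; ties broken by identifier for a stable order.
    return sorted(hits, key=lambda h: (-h[0], h[1]))


index = index_mentions(ARTICLES, PROTEINS)
hits = find_papers("MKTAYIAKQR", PROTEINS, index)
```

Each hit carries the sentences that mention the matching protein, which is the "snippet" idea in miniature.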
HIV-1 Tat transactivation is vital for completion of the viral life cycle and has been implicated in determining proviral latency. We present an extensive experimental/computational study of an HIV-1 model vector (LTR-GFP-IRES-Tat) and show that stochastic fluctuations in Tat influence the viral latency decision. Low GFP/Tat expression was found to generate bifurcating phenotypes with clonal populations derived from single proviral integrations simultaneously exhibiting very high and near zero GFP expression. Although phenotypic bifurcation (PheB) was correlated with distinct genomic integration patterns, neither these patterns nor other extrinsic cellular factors (cell cycle/size, aneuploidy, chromatin silencing, etc.) explained PheB. Stochastic computational modeling successfully accounted for PheB and correctly predicted the dynamics of a Tat mutant that were subsequently confirmed by experiment. Thus, Tat stochastics appear sufficient to generate PheB (and potentially proviral latency), illustrating the importance of stochastic fluctuations in gene expression in a mammalian system.
Standard biological parts, such as BioBricks parts, provide the foundation for a new engineering discipline that enables the design and construction of synthetic biological systems with a variety of applications in bioenergy, new materials, therapeutics, and environmental remediation. Although the original BioBricks assembly standard has found widespread use, it has several shortcomings that limit its range of potential applications. In particular, the system is not suitable for the construction of protein fusions due to an unfavorable scar sequence that encodes an in-frame stop codon.
Here, we present a similar but new composition standard, called BglBricks, that addresses the scar translation issue associated with the original standard. The new system employs BglII and BamHI restriction enzymes, robust cutters with an extensive history of use, and results in a 6-nucleotide scar sequence encoding glycine-serine, an innocuous peptide linker in most protein fusion applications. We demonstrate the utility of the new standard in three distinct applications, including the construction of constitutively active gene expression devices with a wide range of expression profiles, the construction of chimeric, multi-domain protein fusions, and the targeted integration of functional DNA sequences into specific loci of the E. coli genome.
The BglBrick standard provides a new, more flexible platform from which to generate standard biological parts and automate DNA assembly. Work on BglBrick assembly reactions, as well as on the development of automation and bioinformatics tools, is currently underway. These tools will provide a foundation from which to transform genetic engineering from a technically intensive art into a purely design-based discipline.
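The origin of the 6-nucleotide glycine-serine scar can be illustrated with a short sketch. BglII (A^GATCT) and BamHI (G^GATCC) leave compatible GATC overhangs, so joining a BamHI-cut upstream end to a BglII-cut downstream end gives GGATCT, which neither enzyme recognizes. The part sequences and function names below are hypothetical, and the code models only the top strand; it is not the authors' assembly protocol.

```python
BGLII_SITE = "AGATCT"  # BglII cuts A^GATCT
BAMHI_SITE = "GGATCC"  # BamHI cuts G^GATCC

# Only the two codons needed to read the scar.
CODON_TABLE = {"GGA": "Gly", "TCT": "Ser"}


def join_parts(upstream, downstream):
    """Ligate a BamHI-cut upstream part to a BglII-cut downstream part.

    Top strand only: the upstream part keeps everything through the G
    before the BamHI cut; the downstream part keeps GATCT onward.
    """
    if not upstream.endswith(BAMHI_SITE):
        raise ValueError("upstream part must end with a BamHI site")
    if not downstream.startswith(BGLII_SITE):
        raise ValueError("downstream part must start with a BglII site")
    left = upstream[:-5]    # drop 'GATCC', keep the leading 'G'
    right = downstream[1:]  # drop 'A', keep 'GATCT...'
    return left + right


def scar_between(upstream, downstream):
    """Return the 6-nt scar left at the junction of the two parts."""
    joined = join_parts(upstream, downstream)
    start = len(upstream) - len(BAMHI_SITE)
    return joined[start:start + 6]


def translate_scar(scar):
    """Translate the in-frame scar into amino acid names."""
    return [CODON_TABLE[scar[i:i + 3]] for i in (0, 3)]


# Hypothetical in-frame parts: ATG AAA | site ... site | CCC TAA
upstream_part = "ATGAAA" + BAMHI_SITE
downstream_part = BGLII_SITE + "CCCTAA"
joined = join_parts(upstream_part, downstream_part)
scar = scar_between(upstream_part, downstream_part)
```

Because the scar matches neither recognition site, the junction survives later rounds of digestion, and in a reading frame aligned to the scar it contributes Gly-Ser rather than a stop codon.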
For over 10 years, ModelSEED has been a primary resource for the construction of draft genome-scale metabolic models based on annotated microbial or plant genomes. Now being released, the biochemistry database serves as the foundation of biochemical data underlying ModelSEED and KBase. The biochemistry database embodies several properties that, taken together, distinguish it from other published biochemistry resources by: (i) including compartmentalization, transport reactions, charged molecules and proton balancing on reactions; (ii) being extensible by the user community, with all data stored in GitHub; and (iii) being designed as a biochemical ‘Rosetta Stone’ to facilitate comparison and integration of annotations from many different tools and databases. The database was constructed by combining chemical data from many resources, applying standard transformations, identifying redundancies and computing thermodynamic properties. The ModelSEED biochemistry is continually tested using flux balance analysis to ensure the biochemical network is modeling-ready and capable of simulating diverse phenotypes. Ontologies can be designed to aid in comparing and reconciling metabolic reconstructions that differ in how they represent various metabolic pathways. ModelSEED now includes 33,978 compounds and 36,645 reactions, available as a set of extensible files on GitHub, and available to search at https://modelseed.org/biochem and KBase.
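What "charged molecules and proton balancing" means in practice can be shown with a small mass and charge check. The sketch below is an illustration under stated assumptions, not ModelSEED code: the compound names are plain labels rather than database identifiers, the formulas are the commonly used charged forms near neutral pH, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed charged formulas near pH 7: (formula, net charge).
COMPOUNDS = {
    "glucose": ("C6H12O6", 0),
    "ATP": ("C10H12N5O13P3", -4),
    "G6P": ("C6H11O9P", -2),
    "ADP": ("C10H12N5O10P2", -3),
    "H+": ("H", 1),
}


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no brackets)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def imbalance(reaction, compounds):
    """Return (element imbalance, charge imbalance) for a reaction.

    `reaction` maps compound name to stoichiometric coefficient,
    negative for reactants and positive for products.
    """
    atoms = Counter()
    charge = 0
    for name, coeff in reaction.items():
        formula, net_charge = compounds[name]
        for element, n in parse_formula(formula).items():
            atoms[element] += coeff * n
        charge += coeff * net_charge
    return {el: n for el, n in atoms.items() if n != 0}, charge


# Hexokinase reaction written without, then with, the released proton.
unbalanced = {"glucose": -1, "ATP": -1, "G6P": 1, "ADP": 1}
balanced = dict(unbalanced, **{"H+": 1})

missing = imbalance(unbalanced, COMPOUNDS)
ok = imbalance(balanced, COMPOUNDS)
```

With charged species, the reaction is short by one hydrogen and one positive charge until the proton is added, which is the kind of correction proton balancing supplies.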
Mammalian gene expression patterns, and their variability across populations of cells, are regulated by factors specific to each gene in concert with its surrounding cellular and genomic environment. Lentiviruses such as HIV integrate their genomes into semi-random genomic locations in the cells they infect, and the resulting viral gene expression provides a natural system to dissect the contributions of genomic environment to transcriptional regulation. Previously, we showed that expression heterogeneity and its modulation by specific host factors at HIV integration sites are key determinants of infected-cell fate and a possible source of latent infections. Here, we assess the integration context dependence of expression heterogeneity from diverse single integrations of an HIV-promoter/GFP-reporter cassette in Jurkat T-cells. Systematically fitting a stochastic model of gene expression to our data reveals an underlying transcriptional dynamic, by which multiple transcripts are produced during short, infrequent bursts, that quantitatively accounts for the wide, highly skewed protein expression distributions observed in each of our clonal cell populations. Interestingly, we find that the size of transcriptional bursts is the primary systematic covariate over integration sites, varying from a few to tens of transcripts across integration sites, and correlating well with mean expression. In contrast, burst frequencies are scattered about a typical value of several per cell-division time and demonstrate little correlation with the clonal means. This pattern of modulation generates consistently noisy distributions over the sampled integration positions, with large expression variability relative to the mean maintained even for the most productive integrations, and could contribute to specifying heterogeneous, integration-site-dependent viral production patterns in HIV-infected cells.
Genomic environment thus emerges as a significant control parameter for gene expression variation that may contribute to structuring mammalian genomes, as well as be exploited for survival by integrating viruses.
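Why modulating burst size rather than burst frequency keeps expression noisy can be seen in a standard simplified bursting model. The sketch below assumes bursts arrive at random, burst sizes are geometrically distributed, and the product decays at a constant rate, which gives a negative binomial steady state. It is a textbook approximation used for illustration, not the model fitted in the study, and the parameter values are hypothetical.

```python
def bursty_moments(burst_freq, burst_size):
    """Steady-state moments for a simple bursty expression model.

    burst_freq: mean number of bursts per product lifetime
    burst_size: mean number of molecules made per burst
    Returns (mean, Fano factor, squared coefficient of variation).
    """
    mean = burst_freq * burst_size
    fano = 1.0 + burst_size      # variance / mean
    cv2 = fano / mean            # variance / mean**2
    return mean, fano, cv2


# Hold frequency fixed and raise burst size: the mean grows tenfold,
# but CV^2 only falls toward its floor of 1 / burst_freq.
low = bursty_moments(burst_freq=3.0, burst_size=2.0)
high = bursty_moments(burst_freq=3.0, burst_size=20.0)
noise_floor = 1.0 / 3.0
```

In this approximation CV^2 equals 1/burst_freq + 1/mean, so raising the mean through larger bursts cannot push noise below 1/burst_freq. That is consistent with the abstract's observation that variability stays large even for the most productive integrations.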
To study how a bacterium allocates its resources, we compared the costs and benefits of most (86%) of the proteins in Escherichia coli K-12 during growth in minimal glucose medium. The cost or investment in each protein was estimated from ribosomal profiling data, and the benefit of each protein was measured by assaying a library of transposon mutants. We found that proteins that are important for fitness are usually highly expressed, and 95% of these proteins are expressed at above 13 parts per million (ppm). Conversely, proteins that do not measurably benefit the host (with a benefit of less than 5% per generation) tend to be weakly expressed, with a median expression of 13 ppm. In aggregate, genes with no detectable benefit account for 31% of protein production, or about 22% if we correct for genetic redundancy. Although some of the apparently unnecessary expression could have subtle benefits in minimal glucose medium, the majority of the burden is due to genes that are important in other conditions. We propose that at least 13% of the cell's protein is "on standby" in case conditions change.
Synthetic non-coding RNAs have emerged as a versatile class of molecular devices that have a diverse range of programmable functions, including signal sensing, gene regulation and the modulation of molecular interactions. Owing to their small size and the central role of Watson-Crick base pairing in determining their structure, function and interactions, several distinct types of synthetic non-coding RNA regulators that are functional at the DNA, mRNA and protein levels have been experimentally characterized and computationally modelled. These engineered devices can be incorporated into genetic circuits, enabling the more efficient creation of complex synthetic biological systems. In this Review, we summarize recent progress in engineering synthetic non-coding RNA devices and their application to genetic and cellular engineering in a broad range of microorganisms.
Mineralization of organic matter in anoxic environments relies on the cooperative activities of hydrogen producers and consumers linked by interspecies electron transfer in syntrophic consortia that may include sulfate-reducing species (e.g., Desulfovibrio). Physiological differences and various gene repertoires implicated in syntrophic metabolism among Desulfovibrio species suggest considerable variation in the biochemical basis of syntrophy. In this study, comparative transcriptional and mutant analyses of Desulfovibrio alaskensis strain G20 and Desulfovibrio vulgaris strain Hildenborough growing syntrophically with Methanococcus maripaludis on lactate were used to develop new and revised models for their alternative electron transfer and energy conservation systems. Lactate oxidation by strain G20 generates a reduced thiol-disulfide redox pair(s) and ferredoxin that are energetically coupled to H+/CO2 reduction by periplasmic formate dehydrogenase and hydrogenase via a flavin-based reverse electron bifurcation process (electron confurcation) and a menaquinone (MQ) redox loop-mediated reverse electron flow involving the membrane-bound Qmo and Qrc complexes. In contrast, strain Hildenborough uses a larger number of cytoplasmic and periplasmic proteins linked in three intertwining pathways to couple H+ reduction to lactate oxidation. The faster growth of strain G20 in coculture is associated with a kinetic advantage conferred by the Qmo-MQ-Qrc loop as an electron transfer system that permits higher lactate oxidation rates under elevated hydrogen levels (thereby enhancing methanogenic growth) and use of formate as the main electron-exchange mediator (>70% electron flux), as opposed to the primarily hydrogen-based exchange by strain Hildenborough. This study further demonstrates the absence of a conserved gene core in Desulfovibrio that would determine the ability for a syntrophic lifestyle.
The range over which a protein is expressed, and its cell-to-cell variability, is often thought to be linked to the demand for its activity. Steady-state protein level is determined by multiple mechanisms controlling transcription and translation, many of which are limited by DNA- and RNA-encoded signals that affect initiation, elongation and termination of polymerases and ribosomes. We performed a comprehensive analysis of >100 sequence features to derive a predictive model composed of a minimal non-redundant set of factors explaining 66% of the total variation of protein abundance observed in >800 genes in Escherichia coli. The model suggests that protein abundance is primarily determined by the transcript level (53%) and by effectors of translation elongation (12%), whereas only a small fraction of the variation is explained by translational initiation (1%). Our analyses uncover a new sequence determinant, not previously described, affecting translation initiation and suggest that elongation rate is affected by both codon biases and specific amino acid composition. We also show that transcription and translation efficiency may have an effect on expression noise, which is more similar than previously assumed.