Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of ...this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST's database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/.
With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins' functions.
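The lookup described above (link proteins to article snippets, then retrieve the snippets for proteins similar to a query) can be sketched in a few lines. This is only an illustration under stated assumptions: the names `Entry`, `similarity`, and `find_papers`, the toy sequences, and the threshold are hypothetical, and a k-mer Jaccard score is used as a crude stand-in for BLAST, which PaperBLAST actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One protein in a toy literature database (hypothetical structure)."""
    protein_id: str
    sequence: str
    snippets: list = field(default_factory=list)  # (article_id, text) pairs


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets; a stand-in for a BLAST score."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def find_papers(query, database, min_similarity=0.3):
    """Return (similarity, protein_id, snippets) for similar proteins that
    have at least one linked snippet, best match first."""
    hits = []
    for entry in database:
        score = similarity(query, entry.sequence)
        if score >= min_similarity and entry.snippets:
            hits.append((score, entry.protein_id, entry.snippets))
    return sorted(hits, key=lambda h: h[0], reverse=True)


# Hypothetical toy data: identifiers, sequences, and snippet text are made up.
DATABASE = [
    Entry("protA", "MKTAYIAKQRQISFVKSHFSRQ", [("article-1", "snippet mentioning protA")]),
    Entry("protB", "MSDNELLKRWWPGGGGSSTTAA", [("article-2", "snippet mentioning protB")]),
    Entry("protC", "MKTAYIAKQRQISFVKSHFSRQ", []),  # similar, but no literature
]

hits = find_papers("MKTAYIAKQRQLSFVKSHFSRQ", DATABASE)
for score, protein_id, snippets in hits:
    print(f"{protein_id} (similarity {score:.2f}): {snippets}")
```

A real deployment would replace `similarity` with an alignment search and would populate the snippets from full-text mining and curated resources, as the abstract describes.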
Ten simple rules for getting and giving credit for data. Wood-Charlson, Elisha M; Crockett, Zachary; Erdmann, Chris ...
PLOS Computational Biology, 09/2022, Volume 18, Issue 9
Journal Article. Peer reviewed. Open access.
At a more granular level, each research domain will have different requirements for describing the context around data collection and preparation (important for determining whether data sets can be ...combined or compared) and steps used for data processing (important for ensuring reproducibility). ...the US Geological Survey (USGS) has great resources on data management, including standards and metadata reporting [21]. Here, we describe an example of the different layers of metadata (Fig 1 and Table 1) recommended to make data derived from physical samples collected from the environment (e.g., soil or water), processed in the laboratory (e.g., DNA sequencing), and analyzed using bioinformatic tools (e.g., genome assembly and taxonomic assignment) FAIR, comparable, and reproducible. Major funders like the NSF, NIH, and the US Department of Energy (DOE) already do.
Stochastic effects in biomolecular systems have now been recognized as a major physiologically and evolutionarily important factor in the development and function of many living organisms. ...Nevertheless, they are often thought of as providing only moderate refinements to the behaviors otherwise predicted by the classical deterministic system description. In this work we show by using both analytical and numerical investigation that at least in one ubiquitous class of (bio)chemical-reaction mechanisms, enzymatic futile cycles, the external noise may induce a bistable oscillatory (dynamic switching) behavior that is both quantitatively and qualitatively different from what is predicted or possible deterministically. We further demonstrate that the noise required to produce these distinct properties can itself be caused by a set of auxiliary chemical reactions, making it feasible for biological systems of sufficient complexity to generate such behavior internally. This new stochastic dynamics then serves to confer additional functional modalities on the enzymatic futile cycle mechanism that include stochastic amplification and signaling, the characteristics of which could be controlled by both the type and parameters of the driving noise. Hence, such noise-induced phenomena may, among other roles, potentially offer a novel type of control mechanism in pathways that contain these cycles and the like units. In particular, observations of endogenous or externally driven noise-induced dynamics in regulatory networks may thus provide additional insight into their topology, structure, and kinetics.
The practice of engineering biology now depends on the ad hoc reuse of genetic elements whose precise activities vary across changing contexts. Methods are lacking for researchers to affordably ...coordinate the quantification and analysis of part performance across varied environments, as needed to identify, evaluate and improve problematic part types. We developed an easy-to-use analysis of variance (ANOVA) framework for quantifying the performance of genetic elements. For proof of concept, we assembled and analyzed combinations of prokaryotic transcription and translation initiation elements in Escherichia coli. We determined how estimation of part activity relates to the number of unique element combinations tested, and we show how to estimate expected ensemble-wide part activity from just one or two measurements. We propose a new statistic, biomolecular part 'quality', for tracking quantitative variation in part performance across changing contexts.
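One way to picture an ANOVA-style analysis of part combinations is a two-factor additive decomposition of an activity table (for example, promoters by translation initiation elements). The sketch below is an assumption-laden illustration, not the framework from the paper: the function name `two_way_anova`, the table values, and the units are hypothetical, and the proposed 'quality' statistic is not reproduced here.

```python
def two_way_anova(table):
    """Additive two-factor decomposition of a rows x columns activity table
    with one measurement per combination.

    Rows and columns stand for two part types (hypothetical example:
    promoters and translation initiation elements)."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)

    # Effect of each part = its mean activity minus the grand mean.
    row_effects = [sum(row) / n_cols - grand for row in table]
    col_effects = [
        sum(table[i][j] for i in range(n_rows)) / n_rows - grand
        for j in range(n_cols)
    ]

    # Sums of squares attributable to each factor and to what is left over.
    ss_rows = n_cols * sum(e * e for e in row_effects)
    ss_cols = n_rows * sum(e * e for e in col_effects)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_resid = ss_total - ss_rows - ss_cols  # context-dependent remainder

    return {
        "grand_mean": grand,
        "row_effects": row_effects,
        "col_effects": col_effects,
        "ss_rows": ss_rows,
        "ss_cols": ss_cols,
        "ss_resid": ss_resid,
        "ss_total": ss_total,
    }


# Hypothetical activities (arbitrary units): 2 promoters x 3 initiation elements.
activity = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
]
result = two_way_anova(activity)
print(result)
```

In this toy table the two part types combine perfectly additively, so the residual sum of squares is zero; a large residual would indicate that part activity depends on the context in which the part is used.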
Bacteria can sense their environment, distinguish between cell types, and deliver proteins to eukaryotic cells. Here, we engineer the interaction between bacteria and cancer cells to depend on ...heterologous environmental signals. We have characterized invasin from Yersinia pseudotuberculosis as an output module that enables Escherichia coli to invade cancer-derived cells, including HeLa, HepG2, and U2OS lines. To environmentally restrict invasion, we placed this module under the control of heterologous sensors. With the Vibrio fischeri lux quorum sensing circuit, the hypoxia-responsive fdhF promoter, or the arabinose-inducible araBAD promoter, the bacteria invade cells at densities greater than 10^8 bacteria/ml, after growth in an anaerobic growth chamber or in the presence of 0.02% arabinose, respectively. In the process, we developed a technique to tune the linkage between a sensor and output gene using ribosome binding site libraries and genetic selection. This approach could be used to engineer bacteria to sense the microenvironment of a tumor and respond by invading cancerous cells and releasing a cytotoxic agent.
HIV-1 Tat transactivation is vital for completion of the viral life cycle and has been implicated in determining proviral latency. We present an extensive experimental/computational study of an HIV-1 ...model vector (LTR-GFP-IRES-Tat) and show that stochastic fluctuations in Tat influence the viral latency decision. Low GFP/Tat expression was found to generate bifurcating phenotypes with clonal populations derived from single proviral integrations simultaneously exhibiting very high and near zero GFP expression. Although phenotypic bifurcation (PheB) was correlated with distinct genomic integration patterns, neither these patterns nor other extrinsic cellular factors (cell cycle/size, aneuploidy, chromatin silencing, etc.) explained PheB. Stochastic computational modeling successfully accounted for PheB and correctly predicted the dynamics of a Tat mutant that were subsequently confirmed by experiment. Thus, Tat stochastics appear sufficient to generate PheB (and potentially proviral latency), illustrating the importance of stochastic fluctuations in gene expression in a mammalian system.
Phages are one of the key ecological drivers of microbial community dynamics, function, and evolution. Despite their importance in bacterial ecology and evolutionary processes, phage genes are poorly ...characterized, hampering their usage in a variety of biotechnological applications. Methods to characterize such genes, even those critical to the phage life cycle, are labor intensive and are generally phage specific. Here, we develop a systematic gene essentiality mapping method that is scalable to new phage-host combinations and facilitates the identification of nonessential genes. As a proof of concept, we use an arrayed genome-wide CRISPR interference (CRISPRi) assay to map the gene essentiality landscape in the canonical coliphages λ and P1. Results from a single panel of CRISPRi probes largely recapitulate the essential gene roster determined from decades of genetic analysis for λ and provide new insights into essential and nonessential loci in P1. We present evidence of how CRISPRi polarity can lead to false positive gene essentiality assignments and recommend caution in interpreting CRISPRi data on gene essentiality when applied to less studied phages. Finally, we show that we can engineer phages by inserting DNA barcodes into newly identified nonessential regions, which will empower processes of identification, quantification, and tracking of phages in diverse applications.
Synthetic circuits embedded in host cells compete with cellular processes for limited intracellular resources. Here we show how funnelling of cellular resources, after global transcriptome ...degradation by the sequence-dependent endoribonuclease MazF, to a synthetic circuit can increase production. Target genes are protected from MazF activity by recoding the gene sequence to eliminate recognition sites, while preserving the amino acid sequence. The expression of a protected fluorescent reporter and flux of a high-value metabolite are significantly enhanced using this genome-scale control strategy. Proteomics measurements discover a host factor in need of protection to improve resource redistribution activity. A computational model demonstrates that the MazF mRNA-decay feedback loop enables proportional control of MazF in an optimal operating regime. Transcriptional profiling of MazF-induced cells elucidates the dynamic shifts in transcript abundance and discovers regulatory design elements. Altogether, our results suggest that manipulation of cellular resource allocation is a key control parameter for synthetic circuit design.
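The protection step, recoding a gene to remove recognition sites while keeping the amino acid sequence, can be sketched as a greedy synonymous-codon substitution. Everything below is a sketch under assumptions: the recognition motif is taken to be ACA (written on the DNA coding strand), the function names and the example sequence are hypothetical, and codon usage or other design constraints that a real recoding would consider are ignored.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame coding sequence ('*' marks stop codons)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_sites(dna, motif):
    """Count motif occurrences, including overlapping ones."""
    return sum(dna.startswith(motif, i) for i in range(len(dna) - len(motif) + 1))


def synonyms(codon):
    """Other codons encoding the same amino acid."""
    aa = CODON_TABLE[codon]
    return [c for c, a in CODON_TABLE.items() if a == aa and c != codon]


def recode(dna, motif="ACA"):
    """Greedily swap in synonymous codons to remove motif occurrences.

    A swap is accepted only if it lowers the total number of sites, so no
    new site is left behind. Sites that cannot be removed are skipped."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    start = 0
    while True:
        seq = "".join(codons)
        pos = seq.find(motif, start)
        if pos == -1:
            return seq
        current = count_sites(seq, motif)
        fixed = False
        # Try every codon that overlaps this site.
        for ci in range(pos // 3, (pos + len(motif) - 1) // 3 + 1):
            for alt in synonyms(codons[ci]):
                trial = codons[:ci] + [alt] + codons[ci + 1:]
                if count_sites("".join(trial), motif) < current:
                    codons, fixed = trial, True
                    break
            if fixed:
                break
        if not fixed:
            start = pos + 1  # leave this site and move on


# Hypothetical coding sequence with one in-frame and one out-of-frame site.
original = "ATGACAAACACCTAA"
protected = recode(original)
print(original, "->", protected, translate(protected))
```

The greedy strategy is chosen only for brevity. Because a substitution is accepted only when the site count strictly decreases, the loop always terminates and the encoded protein is unchanged by construction.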
A decade ago, seminal perspectives and papers set a strong vision for the field of systems biology, and a number of these themes have flourished. Here, we describe key technologies and insights that ...have elucidated the evolution, architecture, and function of cellular networks, ultimately leading to the first predictive genome-scale regulatory and metabolic models of organisms. Can systems approaches bridge the gap between correlative analysis and mechanistic insights?
The basidiomycete yeast (also known as ) accumulates high concentrations of lipids and carotenoids from diverse carbon sources. It has great potential as a model for the cellular biology of lipid ...droplets and for sustainable chemical production. We developed a method for high-throughput genetics (RB-TDNAseq), using sequence-barcoded T-DNA insertions. We identified 1,337 putative essential genes with low T-DNA insertion rates. We functionally profiled genes required for fatty acid catabolism and lipid accumulation, validating results with 35 targeted deletion strains. We identified a high-confidence set of 150 genes affecting lipid accumulation, including genes with predicted function in signaling cascades, gene expression, protein modification and vesicular trafficking, autophagy, amino acid synthesis and tRNA modification, and genes of unknown function. These results greatly advance our understanding of lipid metabolism in this oleaginous species and demonstrate a general approach for barcoded mutagenesis that should enable functional genomics in diverse fungi.
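The idea of calling putative essential genes from low insertion rates can be illustrated with a simple density threshold. This is a minimal sketch, assuming a per-kilobase insertion density and cutoffs chosen here purely for illustration; the function names, gene names, counts, and thresholds are hypothetical and are not the criteria used in the study.

```python
def insertion_density(gene_length_bp, n_insertions):
    """Insertions per kilobase of gene."""
    return 1000.0 * n_insertions / gene_length_bp


def putative_essential(genes, max_density=0.5, min_length_bp=300):
    """Flag genes whose insertion density is at or below a cutoff.

    genes maps a gene name to (length_bp, n_insertions). Genes shorter than
    min_length_bp are skipped because few insertions are expected in them
    by chance alone."""
    flagged = []
    for name, (length_bp, n_insertions) in genes.items():
        if length_bp < min_length_bp:
            continue
        if insertion_density(length_bp, n_insertions) <= max_density:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical counts of mapped insertions per gene.
genes = {
    "geneA": (1500, 0),   # no insertions in a long gene
    "geneB": (1200, 9),   # readily disrupted
    "geneC": (200, 0),    # too short to judge
    "geneD": (2000, 1),   # very sparse insertions
}
candidates = putative_essential(genes)
print(candidates)
```

A fixed cutoff is the simplest possible rule; a more careful analysis would compare each gene against the genome-wide insertion rate and account for insertion-site bias.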