In 2004, the SEED (http://pubseed.theseed.org/) was created to provide consistent and accurate genome annotations across thousands of genomes and as a platform for discovering and developing de novo annotations. The SEED is a constantly updated integration of genomic data with a genome database, web front end, API and server scripts. It is used by many scientists for predicting gene functions and discovering new pathways. In addition to being a powerful database for bioinformatics research, the SEED also houses subsystems (collections of functionally related protein families) and their derived FIGfams (protein families), which represent the core of the RAST annotation engine (http://rast.nmpdr.org/). When a new genome is submitted to RAST, genes are called and their annotations are made by comparison to the FIGfam collection. If the genome is made public, it is then housed within the SEED and its proteins populate the FIGfam collection. This annotation cycle has proven to be a robust and scalable solution to the problem of annotating the exponentially increasing number of genomes. To date, >12 000 users worldwide have annotated >60 000 distinct genomes using RAST. Here we describe the interconnectedness of the SEED database and RAST, the RAST annotation pipeline and updates to both resources.
Abstract
Motivation
Currently there are no tools specifically designed for annotating genes in phages. Several tools are available that have been adapted to run on phage genomes, but due to their underlying design, they are unable to capture the full complexity of phage genomes. Phages have adapted their genomes to be extremely compact, having adjacent genes that overlap and genes completely inside of other longer genes. This non-delineated genome structure makes gene prediction difficult for the currently available gene annotators. Here we present PHANOTATE, a novel method for gene calling specifically designed for phage genomes. Although the compact nature of genes in phages is a problem for current gene annotators, we exploit this property by treating a phage genome as a network of paths, where open reading frames are favorable, and overlaps and gaps are less favorable but still possible. We represent this network of connections as a weighted graph, and use dynamic programming to find the optimal path.
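The weighted-graph idea described above can be illustrated with a minimal sketch. It is not PHANOTATE's actual implementation: the `Orf` class, the `connection_cost` and `best_path` functions, the scores, and the penalty values are all hypothetical and chosen only for illustration. The sketch assumes a single strand, gives each open reading frame a negative (favorable) score, charges a positive cost for every gap or overlap between consecutive genes, and uses dynamic programming over the start-sorted ORFs to find the minimum-cost chain. Genes nested entirely inside another gene are skipped here for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    start: int    # 0-based, inclusive
    end: int      # exclusive
    score: float  # hypothetical coding score; more negative = more favorable


def connection_cost(a, b, gap_penalty=0.1, overlap_penalty=0.5):
    """Cost of placing ORF b directly after ORF a (assumed linear penalties)."""
    distance = b.start - a.end
    if distance >= 0:
        return gap_penalty * distance      # gap between the two genes
    return overlap_penalty * (-distance)   # overlap between the two genes


def best_path(orfs, gap_penalty=0.1, overlap_penalty=0.5):
    """Return (total_cost, chain) for the minimum-cost chain of ORFs."""
    if not orfs:
        return 0.0, []
    ordered = sorted(orfs, key=lambda o: (o.start, o.end))
    n = len(ordered)
    best = [o.score for o in ordered]  # best cost of a chain ending at ORF i
    prev = [None] * n                  # back-pointers for path recovery
    for j in range(n):
        for i in range(j):
            a, b = ordered[i], ordered[j]
            if b.end <= a.end:
                continue  # b is nested in (or identical to) a; not modeled here
            cand = best[i] + connection_cost(a, b, gap_penalty, overlap_penalty) + b.score
            if cand < best[j]:
                best[j] = cand
                prev[j] = i
    last = min(range(n), key=best.__getitem__)
    total = best[last]
    chain = []
    while last is not None:
        chain.append(ordered[last])
        last = prev[last]
    chain.reverse()
    return total, chain


# Toy input: two long ORFs that overlap by 10 bases, one short nested ORF,
# and a third ORF after a 50-base gap.
example_orfs = [
    Orf(0, 300, -30.0),
    Orf(290, 600, -25.0),
    Orf(100, 250, -2.0),
    Orf(650, 900, -20.0),
]
total_cost, chain = best_path(example_orfs)
print(total_cost, [(o.start, o.end) for o in chain])
```

In this toy input the small overlap and the short gap cost less than the coding scores gained, so the chain keeps all three long ORFs and drops the nested one. Because edges only run from earlier to later ORFs, the graph is acyclic and a single left-to-right pass suffices.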
Results
We compare PHANOTATE to other gene callers by annotating a set of 2133 complete phage genomes from GenBank, using PHANOTATE and the three most popular gene callers. We found that the four programs agree on 82% of the total predicted genes, with PHANOTATE predicting more genes than the other three. We searched for these extra genes in both GenBank’s non-redundant protein database and all of the metagenomes in the sequence read archive, and found that they are present at levels that suggest that these are functional protein-coding genes.
Availability and implementation
https://github.com/deprekate/PHANOTATE
Supplementary information
Supplementary data are available at Bioinformatics online.
Interleukin-22 (IL-22) is highly induced in response to infections with a variety of pathogens, and its main functions are considered to be tissue repair and host defense at mucosal surfaces. Here we showed that IL-22 has a unique role during infection in that its expression suppressed the intestinal microbiota and enhanced the colonization of a pathogen. IL-22 induced the expression of antimicrobial proteins, including lipocalin-2 and calprotectin, which sequester essential metal ions from microbes. Because Salmonella enterica ser. Typhimurium can overcome metal ion starvation mediated by lipocalin-2 and calprotectin via alternative pathways, IL-22 boosted its colonization of the inflamed intestine by suppressing commensal Enterobacteriaceae, which are susceptible to the antimicrobial proteins. Thus, IL-22 tipped the balance between pathogenic and commensal bacteria in favor of a pathogen. Taken together, IL-22 induction can be exploited by pathogens to suppress the growth of their closest competitors, thereby enhancing pathogen colonization of mucosal surfaces.
• IL-22 does not prevent Salmonella dissemination
• IL-22 enhances the growth of Salmonella in the inflamed gut
• IL-22 controls the growth of commensal E. coli, Salmonella’s closest competitor
• IL-22-induced antimicrobial proteins enhance Salmonella competition with E. coli
Metagenomics has changed the face of virus discovery by enabling the accurate identification of viral genome sequences without requiring isolation of the viruses. As a result, metagenomic virus discovery leaves the first and most fundamental question about any novel virus unanswered: What host does the virus infect? The diversity of the global virosphere and the volumes of data obtained in metagenomic sequencing projects demand computational tools for virus–host prediction. We focus on bacteriophages (phages, viruses that infect bacteria), the most abundant and diverse group of viruses found in environmental metagenomes. By analyzing 820 phages with annotated hosts, we review and assess the predictive power of in silico phage–host signals. Sequence homology approaches are the most effective at identifying known phage–host pairs. Compositional and abundance-based methods contain significant signal for phage–host classification, providing opportunities for analyzing the unknowns in viral metagenomes. Together, these computational approaches further our knowledge of the interactions between phages and their hosts. Importantly, we find that all reviewed signals significantly link phages to their hosts, illustrating how current knowledge and insights about the interaction mechanisms and ecology of coevolving phages and bacteria can be exploited to predict phage–host relationships, with potential relevance for medical and industrial applications.
New viruses infecting bacteria are increasingly being discovered in many environments through sequence-based explorations. To understand their role in microbial ecosystems, computational tools are indispensable to prioritize and guide experimental efforts. This review assesses and discusses a range of bioinformatic approaches to predict bacteriophage–host relationships when all that is known is their genome sequence.
The Enterobacteriaceae are a family of Gram-negative bacteria that include commensal organisms as well as primary and opportunistic pathogens that are among the leading causes of morbidity and mortality worldwide. Although Enterobacteriaceae often comprise less than 1% of a healthy intestine's microbiota, some of these organisms can bloom in the inflamed gut; expansion of enterobacteria is a hallmark of microbial imbalance known as dysbiosis. Microcins are small secreted proteins that possess antimicrobial activity in vitro, but whose role in vivo has been unclear. Here we demonstrate that microcins enable the probiotic bacterium Escherichia coli Nissle 1917 (EcN) to limit the expansion of competing Enterobacteriaceae (including pathogens and pathobionts) during intestinal inflammation. Microcin-producing EcN limits the growth of competitors in the inflamed intestine, including commensal E. coli, adherent-invasive E. coli and the related pathogen Salmonella enterica. Moreover, only therapeutic administration of the wild-type, microcin-producing EcN to mice previously infected with S. enterica substantially reduced intestinal colonization by the pathogen. Our work provides the first evidence that microcins mediate inter- and intraspecies competition among the Enterobacteriaceae in the inflamed gut. Moreover, we show that microcins can act as narrow-spectrum therapeutics to inhibit enteric pathogens and reduce enterobacterial blooms.
The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.
Anaphylaxis to vaccines is historically a rare event. The coronavirus disease 2019 pandemic drove the need for rapid vaccine production applying a novel antigen delivery system: messenger RNA vaccines packaged in lipid nanoparticles. Unexpectedly, public vaccine administration led to a small number of severe allergic reactions, with resultant substantial public concern, especially within atopic individuals. We reviewed the constituents of the messenger RNA lipid nanoparticle vaccine and considered several contributors to these reactions: (1) contact system activation by nucleic acid, (2) complement recognition of the vaccine-activating allergic effector cells, (3) preexisting antibody recognition of polyethylene glycol, a lipid nanoparticle surface hydrophilic polymer, and (4) direct mast cell activation, coupled with potential genetic or environmental predispositions to hypersensitivity. Unfortunately, measurement of anti–polyethylene glycol antibodies in vitro is not clinically available, and the predictive value of skin testing to polyethylene glycol components as a coronavirus disease 2019 messenger RNA vaccine-specific anaphylaxis marker is unknown. Even less is known regarding the applicability of vaccine use for testing (in vitro/vivo) to ascertain pathogenesis or predict reactivity risk. Expedient and thorough research-based evaluation of patients who have suffered anaphylactic vaccine reactions and prospective clinical trials in putative at-risk individuals are needed to address these concerns during a public health crisis.
The remarkable advance in sequencing technology and the rising interest in medical and environmental microbiology, biotechnology, and synthetic biology resulted in a deluge of published microbial genomes. Yet, genome annotation, comparison, and modeling remain a major bottleneck to the translation of sequence information into biological knowledge, hence computational analysis tools are continuously being developed for rapid genome annotation and interpretation. Among the earliest, most comprehensive resources for prokaryotic genome analysis, the SEED project, initiated in 2003 as an integration of genomic data and analysis tools, now contains >5,000 complete genomes, a constantly updated set of curated annotations embodied in a large and growing collection of encoded subsystems, a derived set of protein families, and hundreds of genome-scale metabolic models. Until recently, however, maintaining current copies of the SEED code and data at remote locations has been a pressing issue. To allow high-performance remote access to the SEED database, we developed the SEED Servers (http://www.theseed.org/servers): four network-based servers intended to expose the data in the underlying relational database, support basic annotation services, offer programmatic access to the capabilities of the RAST annotation server, and provide access to a growing collection of metabolic models that support flux balance analysis. The SEED servers offer open access to regularly updated data, the ability to annotate prokaryotic genomes, the ability to create metabolic reconstructions and detailed models of metabolism, and access to hundreds of existing metabolic models. This work offers and supports a framework upon which other groups can build independent research efforts.
Large integrations of genomic data represent one of the major intellectual resources driving research in biology, and programmatic access to the SEED data will provide significant utility to a broad collection of potential users.