Millions of new viral sequences have been identified from metagenomes, but the quality and completeness of these sequences vary considerably. Here we present CheckV, an automated pipeline for identifying closed viral genomes, estimating the completeness of genome fragments and removing flanking host regions from integrated proviruses. CheckV estimates completeness by comparing sequences with a large database of complete viral genomes, including 76,262 identified from a systematic search of publicly available metagenomes, metatranscriptomes and metaviromes. After validation on mock datasets and comparison to existing methods, we applied CheckV to large and diverse collections of metagenome-assembled viral sequences, including IMG/VR and the Global Ocean Virome. This revealed 44,652 high-quality viral genomes (that is, >90% complete), although the vast majority of sequences were small fragments, which highlights the challenge of assembling viral genomes from short-read metagenomes. Additionally, we found that removal of host contamination substantially improved the accurate identification of auxiliary metabolic genes and interpretation of viral-encoded functions.
The Integrated Microbial Genomes & Microbiomes system (IMG/M: https://img.jgi.doe.gov/m/) contains annotated isolate genome and metagenome datasets sequenced at the DOE’s Joint Genome Institute (JGI), submitted by external users, or imported from public sources such as NCBI. IMG v 6.0 includes advanced search functions and a new tool for statistical analysis of mixed sets of genomes and metagenome bins. The new IMG web user interface also has a new Help page with additional documentation and webinar tutorials to help users better understand how to use various IMG functions and tools for their research. New datasets have been processed with the prokaryotic annotation pipeline v.5, which includes extended protein family assignments.
The genome sequences of many species of the human gut microbiome remain unknown, largely owing to challenges in cultivating microorganisms under laboratory conditions. Here we address this problem by reconstructing 60,664 draft prokaryotic genomes from 3,810 faecal metagenomes, from geographically and phenotypically diverse humans. These genomes provide reference points for 2,058 newly identified species-level operational taxonomic units (OTUs), which represents a 50% increase over the previously known phylogenetic diversity of sequenced gut bacteria. On average, the newly identified OTUs comprise 33% of richness and 28% of species abundance per individual, and are enriched in humans from rural populations. A meta-analysis of clinical gut-microbiome studies pinpointed numerous disease associations for the newly identified OTUs, which have the potential to improve predictive models. Finally, our analysis revealed that uncultured gut species have undergone genome reduction that has resulted in the loss of certain biosynthetic pathways, which may offer clues for improving cultivation strategies in the future.
Comprehensive, high-quality reference genomes are required for functional characterization and taxonomic assignment of the human gut microbiota. We present the Unified Human Gastrointestinal Genome (UHGG) collection, comprising 204,938 nonredundant genomes from 4,644 gut prokaryotes. These genomes encode >170 million protein sequences, which we collated in the Unified Human Gastrointestinal Protein (UHGP) catalog. The UHGP more than doubles the number of gut proteins in comparison to those present in the Integrated Gene Catalog. More than 70% of the UHGG species lack cultured representatives, and 40% of the UHGP lack functional annotations. Intraspecies genomic variation analyses revealed a large reservoir of accessory genes and single-nucleotide variants, many of which are specific to individual human populations. The UHGG and UHGP collections will enable studies linking genotypes to phenotypes in the human gut microbiome.
Termites effectively feed on many types of lignocellulose assisted by their gut microbial symbionts. To better understand the microbial decomposition of biomass with varied chemical profiles, it is important to determine whether termites harbor different microbial symbionts with specialized functionalities geared toward different feeding regimens. In this study, we compared the microbiota in the hindgut paunch of Amitermes wheeleri collected from cow dung and Nasutitermes corniger feeding on sound wood by 16S rRNA pyrotag, comparative metagenomic and metatranscriptomic analyses. We found that Firmicutes and Spirochaetes were the most abundant phyla in A. wheeleri, in contrast to N. corniger where Spirochaetes and Fibrobacteres dominated. Despite this community divergence, a convergence was observed for functions essential to termite biology including hydrolytic enzymes, homoacetogenesis and cell motility and chemotaxis. Overrepresented functions in A. wheeleri relative to N. corniger microbiota included hemicellulose breakdown and fixed-nitrogen utilization. By contrast, glycoside hydrolases attacking celluloses and nitrogen fixation genes were overrepresented in N. corniger microbiota. These observations are consistent with dietary differences in carbohydrate composition and nutrient contents, but may also reflect the phylogenetic difference between the hosts.
CRISPR-Cas systems provide microbes with adaptive immunity to infectious nucleic acids and are widely employed as genome editing tools. These tools use RNA-guided Cas proteins whose large size (950 to 1400 amino acids) has been considered essential to their specific DNA- or RNA-targeting activities. Here we present a set of CRISPR-Cas systems from uncultivated archaea that contain Cas14, a family of exceptionally compact RNA-guided nucleases (400 to 700 amino acids). Despite their small size, Cas14 proteins are capable of targeted single-stranded DNA (ssDNA) cleavage without restrictive sequence requirements. Moreover, target recognition by Cas14 triggers nonspecific cutting of ssDNA molecules, an activity that enables high-fidelity single-nucleotide polymorphism genotyping (Cas14-DETECTR). Metagenomic data show that multiple CRISPR-Cas14 systems evolved independently and suggest a potential evolutionary origin of single-effector CRISPR-based adaptive immunity.
Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families and that metagenome sequence data more than triple the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact-based structure matching, and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the Protein Data Bank. This approach provides the representative models for large protein families originally envisioned as the goal of the Protein Structure Initiative at a fraction of the cost.
The application of phylogenetic taxonomic procedures led to improvements in the classification of bacteria assigned to the phylum Actinobacteria, but even so there remains a need to further clarify relationships within a taxon that encompasses organisms of agricultural, biotechnological, clinical, and ecological importance. Classification of the morphologically diverse bacteria belonging to this large phylum based on a limited number of features has proved to be difficult, not least when taxonomic decisions rested heavily on interpretation of poorly resolved 16S rRNA gene trees. Here, draft genome sequences of a large collection of actinobacterial type strains were used to infer phylogenetic trees from genome-scale data using principles drawn from phylogenetic systematics. The majority of taxa were found to be monophyletic, but several orders, families, and genera, as well as many species and a few subspecies, were shown to be in need of revision, leading to proposals for the recognition of 2 orders, 10 families, and 17 genera, as well as the transfer of over 100 species to other genera. In addition, emended descriptions are given for many species, mainly involving the addition of data on genome size and DNA G+C content; the former can be considered a valuable taxonomic marker in actinobacterial systematics. Many of the incongruities detected when the results of the present study were compared with existing classifications had been recognized from 16S rRNA gene trees, though whole-genome phylogenies proved to be much better resolved. The few significant incongruities found between 16S/23S rRNA and whole-genome trees underline the pitfalls inherent in phylogenies based upon single gene sequences. Similarly good congruence was found between the discontinuous distribution of phenotypic properties and taxa delineated in the phylogenetic trees, though diverse non-monophyletic taxa appeared to be based on the use of plesiomorphic character states as diagnostic features.