Millions of new viral sequences have been identified from metagenomes, but the quality and completeness of these sequences vary considerably. Here we present CheckV, an automated pipeline for identifying closed viral genomes, estimating the completeness of genome fragments and removing flanking host regions from integrated proviruses. CheckV estimates completeness by comparing sequences with a large database of complete viral genomes, including 76,262 identified from a systematic search of publicly available metagenomes, metatranscriptomes and metaviromes. After validation on mock datasets and comparison to existing methods, we applied CheckV to large and diverse collections of metagenome-assembled viral sequences, including IMG/VR and the Global Ocean Virome. This revealed 44,652 high-quality viral genomes (that is, >90% complete), although the vast majority of sequences were small fragments, which highlights the challenge of assembling viral genomes from short-read metagenomes. Additionally, we found that removal of host contamination substantially improved the accurate identification of auxiliary metabolic genes and interpretation of viral-encoded functions.
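To make the idea of database-based completeness concrete, here is a minimal sketch. It assumes that completeness is estimated as the contig length relative to the length of a matched complete reference genome, capped at 100%. The names `ReferenceHit`, `estimate_completeness` and `quality_tier` are hypothetical and are not CheckV's API. Only the >90% "high-quality" cut-off comes from the text above; the lower tiers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ReferenceHit:
    """Hypothetical best match from a database of complete viral genomes."""
    reference_length: int  # length (bp) of the matched complete genome


def estimate_completeness(contig_length: int, hit: ReferenceHit) -> float:
    """Assumed estimator: percent of the reference length covered by the contig, capped at 100."""
    if hit.reference_length <= 0:
        raise ValueError("reference length must be positive")
    return min(100.0, 100.0 * contig_length / hit.reference_length)


def quality_tier(completeness: float) -> str:
    """Map a completeness estimate to a tier; only the >90% cut-off is from the source."""
    if completeness > 90.0:
        return "high-quality"
    if completeness >= 50.0:  # illustrative threshold (assumption)
        return "medium-quality"
    return "low-quality"


if __name__ == "__main__":
    c = estimate_completeness(38_000, ReferenceHit(reference_length=40_000))
    print(c, quality_tier(c))
```

A 38 kb fragment matched to a 40 kb complete genome would be reported as 95% complete and therefore high-quality. The cap reflects the assumption that a fragment longer than its reference is still at most one full genome.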
Microbial communities play a major role in disease, biogeochemical cycling, agriculture, and bioremediation. However, identifying the ecological processes that govern microbial community assembly and disentangling the relative impacts of those processes has proven challenging. Here, we propose that this discord is due to microbial systems being studied at different spatial, temporal, and phylogenetic scales. We argue that different processes dominate at different scales, and that through a more explicit consideration of spatial, temporal, and phylogenetic grains and extents (the two components of scale) a more accurate, clear, and useful understanding of microbial community assembly can be developed. We demonstrate the value of applying ecological concepts of scale to microbiology, specifically examining their application to nestedness, legacy effects, and taxa–area relationships of microbial systems. These proposed considerations of scale will help resolve long-standing debates in microbial ecology regarding the processes determining the assembly of microbial communities, and provide organizing principles around which hypotheses and theories can be developed.
Understanding the processes that shape microbial communities holds potential to provide important insights into ecology and evolutionary biology, and can enable forecasting and management of microbial ecosystem services. At least four fundamental processes (selection, dispersal limitation, neutral processes, mutation) may shape microbial communities, but determining their importance has proven challenging. Ecology has a long history of recognizing that numerous patterns and processes are dependent on spatial, temporal, and phylogenetic scales. Each scale comprises two fundamental components: grain and extent. Recognizing that different processes may dominate at different scales in microbial systems could be instrumental in resolving long-standing uncertainty about which processes are important in shaping microbial communities.
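Taxa–area relationships are one place where grain and extent enter directly into the analysis. The sketch below assumes the classic power-law form S = c·A^z and fits it by ordinary least squares on log10-transformed data. The function name `fit_taxa_area` is hypothetical, and the source does not specify which model or fitting method it uses. Changing which areas are included (the extent) or the size of the smallest sampling unit (the grain) changes the data passed in, and so can change the fitted exponent z.

```python
import math


def fit_taxa_area(areas, richness):
    """Fit S = c * A**z by least squares on log10(S) versus log10(A).

    Returns (c, z). Assumes all areas and richness values are positive.
    """
    if len(areas) != len(richness) or len(areas) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log10(a) for a in areas]
    ys = [math.log10(s) for s in richness]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("areas must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    z = sxy / sxx                      # slope in log-log space
    c = 10 ** (mean_y - z * mean_x)    # back-transformed intercept
    return c, z


if __name__ == "__main__":
    areas = [1, 10, 100, 1000]                  # synthetic, illustrative areas
    richness = [20 * a ** 0.25 for a in areas]  # generated from c=20, z=0.25
    print(fit_taxa_area(areas, richness))
```

Refitting on subsets of areas (for example, only the smaller ones) is a simple way to see how the chosen extent affects the estimated slope.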
Productivity of ruminant livestock depends on the rumen microbiota, which ferment indigestible plant polysaccharides into nutrients used for growth. Understanding the functions carried out by the rumen microbiota is important for reducing greenhouse gas production by ruminants and for developing biofuels from lignocellulose. We present 410 cultured bacteria and archaea, together with their reference genomes, representing every cultivated rumen-associated archaeal and bacterial family. We evaluate polysaccharide degradation, short-chain fatty acid production and methanogenesis pathways, and assign specific taxa to functions. A total of 336 organisms were present in available rumen metagenomic data sets, and 134 were present in human gut microbiome data sets. Comparison with the human microbiome revealed rumen-specific enrichment for genes encoding de novo synthesis of vitamin B12, ongoing evolution by gene loss and potential vertical inheritance of the rumen microbiome based on underrepresentation of markers of environmental stress. We estimate that our Hungate genome resource represents ∼75% of the genus-level bacterial and archaeal taxa present in the rumen.
Many Archaea produce membrane‐spanning lipids that enable life in extreme environments. These isoprenoid glycerol dibiphytanyl glycerol tetraethers (GDGTs) may contain up to eight cyclopentyl rings and one cyclohexyl ring, where higher degrees of cyclization are associated with more acidic, hotter or energy‐limited conditions. Recently, the genes encoding GDGT ring synthases, grsAB, were identified in two Sulfolobaceae; however, the distribution and abundance of grs homologs across environments inhabited by these and related organisms remain a mystery. To address this, we examined the distribution of grs homologs in relation to environmental temperature and pH, from thermal springs across Earth, where sequences derive from metagenomes, metatranscriptomes, single‐cell and cultivar genomes. The abundance of grs homologs shows a strong negative correlation to pH, but a weak positive correlation to temperature. Archaeal genomes and metagenome‐assembled genomes (MAGs) that carry two or more grs copies are more abundant in low pH springs. We also find grs in 12 archaeal classes, with the most representatives in Thermoproteia, followed by MAGs of the uncultured Korarchaeia, Bathyarchaeia and Hadarchaeia, while several Nitrososphaeria encode >3 copies. Our findings highlight the key role of grs‐catalysed lipid cyclization in archaeal diversification across hot and acidic environments.
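The abstract reports correlations between grs homolog abundance and pH or temperature but does not name the coefficient used. As an illustration only, the sketch below computes a Spearman rank correlation (Pearson correlation of average ranks, so tied values are handled). The function names are hypothetical, and the choice of Spearman over other coefficients is an assumption.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    ph = [2.0, 3.5, 5.0, 7.0, 8.5]         # hypothetical spring pH values
    grs_abundance = [40, 30, 12, 5, 1]     # hypothetical normalized counts
    print(spearman(ph, grs_abundance))
```

With these made-up values, abundance falls monotonically as pH rises, so the coefficient is -1.0. Real survey data would be noisier. A rank-based coefficient is a natural assumption for abundance data, which are often skewed.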
Abstract
The Integrated Microbial Genomes & Microbiomes system (IMG/M: https://img.jgi.doe.gov/m/) contains annotated isolate genome and metagenome datasets sequenced at the DOE’s Joint Genome Institute (JGI), submitted by external users, or imported from public sources such as NCBI. IMG v 6.0 includes advanced search functions and a new tool for statistical analysis of mixed sets of genomes and metagenome bins. The new IMG web user interface also has a new Help page with additional documentation and webinar tutorials to help users better understand how to use various IMG functions and tools for their research. New datasets have been processed with the prokaryotic annotation pipeline v.5, which includes extended protein family assignments.
Viruses are the most abundant biological entities on Earth, but challenges in detecting, isolating, and classifying unknown viruses have prevented exhaustive surveys of the global virome. Here we analysed over 5 Tb of metagenomic sequence data from 3,042 geographically diverse samples to assess the global distribution, phylogenetic diversity, and host specificity of viruses. We discovered over 125,000 partial DNA viral genomes, including the largest phage yet identified, and increased the number of known viral genes by 16-fold. Half of the predicted partial viral genomes were clustered into genetically distinct groups, most of which included genes unrelated to those in known viruses. Using CRISPR spacers and transfer RNA matches to link viral groups to microbial host(s), we doubled the number of microbial phyla known to be infected by viruses, and identified viruses that can infect organisms from different phyla. Analysis of viral distribution across diverse ecosystems revealed strong habitat-type specificity for the vast majority of viruses, but also identified some cosmopolitan groups. Our results highlight an extensive global viral diversity and provide detailed insight into viral habitat distribution and host–virus interactions.
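The host-linking step relies on CRISPR spacers matching viral sequences: a spacer in a microbial genome that matches a viral contig suggests that the microbe has encountered that virus. The sketch below illustrates the idea with ungapped matching on both strands, allowing a small number of mismatches. The mismatch threshold, the function names, and the brute-force scan are assumptions chosen for clarity. Large-scale analyses would typically use an alignment tool rather than this quadratic search.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def within_mismatches(a: str, b: str, max_mismatches: int) -> bool:
    """True if equal-length strings a and b differ at <= max_mismatches positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def spacer_hits(spacer: str, contig: str, max_mismatches: int = 1):
    """Return (position, strand) pairs where the spacer matches the viral contig.

    Matching is ungapped; max_mismatches is an illustrative threshold (assumption).
    """
    contig = contig.upper()
    k = len(spacer)
    hits = []
    for strand, query in (("+", spacer.upper()), ("-", reverse_complement(spacer))):
        for pos in range(len(contig) - k + 1):
            if within_mismatches(query, contig[pos:pos + k], max_mismatches):
                hits.append((pos, strand))
    return hits


if __name__ == "__main__":
    # Toy sequences, not real spacers or viral contigs.
    print(spacer_hits("ACGGTTCA", "GGGACGGTTCATTT"))
```

For the toy input, the spacer matches exactly at position 3 on the forward strand. A single substitution in the contig would still be reported with the default threshold but not with `max_mismatches=0`.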
Our current knowledge about nucleocytoplasmic large DNA viruses (NCLDVs) is largely derived from viral isolates that are co-cultivated with protists and algae. Here we reconstructed 2,074 NCLDV genomes from sampling sites across the globe by building on the rapidly increasing amount of publicly available metagenome data. This led to an 11-fold increase in phylogenetic diversity and a parallel 10-fold expansion in functional diversity. Analysis of 58,023 major capsid proteins from large and giant viruses using metagenomic data revealed the global distribution patterns and cosmopolitan nature of these viruses. The discovered viral genomes encoded a wide range of proteins with putative roles in photosynthesis and diverse substrate transport processes, indicating that host reprogramming is probably a common strategy in the NCLDVs. Furthermore, inferences of horizontal gene transfer connected viral lineages to diverse eukaryotic hosts. We anticipate that the global diversity of NCLDVs that we describe here will establish giant viruses, which are associated with most major eukaryotic lineages, as important players in ecosystems across Earth's biomes.
Viruses are integral components of all ecosystems and microbiomes on Earth. Through pervasive infections of their cellular hosts, viruses can reshape microbial community structure and drive global nutrient cycling. Over the past decade, viral sequences identified from genomes and metagenomes have provided an unprecedented view of viral genome diversity in nature. Since 2016, the IMG/VR database has provided access to the largest collection of viral sequences obtained from (meta)genomes. Here, we present the third version of IMG/VR, composed of 18 373 cultivated and 2 314 329 uncultivated viral genomes (UViGs), nearly tripling the total number of sequences compared to the previous version. These clustered into 935 362 viral Operational Taxonomic Units (vOTUs), including 188 930 with two or more members. UViGs in IMG/VR are now reported as single viral contigs, integrated proviruses or genome bins, and are annotated with a new standardized pipeline including genome quality estimation using CheckV, taxonomic classification reflecting the latest ICTV update, and expanded host taxonomy prediction. The new IMG/VR interface enables users to efficiently browse, search, and select UViGs based on genome features and/or sequence similarity. IMG/VR v3 is available at https://img.jgi.doe.gov/vr, and the underlying data are available to download at https://genome.jgi.doe.gov/portal/IMG_VR.
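The abstract states that UViGs were grouped into vOTUs but does not give the clustering rule. The sketch below shows one common way to form such clusters: greedy centroid clustering in which genomes are processed longest first and joined to the first centroid they match. It assumes precomputed pairwise average nucleotide identity (ANI) and aligned fraction (AF) values, and uses the widely cited 95% ANI / 85% AF thresholds. These thresholds, the input layout, and `cluster_votus` are assumptions for illustration, not IMG/VR's documented procedure.

```python
def cluster_votus(lengths, pairs, min_ani=95.0, min_af=85.0):
    """Greedy centroid clustering of viral genomes into vOTUs.

    lengths: dict genome_id -> length in bp.
    pairs: dict frozenset({id_a, id_b}) -> (ani_percent, af_percent), where AF is
           assumed to be the aligned fraction of the shorter genome.
    Returns dict centroid_id -> list of member ids (centroid first).
    """
    centroids = []
    clusters = {}
    # Longest genomes first so that each centroid is the most complete representative.
    for genome in sorted(lengths, key=lambda g: (-lengths[g], g)):
        for centroid in centroids:
            ani, af = pairs.get(frozenset((genome, centroid)), (0.0, 0.0))
            if ani >= min_ani and af >= min_af:
                clusters[centroid].append(genome)
                break
        else:
            # No existing centroid is close enough: start a new vOTU.
            centroids.append(genome)
            clusters[genome] = [genome]
    return clusters


if __name__ == "__main__":
    lengths = {"a": 50_000, "b": 48_000, "c": 30_000}  # toy genome lengths
    pairs = {
        frozenset(("a", "b")): (97.0, 90.0),
        frozenset(("a", "c")): (80.0, 40.0),
    }
    votus = cluster_votus(lengths, pairs)
    print(votus)
    print(sum(1 for members in votus.values() if len(members) >= 2))
```

In the toy example, "b" joins the vOTU centred on "a", while "c" forms a singleton vOTU. Counting clusters with two or more members mirrors the kind of summary reported above. Comparing only against centroids keeps the procedure simple, but the result can depend on processing order.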