Genetic recombination is a driving force in genome evolution. Among viruses it has a dual role. For genomes with higher fitness, it maintains genome integrity in the face of high mutation rates. Conversely, for genomes with lower fitness, it provides immediate access to sequence space that cannot be reached by mutation alone. How recombination impacts the cohesion and dissolution of individual whole genomes within viral sequence space is poorly understood across double-stranded DNA bacteriophages (a.k.a. phages) due to the challenges of obtaining appropriately scaled genomic datasets.
Here we explore the role of recombination in both maintaining and differentiating whole genomes of 142 wild double-stranded DNA marine cyanophages. Phylogenomic analysis across the 51 core genes revealed ten lineages, six of which were well represented. These phylogenomic lineages represent discrete genotypic populations based on comparisons of intra- and inter-lineage shared gene content, genome-wide average nucleotide identity, and detected gaps in the distribution of pairwise differences between genomes. McDonald-Kreitman selection tests identified putative niche-differentiating genes under positive selection that differed across the six well-represented genotypic populations and that may have driven initial divergence. Concurrent with patterns of recombination of discrete populations, recombination analyses of both genic and intergenic regions largely revealed decreased genetic exchange across individual genomes between populations relative to within populations.
These findings suggest that discrete double-stranded DNA marine cyanophage populations occur in nature and are maintained by patterns of recombination akin to those observed in bacteria, archaea and in sexual eukaryotes.
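The McDonald-Kreitman test mentioned above compares nonsynonymous and synonymous changes that are fixed between lineages with those that are polymorphic within a lineage. The following is a minimal sketch of that comparison, not the authors' pipeline: the function names (`fisher_exact_two_sided`, `mcdonald_kreitman`) and the choice of a two-sided Fisher's exact test are assumptions made for illustration, and the counts are expected to come from a separate alignment step that is not shown.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def mcdonald_kreitman(dn, ds, pn, ps):
    """Return (neutrality index, alpha, p-value) for one gene.

    dn, ds: fixed nonsynonymous / synonymous differences between lineages.
    pn, ps: nonsynonymous / synonymous polymorphisms within a lineage.
    The neutrality index and alpha are None when a ratio is undefined.
    """
    p_value = fisher_exact_two_sided(dn, ds, pn, ps)
    if dn == 0 or ds == 0 or ps == 0:
        return None, None, p_value
    ni = (pn / ps) / (dn / ds)  # NI < 1 suggests an excess of fixed nonsynonymous changes
    alpha = 1.0 - ni            # estimated fraction of adaptive nonsynonymous substitutions
    return ni, alpha, p_value
```

A neutrality index below 1 together with a small p-value is the usual signature read as positive selection; in a genome-wide scan the per-gene p-values would additionally need correction for multiple testing.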
While the bulk of the finished microbial genomes sequenced to date are derived from cultured bacterial and archaeal representatives, the vast majority of microorganisms elude current culturing attempts, severely limiting the ability to recover complete or even partial genomes from these environmental species. Single cell genomics is a novel culture-independent approach, which enables access to the genetic material of an individual cell. No single cell genome has to our knowledge been closed and finished to date. Here we report the completed genome from an uncultured single cell of Candidatus Sulcia muelleri DMIN. Digital PCR on single symbiont cells isolated from the bacteriome of the green sharpshooter Draeculacephala minerva allowed us to assess that this bacterium is polyploid with genome copies ranging from approximately 200-900 per cell, making it a most suitable target for single cell finishing efforts. For single cell shotgun sequencing, an individual Sulcia cell was isolated and whole genome amplified by multiple displacement amplification (MDA). Sanger-based finishing methods allowed us to close the genome. To verify the correctness of our single cell genome and exclude MDA-derived artifacts, we independently shotgun sequenced and assembled the Sulcia genome from pooled bacteriomes using a metagenomic approach, yielding a nearly identical genome. Four variations we detected appear to be genuine biological differences between the two samples. Comparison of the single cell genome with bacteriome metagenomic sequence data detected two single nucleotide polymorphisms (SNPs), indicating extremely low genetic diversity within a Sulcia population. This study demonstrates the power of single cell genomics to generate a complete, high quality, non-composite reference genome within an environmental sample, which can be used for population genetic analyses.
The yeast Dekkera/Brettanomyces bruxellensis can cause enormous economic losses in the wine industry due to production of phenolic off-flavor compounds. D. bruxellensis is a distant relative of baker's yeast Saccharomyces cerevisiae. Nevertheless, these two yeasts are often found in the same habitats and share several food-related traits, such as production of high ethanol levels and ability to grow without oxygen. In some food products, like lambic beer, D. bruxellensis can contribute importantly to flavor development. We determined the 13.4 Mb genome sequence of the D. bruxellensis strain Y879 (CBS2499) and deduced the genetic background of several “food-relevant” properties and evolutionary history of this yeast. Surprisingly, we find that this yeast is phylogenetically distant from other food-related yeasts and most closely related to Pichia (Komagataella) pastoris, which is an aerobic poor ethanol producer. We further show that the D. bruxellensis genome contains neither an excess of lineage-specific duplicated genes nor a horizontally transferred URA1 gene, two crucial events that promoted the evolution of the food-relevant traits in the S. cerevisiae lineage. However, D. bruxellensis has several independently duplicated ADH and ADH-like genes, which are likely responsible for metabolism of alcohols, including ethanol, and also a range of aromatic compounds.
► Genome sequence of an important wine spoiling yeast ► Genomics of food related microorganisms ► Genetic background for aromatic compounds in wine ► Comparative genomics reveals evolutionary strategies.
Metagenomic sequence data from defined mock communities are crucial for the assessment of sequencing platform performance and downstream analyses, including assembly, binning and taxonomic assignment. We report a comparison of shotgun metagenome sequencing and assembly metrics of a defined microbial mock community using the Oxford Nanopore Technologies (ONT) MinION, PacBio and Illumina sequencing platforms. Our synthetic microbial community BMock12 consists of 12 bacterial strains with genome sizes spanning 3.2-7.2 Mbp, 40-73% GC content, and 1.5-7.3% repeats. Size selection of both PacBio and ONT sequencing libraries prior to sequencing was essential to yield comparable relative abundances of organisms among all sequencing technologies. While the Illumina-based metagenome assembly yielded good coverage with few misassemblies, both Illumina + ONT and Illumina + PacBio hybrid assemblies greatly improved contiguity but increased misassemblies, most notably in genomes with high sequence similarity to each other. Our resulting datasets allow evaluation and benchmarking of bioinformatics software on Illumina, PacBio and ONT platforms in parallel.
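Assembly contiguity of the kind compared above is commonly summarized with the N50 statistic. The abstract does not say which contiguity metric was used, so the sketch below is an illustration under that assumption; the function name `n50` is hypothetical and the input is simply a list of contig lengths in base pairs.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

Because N50 rewards long contigs regardless of whether they are joined correctly, it is usually read alongside a misassembly count, which is the trade-off the hybrid assemblies above illustrate.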
DOE JGI Metagenome Workflow. Clum, Alicia; Huntemann, Marcel; Bushnell, Brian ...
mSystems, 05/2021, Volume 6, Issue 3
Journal Article. Peer reviewed. Open access.
The DOE Joint Genome Institute (JGI) Metagenome Workflow performs metagenome data processing, including assembly; structural, functional, and taxonomic annotation; and binning of metagenomic data sets that are subsequently included into the Integrated Microbial Genomes and Microbiomes (IMG/M) (I.-M. A. Chen, K. Chu, K. Palaniappan, A. Ratner, et al., Nucleic Acids Res, 49:D751-D763, 2021, https://doi.org/10.1093/nar/gkaa939) comparative analysis system and provided for download via the JGI data portal (https://genome.jgi.doe.gov/portal/). This workflow scales to run on thousands of metagenome samples per year, which can vary by the complexity of microbial communities and sequencing depth. Here, we describe the different tools, databases, and parameters used at different steps of the workflow to help with the interpretation of metagenome data available in IMG and to enable researchers to apply this workflow to their own data. We use 20 publicly available sediment metagenomes to illustrate the computing requirements for the different steps and highlight the typical results of data processing. The workflow modules for read filtering and metagenome assembly are available as a workflow description language (WDL) file (https://code.jgi.doe.gov/BFoster/jgi_meta_wdl). The workflow modules for annotation and binning are provided as a service to the user community at https://img.jgi.doe.gov/submit and require filling out the project and associated metadata descriptions in the Genomes OnLine Database (GOLD) (S. Mukherjee, D. Stamatis, J. Bertsch, G. Ovchinnikova, et al., Nucleic Acids Res, 49:D723-D733, 2021, https://doi.org/10.1093/nar/gkaa983).
The DOE JGI Metagenome Workflow is designed for processing metagenomic data sets starting from Illumina fastq files. It performs data preprocessing, error correction, assembly, structural and functional annotation, and binning. The results of processing are provided in several standard formats, such as fasta and gff, and can be used for subsequent integration into the Integrated Microbial Genomes and Microbiomes (IMG/M) system where they can be compared to a comprehensive set of publicly available metagenomes. As of 30 July 2020, 7,155 JGI metagenomes have been processed by the DOE JGI Metagenome Workflow. Here, we present a metagenome workflow developed at the JGI that generates rich data in standard formats and has been optimized for downstream analyses ranging from assessment of the functional and taxonomic composition of microbial communities to genome-resolved metagenomics and the identification and characterization of novel taxa. This workflow is currently being used to analyze thousands of metagenomic data sets in a consistent and standardized manner.
Generating sequence data of a defined community composed of organisms with complete reference genomes is indispensable for the benchmarking of new genome sequence analysis methods, including assembly and binning tools. Moreover, the validation of new sequencing library protocols and platforms to assess critical components such as sequencing errors and biases relies on such datasets. We here report the next generation metagenomic sequence data of a defined mock community (Mock Bacteria ARchaea Community; MBARC-26), composed of 23 bacterial and 3 archaeal strains with finished genomes. These strains span 10 phyla and 14 classes and a range of GC contents, genome sizes and repeat content, and encompass a diverse abundance profile. Short-read Illumina and long-read PacBio SMRT sequences of this mock community are described. These data represent a valuable resource for the scientific community, enabling extensive benchmarking and comparative evaluation of bioinformatics tools without the need to simulate data. As such, these data can aid in improving our current sequence data analysis toolkit and spur interest in the development of new tools.
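GC content, one of the genome properties the mock community is designed to span, can be computed directly from a reference sequence. The sketch below is illustrative only: the function name `gc_content` is hypothetical, and ignoring ambiguous bases such as N in the denominator is one possible convention, not one stated in the abstract.

```python
def gc_content(sequence):
    """Return the GC content of a nucleotide sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted, so runs of N or
    other ambiguity codes do not dilute the result.
    """
    seq = sequence.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    counted = sum(counts.values())
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (counts["G"] + counts["C"]) / counted
```

Applied per genome, values like these are what allow coverage or error rates from a sequencing run to be checked for GC-dependent bias.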
The North American prairie covered about 3.6 million km² of the continent prior to European contact. Only 1-2% of the original prairie remains, but the soils that developed under these prairies are some of the most productive and fertile in the world, containing over 35% of the soil carbon in the continental United States. Cultivation may alter microbial diversity and composition, influencing the metabolism of carbon, nitrogen, and other elements. Here, we explored the structure and functional potential of the soil microbiome in paired cultivated-corn (at the time of sampling) and never-cultivated native prairie soils across a three-state transect (Wisconsin, Iowa, and Kansas) using metagenomic and 16S rRNA gene sequencing and lipid analysis. At the Wisconsin site, we also sampled adjacent restored prairie and switchgrass plots. We found that agricultural practices drove differences in community composition and diversity across the transect. Microbial biomass in prairie samples was twice that of cultivated soils, but alpha diversity was higher with cultivation. Metagenome analyses revealed denitrification and starch degradation genes were abundant across all soils, as were core genes involved in response to osmotic stress, resource transport, and environmental sensing. Together, these data indicate that cultivation shifted the microbiome in consistent ways across different regions of the prairie, but also suggest that many functions are resilient to changes caused by land management practices - perhaps reflecting adaptations to conditions common to tallgrass prairie soils in the region (e.g., soil type, parent material, development under grasses, temperature and rainfall patterns, and annual freeze-thaw cycles). These findings are important for understanding the long-term consequences of land management practices to prairie soil microbial communities and their genetic potential to carry out key functions.
The complete genomic sequence of Pseudomonas syringae pv. syringae B728a (Pss B728a) has been determined and is compared with that of P. syringae pv. tomato DC3000 (Pst DC3000). The two pathovars of this economically important species of plant pathogenic bacteria differ in host range and other interactions with plants, with Pss having a more pronounced epiphytic stage of growth and higher abiotic stress tolerance and Pst DC3000 having a more pronounced apoplastic growth habitat. The Pss B728a genome (6.1 Mb) contains a circular chromosome and no plasmid, whereas the Pst DC3000 genome is 6.5 Mbp in size, composed of a circular chromosome and two plasmids. Although a high degree of similarity exists between the two sequenced Pseudomonads, 976 protein-encoding genes are unique to Pss B728a when compared with Pst DC3000, including large genomic islands likely to contribute to virulence and host specificity. Over 375 repetitive extragenic palindromic sequences unique to Pss B728a when compared with Pst DC3000 are widely distributed throughout the chromosome except in 14 genomic islands, which generally had lower GC content than the genome as a whole. Content of the genomic islands varies, with one containing a prophage and another the plasmid pKLC102 of Pseudomonas aeruginosa PAO1. Among the 976 genes of Pss B728a with no counterpart in Pst DC3000 are those encoding syringopeptin, syringomycin, indole acetic acid biosynthesis, arginine degradation, and production of ice nuclei. The genomic comparison suggests that several genes unique to Pss B728a, such as those for ectoine synthase, DNA repair, and antibiotic production, may contribute to the epiphytic fitness and stress tolerance of this organism.