We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
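The seed-selection idea described above (use the longest reads as seeds) and the stated accuracy figure can be illustrated with a minimal sketch. This is not the HGAP implementation: the function names, the coverage-based cutoff rule, and the toy read lengths are all assumptions made for illustration. The only fixed relation used is the standard Phred definition, Q = -10 log10(error rate), under which 99.999% accuracy corresponds to roughly Q50.

```python
import math


def select_seed_cutoff(read_lengths, genome_size, target_coverage):
    """Return the shortest read length that still qualifies as a seed.

    Hypothetical rule: take reads from longest to shortest until their
    summed length reaches genome_size * target_coverage.
    Returns None if the reads cannot reach that coverage.
    """
    bases_needed = genome_size * target_coverage
    running_total = 0
    for length in sorted(read_lengths, reverse=True):
        running_total += length
        if running_total >= bases_needed:
            return length
    return None


def accuracy_to_phred(accuracy):
    """Convert a per-base accuracy (0 < accuracy < 1) to a Phred-scaled QV."""
    return -10 * math.log10(1 - accuracy)


# Toy data (made up): six read lengths and a 10 kb "genome".
lengths = [12000, 9000, 8000, 5000, 3000, 1000]
cutoff = select_seed_cutoff(lengths, genome_size=10000, target_coverage=2)
seeds = [n for n in lengths if n >= cutoff]
```

With these toy values the two longest reads become seeds; the remaining reads would then be aligned to the seeds for consensus, a step not sketched here.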
The roots of Arabidopsis thaliana host diverse fungal communities that affect plant health and disease states. Here, we sequence the genomes of 41 fungal isolates representative of the A. thaliana root mycobiota for comparative analysis with 79 other plant-associated fungi. Our analyses indicate that root mycobiota members evolved from ancestors with diverse lifestyles and retain large repertoires of plant cell wall-degrading enzymes (PCWDEs) and effector-like small secreted proteins. We identify a set of 84 gene families associated with endophytism, including genes encoding PCWDEs acting on xylan (family GH10) and cellulose (family AA9). Transcripts encoding these enzymes are also part of a conserved transcriptional program activated by phylogenetically distant mycobiota members upon host contact. Recolonization experiments with individual fungi indicate that strains with detrimental effects in mono-association with the host colonize roots more aggressively than those with beneficial activities, and dominate in natural root samples. Furthermore, we show that the pectin-degrading enzyme family PL1_7 links aggressiveness of endophytic colonization to plant health.
• Endogonales (Mucoromycotina), composed of Endogonaceae and Densosporaceae, is the only known non-Dikarya order with ectomycorrhizal members. They also form mycorrhizal-like associations with some nonspermatophyte plants. It has been recently proposed that Endogonales were among the earliest mycorrhizal partners with land plants. It remains unknown whether Endogonales possess genomes with mycorrhizal-lifestyle signatures and whether Endogonales originated around the same time as land plants did.
• We sampled sporocarp tissue from four Endogonaceae collections and performed shotgun genome sequencing. After binning the metagenome data, we assembled and annotated the Endogonaceae genomes. We performed comparative analysis on plant-cell-wall-degrading enzymes (PCWDEs) and small secreted proteins (SSPs). We inferred phylogenetic placement of Endogonaceae and estimated the ages of Endogonaceae and Endogonales with expanded taxon sampling.
• Endogonaceae have large genomes with high repeat content and low PCWDE diversity, but no elevated SSP/secretome ratios. Dating analysis estimated that Endogonaceae originated around the Permian–Triassic boundary and Endogonales originated in the mid–late Silurian. Mycoplasma-related endobacterium sequences were identified in three Endogonaceae genomes.
• Endogonaceae genomes possess typical signatures of mycorrhizal lifestyle. The early origin of Endogonales suggests that the mycorrhizal association between Endogonales and plants might have played an important role during the colonization of land by plants.
The most frequently encountered symbiont on tree roots is the ascomycete Cenococcum geophilum, the only mycorrhizal species within the largest fungal class Dothideomycetes, a class known for devastating plant pathogens. Here we show that the symbiotic genomic idiosyncrasies of ectomycorrhizal basidiomycetes are also present in C. geophilum with symbiosis-induced, taxon-specific genes of unknown function and reduced numbers of plant cell wall-degrading enzymes. C. geophilum still holds a significant set of genes in categories known to be involved in pathogenesis and shows an increased genome size due to transposable element proliferation. Transcript profiling revealed a striking upregulation of membrane transporters, including aquaporin water channels and sugar transporters, and mycorrhiza-induced small secreted proteins (MiSSPs) in ectomycorrhiza compared with free-living mycelium. The frequency with which this symbiont is found on tree roots and its possible role in water and nutrient transport in symbiosis call for further studies on mechanisms of host and environmental adaptation.
Metagenome sequence datasets can contain terabytes of reads, too many to be coassembled together on a single shared-memory computer; consequently, they have only been assembled sample by sample (multiassembly) and combining the results is challenging. We can now perform coassembly of the largest datasets using MetaHipMer, a metagenome assembler designed to run on supercomputers and large clusters of compute nodes. We have reported on the implementation of MetaHipMer previously; in this paper we focus on analyzing the impact of very large coassembly. In particular, we show that coassembly recovers a larger genome fraction than multiassembly and enables the discovery of more complete genomes, with lower error rates, whereas multiassembly recovers more dominant strain variation. Being able to coassemble a large dataset does not preclude one from multiassembly; rather, having a fast, scalable metagenome assembler enables a user to more easily perform coassembly and multiassembly, and assemble both abundant, high strain variation genomes, and low-abundance, rare genomes. We present several assemblies of terabyte datasets that could never be coassembled before, demonstrating MetaHipMer's scaling power. MetaHipMer is available for public use under an open source license and all datasets used in the paper are available for public download.
Eukaryotic phytoplankton are responsible for at least 20% of annual global carbon fixation. Their diversity and activity are shaped by interactions with prokaryotes as part of complex microbiomes. Although differences in their local species diversity have been estimated, we still have a limited understanding of environmental conditions responsible for compositional differences between local species communities on a large scale from pole to pole. Here, we show, based on pole-to-pole phytoplankton metatranscriptomes and microbial rDNA sequencing, that environmental differences between polar and non-polar upper oceans most strongly impact the large-scale spatial pattern of biodiversity and gene activity in algal microbiomes. The geographic differentiation of co-occurring microbes in algal microbiomes can be well explained by the latitudinal temperature gradient and associated break points in their beta diversity, with an average breakpoint at 14 ± 4.3 °C, separating cold and warm upper oceans. As global warming impacts upper ocean temperatures, we project that break points of beta diversity move markedly pole-wards. Hence, abrupt regime shifts in algal microbiomes could be caused by anthropogenic climate change.
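To make the notion of a temperature break point concrete, the following is a minimal sketch of single change-point detection. It is a deliberately simplified stand-in, not the analysis used in the study: it assumes a step (piecewise-constant) model, a hypothetical per-sample community summary value, and made-up toy data. The function name `find_breakpoint` is likewise an assumption.

```python
def find_breakpoint(temperatures, values):
    """Locate one break point along a temperature gradient.

    Simplified step model: try every split of the temperature-sorted
    samples and keep the split that minimizes the summed squared
    deviation from each segment's own mean. Returns the midpoint
    temperature between the two samples flanking the best split,
    or None if no split is possible.
    """
    pairs = sorted(zip(temperatures, values))
    xs = [t for t, _ in pairs]
    ys = [v for _, v in pairs]

    def sum_squared_error(segment):
        mean = sum(segment) / len(segment)
        return sum((y - mean) ** 2 for y in segment)

    best_index, best_cost = None, float("inf")
    for i in range(1, len(ys)):
        if xs[i] == xs[i - 1]:
            continue  # cannot place a break between identical temperatures
        cost = sum_squared_error(ys[:i]) + sum_squared_error(ys[i:])
        if cost < best_cost:
            best_index, best_cost = i, cost

    if best_index is None:
        return None
    return (xs[best_index - 1] + xs[best_index]) / 2


# Toy data (made up): a community summary value that shifts between
# the samples taken at 12 and 18 degrees Celsius.
toy_temps = [0, 4, 8, 12, 18, 22, 26, 30]
toy_values = [0.10, 0.12, 0.11, 0.13, 0.80, 0.82, 0.79, 0.81]
breakpoint_temp = find_breakpoint(toy_temps, toy_values)
```

A segmented linear regression would be a closer analogue when the response changes slope rather than level; the step model is used here only because it keeps the example short.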
Summary
Ectomycorrhizal fungi play a key role in forests by establishing mutualistic symbioses with woody plants. Genome analyses have identified conserved symbiosis‐related traits among ectomycorrhizal fungal species, but the molecular mechanisms underlying host specificity remain poorly known.
We sequenced and compared the genomes of seven species of milk‐cap fungi (Lactarius, Russulales) with contrasting host specificity. We also compared these genomes with those of symbiotic and saprotrophic Russulales species, aiming to identify genes involved in their ecology and host specificity.
Lactarius genomes are significantly larger than those of other Russulales species, owing to a massive accumulation of transposable elements and duplication of dispensable genes. As expected, their repertoire of genes coding for plant cell wall‐degrading enzymes is restricted, but they retained a substantial set of genes involved in microbial cell wall degradation. Notably, Lactarius species showed a striking expansion of genes encoding proteases, such as secreted ectomycorrhiza‐induced sedolisins. A high copy number of genes coding for small secreted LysM proteins and Lactarius‐specific lectins was detected, which may be linked to host specificity.
This study revealed a large diversity in the genome landscapes and gene repertoires within Russulaceae. The known host specificity of Lactarius symbionts may be related to mycorrhiza‐induced species‐specific genes, including secreted sedolisins.
While the bulk of the finished microbial genomes sequenced to date are derived from cultured bacterial and archaeal representatives, the vast majority of microorganisms elude current culturing attempts, severely limiting the ability to recover complete or even partial genomes from these environmental species. Single cell genomics is a novel culture-independent approach, which enables access to the genetic material of an individual cell. To our knowledge, no single cell genome has been closed and finished to date. Here we report the completed genome from an uncultured single cell of Candidatus Sulcia muelleri DMIN. Digital PCR on single symbiont cells isolated from the bacteriome of the green sharpshooter Draeculacephala minerva allowed us to determine that this bacterium is polyploid, with approximately 200-900 genome copies per cell, making it a highly suitable target for single cell finishing efforts. For single cell shotgun sequencing, an individual Sulcia cell was isolated and whole genome amplified by multiple displacement amplification (MDA). Sanger-based finishing methods allowed us to close the genome. To verify the correctness of our single cell genome and exclude MDA-derived artifacts, we independently shotgun sequenced and assembled the Sulcia genome from pooled bacteriomes using a metagenomic approach, yielding a nearly identical genome. Four variations we detected appear to be genuine biological differences between the two samples. Comparison of the single cell genome with bacteriome metagenomic sequence data detected two single nucleotide polymorphisms (SNPs), indicating extremely low genetic diversity within a Sulcia population. This study demonstrates the power of single cell genomics to generate a complete, high quality, non-composite reference genome from an environmental sample, which can be used for population genetic analyses.
The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly, and their quality reflects not only the quality of the sequencing technology used but also that of the analysis software employed for assembly and annotation.
In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases, assembly results, although of high quality on average, need to be viewed critically, and potential sources of error should be considered prior to analysis.
These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).
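The text above does not state which metrics were used to compare draft and finished assemblies. As an illustration only, one widely used contiguity metric is N50: the contig length at which contigs of that length or longer contain at least half of the assembled bases. The sketch below assumes this metric and uses made-up contig lengths; fewer, smaller gaps join contigs and therefore raise N50.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly, or 0 for an empty assembly.

    Contigs are visited from longest to shortest; N50 is the length of
    the contig at which the running total first reaches half of the
    total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running_total = 0
    for length in lengths:
        running_total += length
        if running_total >= half_total:
            return length
    return 0


# Toy data (made up): a fragmented draft versus a single closed contig
# covering the same 150 kb.
draft_contigs = [50_000, 40_000, 30_000, 20_000, 10_000]
finished_contigs = [150_000]
draft_n50 = n50(draft_contigs)
finished_n50 = n50(finished_contigs)
```

N50 says nothing about correctness, so it would normally be read alongside measures such as gap counts or gene-annotation completeness.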
Succinate is produced petrochemically from maleic anhydride to satisfy a small specialty chemical market. If succinate could be produced fermentatively at a price competitive with that of maleic anhydride, though, it could replace maleic anhydride as the precursor of many bulk chemicals, transforming a multi-billion dollar petrochemical market into one based on renewable resources. Actinobacillus succinogenes naturally converts sugars and CO2 into high concentrations of succinic acid as part of a mixed-acid fermentation. Efforts are ongoing to maximize carbon flux to succinate to achieve an industrial process.
Described here is the 2.3 Mb A. succinogenes genome sequence with emphasis on A. succinogenes's potential for genetic engineering, its metabolic attributes and capabilities, and its lack of pathogenicity. The genome sequence contains 1,690 DNA uptake signal sequence repeats and a nearly complete set of natural competence proteins, suggesting that A. succinogenes is capable of natural transformation. A. succinogenes lacks a complete tricarboxylic acid cycle as well as a glyoxylate pathway, and it appears to be able to transport and degrade about twenty different carbohydrates. The genomes of A. succinogenes and its closest known relative, Mannheimia succiniciproducens, were compared for the presence of known Pasteurellaceae virulence factors. Both species appear to lack the virulence traits of toxin production, sialic acid and choline incorporation into lipopolysaccharide, and utilization of hemoglobin and transferrin as iron sources. Perspectives are also given on the conservation of A. succinogenes genomic features in other sequenced Pasteurellaceae.
Both A. succinogenes and M. succiniciproducens genome sequences lack many of the virulence genes used by their pathogenic Pasteurellaceae relatives. The lack of pathogenicity of these two succinogens is an exciting prospect, because comparisons with pathogenic Pasteurellaceae could lead to a better understanding of Pasteurellaceae virulence. The fact that the A. succinogenes genome encodes uptake and degradation pathways for a variety of carbohydrates reflects the variety of carbohydrate substrates available in the rumen, A. succinogenes's natural habitat. It also suggests that many different carbon sources can be used as feedstock for succinate production by A. succinogenes.