Computing sequence similarity results is becoming a limiting factor in metagenome analysis. Sequence similarity search results encoded in an open, exchangeable format have the potential to limit the need for computational reanalysis of these data sets. A prerequisite for sharing similarity results is a common reference.
We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating a quarterly release of this resource. In addition, we present tools for translating similarity searches into many annotation namespaces, e.g. KEGG or NCBI's GenBank.
The data and tools we present allow the creation of multiple result sets using a single computation, permitting computational results to be shared between groups for large sequence data sets.
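As a rough illustration of how one similarity computation can yield several result sets, the sketch below translates the same hit list into two annotation namespaces. It is not the authors' tooling: the MD5 keying of sequences, the `TABLE` lookup structure, the function names and every identifier in the toy data are assumptions made for this example.

```python
import hashlib


def sequence_key(protein_sequence: str) -> str:
    """Return a checksum key for a protein sequence (assumed keying scheme)."""
    return hashlib.md5(protein_sequence.upper().encode("ascii")).hexdigest()


def translate_hits(hits, table, namespace):
    """Map each (query_id, key) hit to identifiers in the requested namespace.

    Queries whose hit has no entry in that namespace get an empty list.
    """
    result = {}
    for query_id, key in hits:
        identifiers = table.get(key, {}).get(namespace, [])
        result.setdefault(query_id, []).extend(identifiers)
    return result


# Hypothetical sequences and identifiers, for illustration only.
SEQ_A = "MKTAYIAKQR"
SEQ_B = "MSLNNPWQEA"

# Hypothetical lookup table: sequence key -> {namespace: [identifiers]}.
TABLE = {
    sequence_key(SEQ_A): {
        "KEGG": ["kegg:example_1"],
        "GenBank": ["gb:example_1", "gb:example_2"],
    },
    sequence_key(SEQ_B): {"GenBank": ["gb:example_3"]},
}

# The similarity search is run once; its hits reference only sequence keys.
HITS = [("read_1", sequence_key(SEQ_A)), ("read_2", sequence_key(SEQ_B))]

# The same hits are then viewed through two different namespaces.
kegg_view = translate_hits(HITS, TABLE, "KEGG")
genbank_view = translate_hits(HITS, TABLE, "GenBank")
```

Because the hits reference only namespace-neutral keys, a group that receives the hit list can derive its preferred annotation view without repeating the search.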
A steep drop in the cost of next-generation sequencing during recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis still poses a resource bottleneck for smaller laboratories and institutes that do not have access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, scientific value cannot be obtained from acquiring a sequencer unless it is accompanied by an equal investment in informatics infrastructure.
Cloud BioLinux is a publicly accessible Virtual Machine (VM) that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Users have instant access to a range of pre-configured command line and graphical software applications, including a full-featured desktop interface, documentation and over 135 bioinformatics packages for applications including sequence alignment, clustering, assembly, display, editing, and phylogeny. Each tool's functionality is fully described in the documentation directly accessible from the graphical interface of the VM. Besides the Amazon EC2 cloud, we have started instances of Cloud BioLinux on a private Eucalyptus cloud installed at the J. Craig Venter Institute, and demonstrated access to the bioinformatic tools interface through a remote connection to EC2 instances from a local desktop computer. Documentation for using Cloud BioLinux on EC2 is available from our project website, while a Eucalyptus cloud image and a VirtualBox Appliance are also publicly available for download and use by researchers with access to private clouds.
Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud. An automated and configurable process builds Virtual Machines, allowing the development of highly customized versions from a shared code base. This shared community toolkit enables application-specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them.
Do bacterial taxa demonstrate clear endemism, like macroorganisms, or can one site's bacterial community recapture the total phylogenetic diversity of the world's oceans? Here we compare a deep bacterial community characterization from one site in the English Channel (L4-DeepSeq) with 356 datasets from the International Census of Marine Microbes (ICoMM) taken from around the globe (ranging from marine pelagic and sediment samples to sponge-associated environments). At the L4-DeepSeq site, increasing sequencing depth uncovers greater phylogenetic overlap with the global ICoMM data. This site contained 31.7-66.2% of operational taxonomic units identified in a given ICoMM biome. Extrapolation of this overlap suggests that 1.93 × 10¹¹ sequences from the L4 site would capture all ICoMM bacterial phylogenetic diversity. Current technology trends suggest this limit may be attainable within 3 years. These results strongly suggest the marine biosphere maintains a previously undetected, persistent microbial seed bank.
Here we describe the longest microbial time-series analyzed to date using high-resolution 16S rRNA tag pyrosequencing of samples taken monthly over 6 years at a temperate marine coastal site off Plymouth, UK. Data treatment affected the estimate of community richness over the 6-year period: 8794 operational taxonomic units (OTUs) were identified using single-linkage preclustering, and 21 130 OTUs were identified by denoising the data. The Alphaproteobacteria were the most abundant class, and the most frequently recorded OTUs were members of the Rickettsiales (SAR11) and Rhodobacteriales. This near-surface ocean bacterial community showed strong repeatable seasonal patterns, which were defined by winter peaks in diversity across all years. Environmental variables explained far more variation in seasonally predictable bacteria than did data on protists or metazoan biomass. Change in day length alone explains >65% of the variance in community diversity. The results suggested that seasonal changes in environmental variables are more important than trophic interactions. Interestingly, microbial association network analysis showed that correlations in abundance were stronger within bacterial taxa than between bacteria and eukaryotes, or between bacteria and environmental variables.
Lotic ecosystems such as rivers and streams are unique in that they represent a continuum of both space and time during the transition from headwaters to the river mouth. As microbes have very different controls over their ecology, distribution and dispersion compared with macrobiota, we wished to explore biogeographical patterns within a river catchment and uncover the major drivers structuring bacterioplankton communities. Water samples collected across the River Thames Basin, UK, covering the transition from headwater tributaries to the lower reaches of the main river channel, were characterised using 16S rRNA gene pyrosequencing. This approach revealed an ecological succession in the bacterial community composition along the river continuum, moving from a community dominated by Bacteroidetes in the headwaters to one dominated by Actinobacteria downstream. Location of the sampling point in the river network (measured as the cumulative water channel distance upstream) was found to be the most predictive spatial feature, suggesting that ecological processes pertaining to temporal community succession are of prime importance in driving the assemblages of riverine bacterioplankton communities. A decrease in bacterial activity rates and an increase in the abundance of low nucleic acid bacteria relative to high nucleic acid bacteria were found to correspond with these downstream changes in community structure, suggesting corresponding functional changes. Our findings show that bacterial communities across the Thames basin exhibit an ecological succession along the river continuum, and that this is primarily driven by water residence time rather than the physico-chemical status of the river.
Robust seasonal dynamics in microbial community composition have previously been observed in the English Channel L4 marine observatory. These could be explained either by seasonal changes in the taxa present at the L4 site, or by the continuous modulation of abundance of taxa within a persistent microbial community. To test these competing hypotheses, deep sequencing of 16S rRNA from one randomly selected time point to a depth of 10,729,927 reads was compared with existing taxonomic survey data covering 6 years. When compared against the 6-year survey of 72 shallow sequenced time points, the deep sequenced time point maintained 95.4% of the combined shallow OTUs. Additionally, on average, 99.75%±0.06 (mean±s.d.) of the operational taxonomic units found in each shallow sequenced sample were also found in the single deep sequenced sample. This suggests that the vast majority of taxa identified in this ecosystem are always present, but in different proportions that are predictable. Thus observed changes in community composition are actually variations in the relative abundance of taxa and do not, as was previously believed, demonstrate extinction and recolonization of taxa in the ecosystem through time.
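The two overlap statistics reported above (recovery of the combined shallow OTUs, and the per-sample mean ± s.d.) can be sketched with plain set operations. This is a minimal illustration on made-up OTU labels, not the authors' pipeline; the function names and the toy data are assumptions, and the code assumes at least two non-empty shallow samples.

```python
from statistics import mean, stdev


def combined_recovery(deep, shallow_samples):
    """Fraction of the pooled shallow OTUs that also occur in the deep sample."""
    combined = set().union(*shallow_samples)
    return len(combined & deep) / len(combined)


def per_sample_recovery(deep, shallow_samples):
    """Mean and sample s.d. of the per-sample fraction of OTUs found in the deep sample."""
    fractions = [len(sample & deep) / len(sample) for sample in shallow_samples]
    return mean(fractions), stdev(fractions)


# Toy data with hypothetical OTU labels.
DEEP = {"otu1", "otu2", "otu3", "otu4", "otu5"}
SHALLOW = [
    {"otu1", "otu2"},
    {"otu2", "otu3", "otu6"},  # otu6 is absent from the deep sample
    {"otu1", "otu4"},
]

pooled_fraction = combined_recovery(DEEP, SHALLOW)
mean_fraction, sd_fraction = per_sample_recovery(DEEP, SHALLOW)
```

The pooled statistic weights every distinct OTU equally, whereas the per-sample mean weights every time point equally, which is why the two figures reported in the text can differ.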
Sequencing the expressed genetic information of an ecosystem (metatranscriptome) can provide information about the response of organisms to varying environmental conditions. Until recently, metatranscriptomics has been limited to microarray technology and random cloning methodologies. The application of high-throughput sequencing technology is now enabling access to both known and previously unknown transcripts in natural communities.
We present a study of a complex marine metatranscriptome obtained from random whole-community mRNA using the GS-FLX Pyrosequencing technology. Eight samples, four DNA and four mRNA, were processed from two time points in a controlled coastal ocean mesocosm study (Bergen, Norway) involving an induced phytoplankton bloom, producing a total of 323,161,989 base pairs. Our study confirms the finding of the first published metatranscriptomic studies of marine and soil environments that metatranscriptomics targets highly expressed sequences, which are frequently novel. Our alternative methodology increases the range of experimental options available for conducting such studies and is characterized by an exceptional enrichment of mRNA (99.92%) versus ribosomal RNA. Analysis of corresponding metagenomes confirms much higher levels of assembly in the metatranscriptomic samples and a far higher yield of large gene families with >100 members, approximately 91% of which were novel.
This study provides further evidence that metatranscriptomic studies of natural microbial communities are not only feasible but, when paired with metagenomic data sets, offer an unprecedented opportunity to explore both the structure and function of microbial communities, provided we can overcome the challenges of elucidating the functions of so many never-seen-before gene families.
Summary
Very few marine microbial communities are well characterized, even with the weight of research effort presently devoted to them. Only a small proportion of this effort has been aimed at investigating temporal community structure. Here we present the first report of the application of high‐throughput pyrosequencing to investigate intra‐annual bacterial community structure. Microbial diversity was determined for 12 time points at the surface of the L4 sampling site in the Western English Channel. This was performed over 11 months during 2007. A total of 182 560 sequences from the V6 hyper‐variable region of the small‐subunit ribosomal RNA gene (16S rRNA) were obtained; there were between 11 327 and 17 339 reads per sample. Approximately 7000 genera were identified, with one in every 25 reads being attributed to a new genus; yet this level of sampling far from exhausted the total diversity present at any one time point. The total data set contained 17 673 unique sequences. Only 93 (0.5%) were found at all time points, yet these few lineages comprised 50% of the total reads sequenced. The most abundant phylum was Proteobacteria (50% of all sequenced reads), while the SAR11 clade comprised 21% of the ubiquitous reads and ∼12% of the total sequenced reads. In contrast, 78% of all operational taxonomic units were only found at one time point and 67% were only found once, evidence of a large and transient rare assemblage. This time series shows evidence of seasonally structured community diversity. There is also evidence for seasonal succession, primarily reflecting changes among dominant taxa. These changes in structure were significantly correlated to a combination of temperature, phosphate and silicate concentrations.
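The distinction drawn above between a small ubiquitous core and a large transient assemblage can be sketched as a simple occupancy count over time points. The example is a toy illustration under assumed inputs: the per-sample count dictionaries, the taxon labels and the function name are hypothetical and do not reproduce the study's data or method.

```python
from collections import Counter


def occupancy_summary(samples):
    """Summarise how widely each taxon occurs across time points.

    `samples` is a list of dicts mapping a taxon label to its read count
    at one time point. Assumes at least one sample with at least one read.
    """
    n_points = len(samples)
    occurrence = Counter()  # number of time points each taxon appears in
    reads = Counter()       # total reads per taxon across all time points
    for sample in samples:
        for taxon, count in sample.items():
            if count > 0:
                occurrence[taxon] += 1
                reads[taxon] += count

    ubiquitous = {t for t, n in occurrence.items() if n == n_points}
    single_point = {t for t, n in occurrence.items() if n == 1}
    total_reads = sum(reads.values())
    return {
        "ubiquitous": ubiquitous,
        "single_point": single_point,
        "ubiquitous_taxon_fraction": len(ubiquitous) / len(occurrence),
        "single_point_taxon_fraction": len(single_point) / len(occurrence),
        "ubiquitous_read_fraction": sum(reads[t] for t in ubiquitous) / total_reads,
    }


# Toy data: three time points, hypothetical taxon labels and read counts.
TOY_SAMPLES = [
    {"a": 10, "b": 1},
    {"a": 8, "c": 1},
    {"a": 12, "b": 2, "d": 1},
]

summary = occupancy_summary(TOY_SAMPLES)
```

In the toy data a single taxon is present at every time point yet accounts for most reads, mirroring the pattern in which few lineages are ubiquitous but together dominate the sequenced reads.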
Metagenomics holds enormous promise for discovering novel enzymes and organisms that are biomarkers or drivers of processes relevant to disease, industry and the environment. In the past two years, we have seen a paradigm shift in metagenomics to the application of cross-sectional and longitudinal studies enabled by advances in DNA sequencing and high-performance computing. These technologies now make it possible to broadly assess microbial diversity and function, allowing systematic investigation of the largely unexplored frontier of microbial life. To achieve this aim, the global scientific community must collaborate and agree upon common objectives and data standards to enable comparative research across the Earth's microbiome. Improvements in comparability of data will facilitate the study of biotechnologically relevant processes, such as bioprospecting for new glycoside hydrolases or identifying novel energy sources.
This is the first open source software suite for experimentalists and curators that (i) assists in the annotation and local management of experimental metadata from high-throughput studies employing one or a combination of omics and other technologies; (ii) empowers users to adopt community-defined checklists and ontologies; and (iii) facilitates submission to international public repositories. Availability and Implementation: Software, documentation, case studies and implementations at http://www.isa-tools.org Contact: isatools@googlegroups.com