Approaches in molecular biology, particularly those that deal with high-throughput sequencing of entire microbial communities (the field of metagenomics), are rapidly advancing our understanding of ...the composition and functional content of microbial communities involved in climate change, environmental pollution, human health, biotechnology, etc. Metagenomics provides researchers with the most complete picture of the taxonomic (i.e., what organisms are there) and functional (i.e., what those organisms are doing) composition of natively sampled microbial communities, making it possible to perform investigations that include organisms that were previously intractable to laboratory-controlled culturing; currently, these constitute the vast majority of all microbes on the planet. All organisms contained in environmental samples are sequenced in a culture-independent manner, most often with 16S ribosomal amplicon methods to investigate the taxonomic content, or with whole-genome shotgun-based methods to investigate the functional content, of sampled communities. Metagenomics allows researchers to characterize the community composition and functional content of microbial communities, but it cannot show which functional processes are active; however, near-parallel developments in transcriptomics promise a dramatic increase in our knowledge in this area as well. Since 2008, MG-RAST (Meyer et al., BMC Bioinformatics 9:386, 2008) has served as a public resource for annotation and analysis of metagenomic sequence data, providing a repository that currently houses more than 150,000 data sets (containing 60+ tera-base-pairs), more than 23,000 of which are publicly available. MG-RAST, the metagenomics RAST (rapid annotation using subsystems technology) server, makes it possible for users to upload raw metagenomic sequence data in (preferably) fastq or fasta format.
Assessment of sequence quality and annotation with respect to multiple reference databases are performed automatically with minimal input from the user (see Subheading 4 at the end of this chapter for more details). Post-annotation analysis and visualization are also possible, either directly through the web interface or with tools like matR (metagenomic analysis tools for R, covered later in this chapter) that use the MG-RAST API ( http://api.metagenomics.anl.gov/api.html ) to download data from any stage in the MG-RAST processing pipeline. Over the years, MG-RAST has undergone substantial revisions to keep pace with the dramatic growth in the number, size, and types of sequence data that accompany constantly evolving developments in metagenomics and related -omic sciences (e.g., metatranscriptomics).
Computing of sequence similarity results is becoming a limiting factor in metagenome analysis. Sequence similarity search results encoded in an open, exchangeable format have the potential to limit ...the needs for computational reanalysis of these data sets. A prerequisite for sharing of similarity results is a common reference.
We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating a quarterly release of this resource. In addition, we present tools for translating similarity searches into many annotation namespaces, e.g. KEGG or NCBI's GenBank.
The data and tools we present allow the creation of multiple result sets using a single computation, permitting computational results to be shared between groups for large sequence data sets.
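The idea of computing similarities once against a non-redundant reference and then translating the hits into several annotation namespaces can be sketched as follows. This is only an illustration under stated assumptions: keying each unique protein by an MD5 checksum of its sequence is an assumed way of detecting redundancy (the text above does not say how it is done), and the function names, identifiers, and sequences are all hypothetical.

```python
import hashlib
from collections import defaultdict


def seq_key(protein):
    """Assumed redundancy key: MD5 checksum of the upper-cased sequence."""
    return hashlib.md5(protein.upper().encode("ascii")).hexdigest()


def build_nr(records):
    """Collapse (namespace, identifier, sequence) records into one entry
    per unique sequence, remembering every identifier in every namespace."""
    nr = defaultdict(lambda: defaultdict(list))
    for namespace, identifier, sequence in records:
        nr[seq_key(sequence)][namespace].append(identifier)
    return nr


def translate(hits, nr, namespace):
    """Re-express similarity hits (query, sequence key) in one namespace
    without rerunning the similarity search."""
    result = defaultdict(list)
    for query, key in hits:
        if key in nr:
            result[query].extend(nr[key].get(namespace, []))
    return dict(result)


# Hypothetical source records; the first two share one sequence.
records = [
    ("KEGG", "kegg:example_1", "MKTAYIAKQR"),
    ("GenBank", "gb:example_A", "mktayiakqr"),
    ("GenBank", "gb:example_B", "MSLLTEVETY"),
]
nr = build_nr(records)

# Hypothetical results of a single similarity computation.
hits = [("read_1", seq_key("MKTAYIAKQR")), ("read_2", seq_key("MSLLTEVETY"))]

kegg_view = translate(hits, nr, "KEGG")
genbank_view = translate(hits, nr, "GenBank")
```

Both views are derived from the same `hits` list, which is the point of sharing one computation across groups that prefer different annotation sources.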
Abstract
As technologies change, MG-RAST is adapting. Newly available software is being included to improve accuracy and performance. As a computational service constantly running large volume ...scientific workflows, MG-RAST is the right location to perform benchmarking and implement algorithmic or platform improvements, in many cases involving trade-offs between specificity, sensitivity and run-time cost. The work in Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1–3 is an example; we use existing well-studied data sets as gold standards representing different environments and different technologies to evaluate any changes to the pipeline. Currently, we use well-understood data sets in MG-RAST as a platform for benchmarking. The use of artificial data sets for pipeline performance optimization has not added value, as these data sets do not present the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improvements to the workflow. We are currently working on versions 4.02 and 4.1, both of which contain significant input from the community and our partners; they will enable double barcoding and stronger inferences supported by longer-read technologies, and will increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard to specify bioinformatics workflows, to facilitate both the development and the efficient high-performance implementation of the community’s data analysis tasks.
We present the Biological Observation Matrix (BIOM, pronounced "biome") format: a JSON-based file format for representing arbitrary observation by sample contingency tables with associated sample and ...observation metadata. As the number of categories of comparative omics data types (collectively, the "ome-ome") grows rapidly, a general format to represent and archive this data will facilitate the interoperability of existing bioinformatics tools and future meta-analyses.
The BIOM file format is supported by an independent open-source software project (the biom-format project), which initially contains Python objects that support the use and manipulation of BIOM data in Python programs, and is intended to be an open development effort where developers can submit implementations of these objects in other programming languages.
The BIOM file format and the biom-format project are steps toward reducing the "bioinformatics bottleneck" that is currently being experienced in diverse areas of biological sciences, and will help us move toward the next phase of comparative omics where basic science is translated into clinical and environmental applications. The BIOM file format is currently recognized as an Earth Microbiome Project Standard, and as a Candidate Standard by the Genomic Standards Consortium.
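To make the idea of a JSON-based observation-by-sample table concrete, here is a minimal sketch that builds such a table with only the Python standard library. It does not use the biom-format project's own objects. The top-level field names follow the BIOM 1.0 layout as recalled here and should be checked against the format specification; the function name, OTU and sample identifiers, and counts are hypothetical.

```python
import json
from datetime import datetime, timezone


def make_biom_table(observation_ids, sample_ids, counts):
    """Build a sparse BIOM-1.0-style dict from a dense count matrix
    whose rows are observations and whose columns are samples."""
    # Sparse encoding: one [row, column, value] triple per non-zero cell.
    sparse = [
        [r, c, value]
        for r, row in enumerate(counts)
        for c, value in enumerate(row)
        if value != 0
    ]
    return {
        "id": None,
        "format": "Biological Observation Matrix 1.0.0",
        "format_url": "http://biom-format.org",
        "type": "OTU table",
        "generated_by": "illustrative sketch",
        "date": datetime.now(timezone.utc).isoformat(),
        # Observation and sample metadata would be attached here.
        "rows": [{"id": oid, "metadata": None} for oid in observation_ids],
        "columns": [{"id": sid, "metadata": None} for sid in sample_ids],
        "matrix_type": "sparse",
        "matrix_element_type": "int",
        "shape": [len(observation_ids), len(sample_ids)],
        "data": sparse,
    }


# Hypothetical table: two observations (OTUs) across three samples.
table = make_biom_table(
    ["OTU_1", "OTU_2"],
    ["S1", "S2", "S3"],
    [[0, 5, 0],
     [2, 0, 1]],
)
serialized = json.dumps(table, indent=2)
```

Because the result is plain JSON, any language with a JSON parser can read it back, which is what makes the format suitable for exchange between tools.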
Summary
Soil microbial communities are essential for ecosystem function, but linking community composition to biogeochemical processes is challenging because of high microbial diversity and large ...spatial variability of most soil characteristics. We investigated soil bacterial community structure at high spatial resolution in a switchgrass stand planted on soil with a history of grassland vegetation, to determine whether biogeographic trends occurred at the centimeter scale. Moreover, we tested whether such heterogeneity, if present, influenced community structure within or among ecosystems. Pronounced heterogeneity was observed at centimeter scales, with abrupt changes in relative abundance of phyla from sample to sample. At the ecosystem scale (> 10 m), however, bacterial community composition and structure were subtly, but significantly, altered by fertilization, with higher alpha diversity in fertilized plots. Moreover, by comparing these data with data from 1772 soils from the Earth Microbiome Project, it was found that 20% of bacterial taxa were shared between the study site and diverse globally sourced soil samples, while grassland soils shared approximately 40% of their operational taxonomic units with the current study. By spanning several orders of magnitude, the analysis suggested that extreme patchiness characterized community structure at smaller scales but that coherent patterns emerged at larger length scales.
The Genomic Standards Consortium
Field, Dawn; Amaral-Zettler, Linda; Cochrane, Guy ...
PLoS Biology, 06/2011, Volume 9, Issue 6
Journal Article, Peer reviewed, Open access
A vast and rich body of information has grown up as a result of the world's enthusiasm for 'omics technologies. Finding ways to describe and make available this information that maximise its ...usefulness has become a major effort across the 'omics world. At the heart of this effort is the Genomic Standards Consortium (GSC), an open-membership organization that drives community-based standardization activities. Here we give a short history of the GSC, provide an overview of its range of current activities, and make a call for the scientific community to join forces to improve the quality and quantity of contextual information about our public collections of genomes, metagenomes, and marker gene sequences.
Metagenomics holds enormous promise for discovering novel enzymes and organisms that are biomarkers or drivers of processes relevant to disease, industry and the environment. In the past two years, ...we have seen a paradigm shift in metagenomics to the application of cross-sectional and longitudinal studies enabled by advances in DNA sequencing and high-performance computing. These technologies now make it possible to broadly assess microbial diversity and function, allowing systematic investigation of the largely unexplored frontier of microbial life. To achieve this aim, the global scientific community must collaborate and agree upon common objectives and data standards to enable comparative research across the Earth's microbiome. Improvements in comparability of data will facilitate the study of biotechnologically relevant processes, such as bioprospecting for new glycoside hydrolases or identifying novel energy sources.
Climate change can alter the flow of nutrients and energy through terrestrial ecosystems. Using an inverse climate change field experiment in the central European Alps, we explored how long-term ...irrigation of a naturally drought-stressed pine forest altered the metabolic potential of the soil microbiome and its ability to decompose lignocellulolytic compounds as a critical ecosystem function. Drought mitigation by a decade of irrigation stimulated profound changes in the functional capacity encoded in the soil microbiome, revealing alterations in carbon and nitrogen metabolism as well as regulatory processes protecting microorganisms from starvation and desiccation. Despite the structural and functional shifts from oligotrophic to copiotrophic microbial lifestyles under irrigation and the observation that different microbial taxa were involved in the degradation of cellulose and lignin as determined by a time-series stable-isotope probing incubation experiment with ¹³C-labeled substrates, degradation rates of these compounds were not affected by different water availabilities. These findings provide new insights into the impact of precipitation changes on the soil microbiome and associated ecosystem functioning in a drought-prone pine forest and will help to improve our understanding of alterations in biogeochemical cycling under a changing climate.
The number of prokaryotic genome sequences becoming available is growing steadily, and faster than our ability to accurately annotate them.
We describe a fully automated service for ...annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12-24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service.
By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.
The most feared complication following intestinal resection is anastomotic leakage. In high risk areas (esophagus/rectum) where neoadjuvant chemoradiation is used, the incidence of anastomotic leaks ...remains unacceptably high (≈ 10%) even when performed by specialist surgeons in high volume centers. The aim of this study was to test the hypothesis that anastomotic leakage develops when pathogens colonizing anastomotic sites are transformed in vivo to express a tissue-destroying phenotype. We developed a novel model of anastomotic leak in which rats were exposed to pre-operative radiation as in cancer surgery, underwent distal colon resection and then were intestinally inoculated with Pseudomonas aeruginosa, a common colonizer of the radiated intestine. Results demonstrated that intestinal tissues exposed to preoperative radiation developed a significant incidence of anastomotic leak (>60%; p<0.01) when colonized by P. aeruginosa compared to radiated tissues alone (0%). Phenotype analysis comparing the original inoculating strain (MPAO1, termed P1) and the strain retrieved from leaking anastomotic tissues (termed P2) demonstrated that P2 was altered in pyocyanin production and displayed enhanced collagenase activity, high swarming motility, and a destructive phenotype against cultured intestinal epithelial cells (i.e. apoptosis, barrier function, cytolysis). Comparative genotype analysis between P1 and P2 revealed a single nucleotide polymorphism (SNP) mutation in the mexT gene that led to a stop codon resulting in a non-functional truncated protein. Replacement of the mutated mexT gene in P2 with mexT from the original parental strain P1 led to reversion of P2 to the P1 phenotype. No spontaneous transformation was detected during 20 passages in TSB media. Use of a novel virulence-suppressing compound, PEG/Pi, prevented P. aeruginosa transformation to the tissue-destructive phenotype and prevented anastomotic leak in rats.
This work demonstrates that in vivo transformation of microbial pathogens to a tissue destroying phenotype may have important implications in the pathogenesis of anastomotic leak.