A new generation of non-Sanger-based sequencing technologies has delivered on its promise of sequencing DNA at unprecedented speed, thereby enabling impressive scientific achievements and novel biological applications. However, before stepping into the limelight, next-generation sequencing had to overcome the inertia of a field that had relied on Sanger sequencing for 30 years.
Soil ecosystems harbor the most complex prokaryotic and eukaryotic microbial communities on Earth. Experimental approaches studying these systems usually focus on either the soil community's taxonomic structure or its functional characteristics. Many methods target DNA as the marker molecule and use PCR for amplification.
Here we apply an RNA-centered meta-transcriptomic approach to simultaneously obtain information on both the structure and the function of a soil community. Total community RNA is randomly reverse-transcribed into cDNA without any PCR or cloning step. Direct pyrosequencing produces large numbers of cDNA rRNA-tags; these are taxonomically profiled in a binning approach using the MEGAN software and two specifically compiled rRNA reference databases containing small and large subunit rRNA sequences. The pyrosequencing also produces mRNA-tags; these provide a sequence-based transcriptome of the community. One soil dataset of 258,411 RNA-tags of approximately 98 bp in length contained 193,219 rRNA-tags with valid taxonomic information, together with 21,133 mRNA-tags. Quantitative information about the relative abundance of organisms from all three domains of life and from different trophic levels was obtained in a single experiment. Less frequent taxa, such as soil Crenarchaeota, were well represented in the dataset: they were identified by more than 2,000 rRNA-tags, and their in situ activity was revealed by the presence of mRNA-tags specific for enzymes involved in ammonia oxidation and CO2 fixation.
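As a quick check on the proportions in the soil dataset, the following sketch uses only the counts stated in the abstract. The variable names are illustrative, and because the tag length is given as "approximately 98 bp", the base total is an estimate rather than a reported figure.

```python
# Counts reported for the single soil dataset described above.
TOTAL_TAGS = 258_411
RRNA_TAGS_WITH_TAXONOMY = 193_219
MRNA_TAGS = 21_133
APPROX_TAG_LENGTH_BP = 98  # stated as approximate in the abstract


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


rrna_pct = percent(RRNA_TAGS_WITH_TAXONOMY, TOTAL_TAGS)
mrna_pct = percent(MRNA_TAGS, TOTAL_TAGS)

# Tags that are neither taxonomically assigned rRNA-tags nor mRNA-tags
# (the abstract does not break this remainder down further).
other = TOTAL_TAGS - RRNA_TAGS_WITH_TAXONOMY - MRNA_TAGS

# Rough sequence volume, assuming every tag is exactly 98 bp long.
approx_bases = TOTAL_TAGS * APPROX_TAG_LENGTH_BP

print(f"rRNA-tags with taxonomy: {rrna_pct}%")
print(f"mRNA-tags: {mrna_pct}%")
print(f"remaining tags: {other}")
print(f"approximate bases sequenced: {approx_bases:,}")
```

Roughly three quarters of the tags carry usable taxonomic information and about one in twelve is an mRNA-tag, which is why a single run can serve both the structural and the functional analysis.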
This approach could be widely applied in microbial ecology by efficiently linking community structure and function in a single experiment while avoiding biases inherent in other methods.
A major challenge in the analysis of environmental sequences is data integration. The question is how to analyze different types of data in a unified approach, addressing both the taxonomic and functional aspects. To facilitate such analyses, we have substantially extended MEGAN, a widely used taxonomic analysis program. The new program, MEGAN4, provides an integrated approach to the taxonomic and functional analysis of metagenomic, metatranscriptomic, metaproteomic, and rRNA data. While taxonomic analysis is performed based on the NCBI taxonomy, functional analysis is performed using the SEED classification of subsystems and functional roles or the KEGG classification of pathways and enzymes. A number of examples illustrate how such analyses can be performed, and show that one can also import and compare classification results obtained using others' tools. MEGAN4 is freely available for academic purposes, and installers for all three major operating systems can be downloaded from www-ab.informatik.uni-tuebingen.de/software/megan.
MEGAN analysis of metagenomic data
Huson, Daniel H; Auch, Alexander F; Qi, Ji ...
Genome Research, 03/2007, Volume 17, Issue 3
Journal Article
Peer reviewed
Open access
Metagenomics is the study of the genomic content of a sample of organisms obtained from a common habitat using targeted or random sequencing. Goals include understanding the extent and role of microbial diversity. The taxonomical content of such a sample is usually estimated by comparison against sequence databases of known sequences. Most published studies use the analysis of paired-end reads, complete sequences of environmental fosmid and BAC clones, or environmental assemblies. Emerging sequencing-by-synthesis technologies with very high throughput are paving the way to low-cost random "shotgun" approaches. This paper introduces MEGAN, a new computer program that allows laptop analysis of large metagenomic data sets. In a preprocessing step, the set of DNA sequences is compared against databases of known sequences using BLAST or another comparison tool. MEGAN is then used to compute and explore the taxonomical content of the data set, employing the NCBI taxonomy to summarize and order the results. A simple lowest common ancestor algorithm assigns reads to taxa such that the taxonomical level of the assigned taxon reflects the level of conservation of the sequence. The software allows large data sets to be dissected without the need for assembly or the targeting of specific phylogenetic markers. It provides graphical and statistical output for comparing different data sets. The approach is applied to several data sets, including the Sargasso Sea data set, a recently published metagenomic data set sampled from a mammoth bone, and several complete microbial genomes. Also, simulations that evaluate the performance of the approach for different read lengths are presented.
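The lowest common ancestor (LCA) assignment can be illustrated with a minimal Python sketch. The toy taxonomy, the function names, and the `min_score` and `top_percent` filtering thresholds are hypothetical illustrations, not MEGAN's actual code or default settings. The sketch only shows the principle stated above: a read whose significant hits fall in closely related taxa is assigned low in the tree, while a read matching distantly related taxa (a conserved sequence) is assigned higher up.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy taxonomy: child -> parent; the root maps to None.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Escherichia coli": "Proteobacteria",
    "Salmonella enterica": "Proteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus subtilis": "Firmicutes",
}


def path_to_root(taxon: str) -> List[str]:
    """Return the taxa from `taxon` up to and including the root."""
    path: List[str] = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lowest_common_ancestor(taxa: List[str]) -> str:
    """Return the deepest node shared by the root paths of all given taxa."""
    paths = [path_to_root(t) for t in taxa]
    shared = set(paths[0]).intersection(*map(set, paths[1:]))
    # Walking up from the first taxon, the first shared node is the deepest one.
    for node in paths[0]:
        if node in shared:
            return node
    raise ValueError("taxa do not share a common root")


def assign_read(hits: List[Tuple[str, float]],
                min_score: float = 50.0,
                top_percent: float = 10.0) -> Optional[str]:
    """Assign a read to the LCA of its significant hits.

    hits: (taxon, bit score) pairs from a BLAST-like comparison.
    Hits below min_score, or more than top_percent below the best
    remaining score, are ignored (hypothetical thresholds).
    """
    kept = [(t, s) for t, s in hits if s >= min_score]
    if not kept:
        return None  # no significant hit: the read stays unassigned
    best = max(s for _, s in kept)
    cutoff = best * (1.0 - top_percent / 100.0)
    taxa = [t for t, s in kept if s >= cutoff]
    return lowest_common_ancestor(taxa)


if __name__ == "__main__":
    print(assign_read([("Escherichia coli", 200.0), ("Salmonella enterica", 195.0)]))
    print(assign_read([("Escherichia coli", 200.0), ("Bacillus subtilis", 190.0)]))
```

The first call lands on Proteobacteria and the second on Bacteria: the more widely a sequence is shared across the taxonomy, the more general the assigned taxon.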
Next-generation sequencing platforms coupled with advanced bioinformatic tools enable re-sequencing of the human genome at high speed and with large cost savings. We compare sequencing platforms from Roche/454 (GS FLX), Illumina/HiSeq (HiSeq 2000), and Life Technologies/SOLiD (SOLiD 3 ECC) for their ability to identify single nucleotide substitutions in whole-genome sequences from the same human sample. We report significant GC-related bias in the data sequenced on the Illumina and SOLiD platforms. The differences in the variant calls were investigated with regard to coverage and sequencing error. Some of the variants called by only one or two of the platforms were experimentally tested using mass spectrometry, a method that is independent of DNA sequencing. We establish several platform-specific causes for why variants remained unreported. We also report the indels called using the three sequencing technologies, and from the results we conclude that sequencing human genomes with more than one platform and multiple libraries is beneficial when a high level of accuracy is required.
Assessment of the microbial diversity residing in arthropod vectors of medical importance is crucial for monitoring endemic infections, for surveillance of newly emerging zoonotic pathogens, and for unraveling the bacteria associated with their hosts. The tick Ixodes ricinus is recognized as the primary European vector of disease-causing bacteria in humans. Despite the great public health relevance of I. ricinus, its microbial communities remain largely unexplored to date. Here we evaluate the pathogen load and the microbiome in single adult I. ricinus using 454- and Illumina-based metagenomic approaches. Genomic DNA-derived sequences were taxonomically profiled using a computational approach based on the BWA algorithm, allowing identification of known tick-borne pathogens at the strain level and of the putative tick core microbiome. Additionally, we assessed and compared the bacterial taxonomic profile in nymphal and adult I. ricinus pools collected from two distinct geographic regions in Northern Italy by means of V6-16S rRNA amplicon pyrosequencing and community-based ecological analysis. A total of 108 genera belonging to representatives of all bacterial phyla were detected, enabling a rapid qualitative assessment of pathogenic bacteria, such as Borrelia, Rickettsia and Candidatus Neoehrlichia, and of other bacteria with a mutualistic relationship or undetermined function, such as Wolbachia and Rickettsiella. Interestingly, the ecological analysis revealed that the bacterial community structure differed between the examined geographic regions and tick life stages. This finding suggests that the environmental context (abiotic and biotic factors) and host-selection behaviors affect the tick microbiome. Our data provide the most complete picture to date of the bacterial communities present within I. ricinus under natural conditions by using high-throughput sequencing technologies.
This study further demonstrates a novel detection strategy for the microbiomes of arthropod vectors in the context of epidemiological and ecological studies.
Oscillating diurnal rhythms of gene transcription, metabolic activity, and behavior are found in all three domains of life. However, diel cycles in naturally occurring heterotrophic bacteria and archaea have rarely been observed. Here, we report time-resolved whole-genome transcriptome profiles of multiple, naturally occurring oceanic bacterial populations sampled in situ over 3 days. As anticipated, the cyanobacterial transcriptome exhibited pronounced diel periodicity. Unexpectedly, several different heterotrophic bacterioplankton groups also displayed diel cycling in many of their gene transcripts. Furthermore, diel oscillations in different heterotrophic bacterial groups suggested population-specific timing of peak transcript expression in a variety of metabolic gene suites. These staggered multispecies waves of diel gene transcription may influence both the tempo and the mode of matter and energy transformation in the sea.
Whole genome bisulfite sequencing (WGBS), with its ability to interrogate methylation status epigenome-wide at single-CpG-site resolution, is a powerful technique for use in molecular experiments. Here, we aim to advance strategies for accurate and efficient WGBS for application in future large-scale epidemiological studies. We systematically compared the performance of three WGBS library preparation methods with low DNA input requirements (Swift Biosciences Accel-NGS, Illumina TruSeq and QIAGEN QIAseq) on two state-of-the-art sequencing platforms (Illumina NovaSeq and HiSeq X), and also assessed concordance between data generated by WGBS and methylation arrays. Swift achieved the highest proportion of CpG sites assayed and effective coverage at 26x (P < 0.001). TruSeq suffered from the highest proportion of PCR duplicates, while QIAseq performed poorly across all quality metrics. There was little difference in performance between NovaSeq and HiSeq X, with the exception of a higher read duplication rate on the NovaSeq (P < 0.05), likely attributable to the higher cluster densities on its flow cells. Systematic biases exist between WGBS and methylation arrays, with lower precision observed for WGBS across the range of depths investigated. To achieve a level of precision broadly comparable to the methylation array, a minimum coverage of 100x is recommended.
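Part of the coverage requirement follows from read sampling alone. Under an idealized binomial model (an assumption made here, which ignores PCR duplicates, bisulfite conversion errors and mapping bias), the standard error of a CpG methylation fraction p estimated from n reads is sqrt(p(1 - p) / n). The sketch below compares the 26x effective coverage reported for Swift with the recommended 100x at the worst case, p = 0.5; the function name is illustrative.

```python
import math


def methylation_se(p: float, depth: int) -> float:
    """Binomial standard error of a CpG methylation fraction estimate.

    Idealized model: each of `depth` reads independently reports
    'methylated' with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if depth <= 0:
        raise ValueError("depth must be positive")
    return math.sqrt(p * (1.0 - p) / depth)


# Worst case p = 0.5: compare 26x with the recommended minimum of 100x.
se_26 = methylation_se(0.5, 26)
se_100 = methylation_se(0.5, 100)
print(f"SE at 26x:  {se_26:.3f}")
print(f"SE at 100x: {se_100:.3f}")
```

Under this model the sampling error at 100x is about half that at 26x (0.05 versus roughly 0.098), because the standard error shrinks only with the square root of depth. Real WGBS data carry additional technical noise on top of this floor.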
Woolly mammoths and living elephants are characterized by major phenotypic differences that have allowed them to live in very different environments. To identify the genetic changes that underlie the suite of woolly mammoth adaptations to extreme cold, we sequenced the nuclear genome from three Asian elephants and two woolly mammoths, and we identified and functionally annotated genetic changes unique to woolly mammoths. We found that genes with mammoth-specific amino acid changes are enriched in functions related to circadian biology, skin and hair development and physiology, lipid metabolism, adipose development and physiology, and temperature sensation. Finally, we resurrected and functionally tested the mammoth and ancestral elephant TRPV3 gene, which encodes a temperature-sensitive transient receptor potential (thermoTRP) channel involved in thermal sensation and hair growth, and we show that a single mammoth-specific amino acid substitution in an otherwise highly conserved region of the TRPV3 channel strongly affects its temperature sensitivity.
• Complete genomes of three Asian elephants and two woolly mammoths were sequenced
• Mammoth-specific amino acid changes were found in 1,642 protein-coding genes
• Genes with mammoth-specific changes are associated with adaptation to extreme cold
• An amino acid change in TRPV3 may have altered temperature sensation in mammoths
Lynch et al. sequence complete genomes from three Asian elephants and two woolly mammoths and identify amino acid changes unique to woolly mammoths. Woolly-mammoth-specific amino acid changes underlie cold-adapted traits in mammoths, including small ears, thick fur, and altered temperature sensation.
Coconut, cocoa and arecanut are commercial plantation crops that play a vital role in the Indian economy while sustaining the livelihoods of more than 10 million Indians. According to the 2012 report of the Food and Agriculture Organization (FAO), India is the third largest producer of coconut and dominates worldwide production of arecanut. In this study, three plant growth promoting rhizobacteria (PGPR) from coconut (CPCRI-1), cocoa (CPCRI-2) and arecanut (CPCRI-3), characterized for their PGP activities, have been sequenced. The draft genome sizes were 4.7 Mb (56% GC), 5.9 Mb (63.6% GC) and 5.1 Mb (54.8% GC) for CPCRI-1, CPCRI-2 and CPCRI-3, respectively. These genomes encoded 4056 (CPCRI-1), 4637 (CPCRI-2) and 4286 (CPCRI-3) protein-coding genes. Phylogenetic analysis revealed that CPCRI-1 and CPCRI-3 both belonged to the family Enterobacteriaceae, whereas CPCRI-2 was a member of the family Pseudomonadaceae. Functional annotation predicted that all three bacteria encoded genes needed for mineral phosphate solubilization and for the production of siderophores, acetoin, butanediol, 1-aminocyclopropane-1-carboxylate (ACC) deaminase, chitinase, phenazine, 4-hydroxybenzoate, trehalose and quorum sensing molecules, consistent with the plant growth promoting traits observed during their isolation and characterization. Additionally, in all three CPCRI PGPRs we identified genes involved in the synthesis of hydrogen sulfide (H2S), which has recently been proposed to aid plant growth. The PGPRs also carried genes for central carbohydrate metabolism, indicating that the bacteria can efficiently utilize root exudates and other organic materials as energy sources. Genes for the production of peroxidases, catalases and superoxide dismutases that confer resistance to oxidative stress in plants were identified. Besides these, genes for heat shock tolerance, cold shock tolerance and glycine-betaine production that enable bacteria to survive abiotic stress were also identified.
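The GC percentages quoted for the three draft genomes express the share of G and C bases in the assembled sequence. A minimal sketch of that calculation follows; the function name is illustrative, and excluding ambiguous characters such as N from the denominator is an assumption, since draft assemblies often contain gap characters.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous A/C/G/T bases of seq.

    Case-insensitive; any other character (e.g. N in assembly gaps) is
    ignored rather than counted in the denominator.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


if __name__ == "__main__":
    # Toy contig, not real CPCRI sequence data.
    print(f"{gc_content('ATGCGCNNNNGGCCTA'):.1f}% GC")
```

Applied to all contigs of an assembly concatenated together, the same function yields a genome-wide GC percentage comparable to the values reported above.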