Sequence clustering is a common early step in amplicon-based microbial community analysis, in which raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release.
Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http://dx.doi.org/10.1186/s12915-014-0069-1).
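The idea of grouping reads into OTUs can be illustrated with a toy greedy centroid-based clustering. This is a minimal sketch and not the algorithm of UCLUST, Swarm, SUMACLUST, or any other tool named above: the function names, the position-wise identity measure (which assumes equal-length, pre-aligned reads), and the default threshold are all assumptions chosen for illustration.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences.

    Hypothetical helper: real tools compute identity from an alignment.
    """
    if len(a) != len(b):
        raise ValueError("toy identity assumes equal-length, pre-aligned reads")
    if not a:
        raise ValueError("sequences must be non-empty")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads, threshold=0.97):
    """Greedy centroid clustering (illustrative only).

    Each read joins the first existing centroid it matches at or above
    `threshold`; otherwise it becomes the centroid of a new cluster.
    The result therefore depends on the input order of the reads.
    """
    centroids = []  # one representative sequence per cluster
    clusters = []   # list of member lists, parallel to `centroids`
    for read in reads:
        for idx, centroid in enumerate(centroids):
            if identity(read, centroid) >= threshold:
                clusters[idx].append(read)
                break
        else:
            # No centroid was close enough: seed a new cluster.
            centroids.append(read)
            clusters.append([read])
    return clusters


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"]
    # A low threshold is used here only because the toy reads are short.
    print(greedy_cluster(reads, threshold=0.8))
```

Because assignment is greedy and order-dependent, sequencing errors in early reads can seed extra clusters, which is one way spurious OTUs can arise in schemes of this general kind.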
Accumulating evidence shows that the gastric bacterial community may contribute to the development of gastric cancer (GC). However, the reported alterations of gastric microbiota were not always consistent across the literature. To assess reproducible signals in gastric microbiota during the progression of GC across studies, we performed a meta-analysis of nine publicly available 16S datasets with state-of-the-art standard tools. Despite study-specific batch effects, significant changes in the composition of the gastric microbiome were found during the progression of gastric carcinogenesis, especially when the Helicobacter pylori (HP) reads were removed from analyses to mitigate their compositional effect, as they accounted for extremely large proportions of the sequencing depth in many gastric samples. Differential microbes, including Fusobacterium, Leptotrichia, and several lactic acid bacteria such as Bifidobacterium, Lactobacillus, and Streptococcus anginosus, which were frequently and significantly enriched in GC patients compared with gastritis across studies, had good discriminatory capacity to distinguish GC samples from gastritis. Oral microbes were significantly enriched in GC compared to precancerous stages. Intriguingly, we observed mutual exclusivity of different HP species across studies. In addition, the comparison between gastric fluid and mucosal microbiome suggested their convergent dysbiosis during gastric disease progression. Taken together, our systematic analysis identified novel and consistent microbial patterns in gastric carcinogenesis.
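The compositional effect of a dominant taxon can be made concrete with a small arithmetic sketch. The read counts below are invented for illustration and are not data from the study; the function name and its `exclude` parameter are likewise hypothetical.

```python
def relative_abundance(counts, exclude=()):
    """Convert raw read counts to relative abundances.

    Taxa listed in `exclude` are dropped before normalization, so the
    remaining taxa are rescaled to sum to 1.
    """
    kept = {taxon: n for taxon, n in counts.items() if taxon not in exclude}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no reads left after exclusion")
    return {taxon: n / total for taxon, n in kept.items()}


if __name__ == "__main__":
    # Made-up sample in which H. pylori takes 90% of the sequencing depth.
    sample = {"Helicobacter pylori": 900, "Fusobacterium": 50, "Streptococcus": 50}

    with_hp = relative_abundance(sample)
    without_hp = relative_abundance(sample, exclude={"Helicobacter pylori"})

    print(with_hp["Fusobacterium"])     # share of all reads
    print(without_hp["Fusobacterium"])  # share of non-HP reads
```

In this invented sample, Fusobacterium makes up 5% of all reads but 50% of the non-HP reads, which shows how a single dominant organism can compress the apparent abundance of every other taxon.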
The dysbiosis of gut microbiota is associated with the pathogenesis of human diseases. However, observing shifts in microbe abundance cannot fully reveal underlying perturbations. Examining the relationship alterations (RAs) in the microbiome between health and disease statuses provides additional hints about the pathogenesis of human diseases, but no methods have been designed to detect and quantify the RAs between different conditions directly. Here, we present profile monitoring for microbial relationship alteration (PM2RA), an analysis framework to identify and quantify the microbial RAs. The performance of PM2RA was evaluated with synthetic data, and it showed higher specificity and sensitivity than the co-occurrence-based methods. Analyses of real microbial datasets showed that PM2RA was robust for quantifying microbial RAs across different datasets in several diseases. By applying PM2RA, we identified several novel or previously reported microbes implicated in multiple diseases. PM2RA is now implemented as a web-based application available at http://www.pm2ra-xingyinliulab.cn/.
RNA secondary structure is often predicted using folding thermodynamics. RNAstructure is a software package that includes structure prediction by free energy minimization, prediction of base pairing probabilities, prediction of structures composed of highly probable base pairs, and prediction of structures with pseudoknots. A user-friendly graphical user interface is provided, and this interface works on Windows, Apple OS X, and Linux. This chapter provides protocols for using RNAstructure for structure prediction.
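The dynamic-programming style behind secondary structure prediction can be sketched with a much simpler stand-in: Nussinov-style base-pair maximization. This is not free energy minimization and not RNAstructure's implementation. It only counts the maximum number of nested canonical and G-U pairs, and the function names and the minimum hairpin loop length of 3 are assumptions for illustration.

```python
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    """True if nucleotides a and b can form a pair in this toy model."""
    return (a, b) in PAIRS


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov-style recursion).

    dp[i][j] holds the best pair count for the subsequence seq[i..j].
    A pair (k, j) is allowed only if at least `min_loop` unpaired
    nucleotides lie between k and j. Pseudoknots are not considered.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if can_pair(seq[k], seq[j]):
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```

Thermodynamic methods replace the pair count with nearest-neighbor free energy terms for stacks and loops, but they fill a table over subsequences in a broadly similar way.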
Death investigations often include an effort to establish the postmortem interval (PMI) in cases in which the time of death is uncertain. The postmortem interval can lead to the identification of the deceased and the validation of witness statements and suspect alibis. Recent research has demonstrated that microbes provide an accurate clock that starts at death and relies on ecological change in the microbial communities that normally inhabit a body and its surrounding environment. Here, we explore how to build the most robust Random Forest regression models for prediction of PMI by testing models built on different sample types (gravesoil, skin of the torso, skin of the head), gene markers (16S ribosomal RNA (rRNA), 18S rRNA, internal transcribed spacer regions (ITS)), and taxonomic levels (sequence variants, species, genus, etc.). We also tested whether particular suites of indicator microbes were informative across different datasets. Generally, results indicate that the most accurate models for predicting PMI were built using gravesoil and skin data using the 16S rRNA genetic marker at the taxonomic level of phyla. Additionally, several phyla consistently contributed highly to model accuracy and may be candidate indicators of PMI.
PMA (propidium monoazide) is one of the few methods that are compatible with metagenomic sequencing to characterize the live/intact microbiota. However, its efficiency in complex communities such as saliva and feces is still controversial. An effective method for depleting host and dead bacterial DNA in human microbiome samples is lacking. Here, we systematically evaluate the efficiency of osmotic lysis and PMAxx treatment (lyPMAxx) in characterizing the viable microbiome with four live/dead Gram+/Gram- microbial strains in simple synthetic and spiked-in complex communities. We show that lyPMAxx-quantitative PCR (qPCR)/sequencing eliminated more than 95% of the host and heat-killed microbial DNA and had a much smaller effect on the live microbes in both simple mock and spiked-in complex communities. The overall microbial load and the alpha diversity of the salivary and fecal microbiome were decreased by lyPMAxx, and the relative abundances of the microbes were changed. The relative abundances of […], […], and […] in saliva were decreased by lyPMAxx, as was that of […] in feces. We also found that the frequently used sample storage method, freezing with glycerol, killed or injured 65% and 94% of the living microbial cells in saliva and feces, respectively, with the […] phylum affected most in saliva and the […] and […] phyla affected most in feces. By comparing the absolute abundance variation of the shared species among different sample types and individuals, we found that sample habitat and personal differences affected the response of microbial species to lyPMAxx and freezing.
The functions and phenotypes of microbial communities are largely defined by viable microbes. Through advanced nucleic acid sequencing technologies and downstream bioinformatic analyses, we gained an insight into the high-resolution microbial community composition of human saliva and feces, yet we know very little about whether such community DNA sequences represent viable microbes. PMA-qPCR was used to characterize the viable microbes in previous studies. However, its efficiency in complex communities such as saliva and feces is still controversial. By spiking-in four live/dead Gram+/Gram- bacterial strains, we demonstrate that lyPMAxx can effectively discriminate between live and dead microbes in the simple synthetic community and complex human microbial communities (saliva and feces). In addition, freezing storage was found to kill or injure the microbes in saliva and feces significantly, as measured with lyPMAxx-qPCR/sequencing. This method has a promising prospect in the viable/intact microbiota detection of complex human microbial communities.
Germ-free models and bacterial transplantation technology facilitate mechanistic studies of host-gut microbe interactions. Among these models, the zebrafish is economical and practical, with characteristics that distinguish it from the mouse, such as transparent larvae and efficient gene manipulation. Here we enumerate the similarities and differences between zebrafish and humans in their genes, digestive tract structure, and gut microbiota, and we discuss recent reports on colonizing zebrafish with human gut microbes. We summarize the advantages and limitations of this model and revisit them in light of some important discoveries made using zebrafish to model human gut microbe research. This review clarifies the advances in the application of zebrafish in gut microbiota-related research.
• Zebrafish is an economical and practical model with transparent larvae and efficient gene manipulation.
• A standardized, cost-effective methodology for rearing germ-free zebrafish has been established.
• The digestive tract structure and homologous genes of zebrafish and humans were compared.
• The gut microbiota composition of zebrafish and humans is different.
• Several important studies of the interaction between the host and the gut microbiota have been carried out with this model.
RNA structure is conserved by evolution to a greater extent than sequence. Predicting the conserved structure for multiple homologous sequences can be much more accurate than predicting the structure for a single sequence. RNAstructure is a software package that includes the programs Dynalign, Multilign, TurboFold, and PARTS for predicting conserved RNA secondary structure. This chapter provides protocols for using these programs.
Recently, non-coding RNAs (ncRNAs) have been discovered with novel functions, and it has been appreciated that there is pervasive transcription of genomes. Moreover, many novel ncRNAs are not conserved on the primary sequence level. Therefore, de novo computational ncRNA detection that is accurate and efficient is desirable. The purpose of this study is to develop an ncRNA detection method based on conservation of structure in more than two genomes. A new method called Multifind, using Multilign, was developed. Multilign predicts the common secondary structure for multiple input sequences. Multifind then uses measures of structure conservation to estimate the probability that the input sequences are a conserved ncRNA using a classification support vector machine. Multilign is based on Dynalign, which folds and aligns two sequences simultaneously using a scoring scheme that does not include sequence identity; its structure prediction quality is therefore not affected by input sequence diversity. Additionally, ensemble defect was introduced to Multifind as an additional discriminating feature that quantifies the compactness of the folding space for a sequence. Benchmarks showed Multifind performs better than RNAz and LocARNATE+RNAz, a method that uses RNAz on structure alignments generated by LocARNATE, on testing sequences extracted from the Rfam database. For de novo ncRNA discovery in three genomes, Multifind and LocARNATE+RNAz had an advantage over RNAz in low similarity regions of genome alignments. Additionally, Multifind and LocARNATE+RNAz found different subsets of known ncRNA sequences, suggesting the two approaches are complementary.