High-throughput sequencing of 16S ribosomal RNA gene amplicons has facilitated understanding of complex microbial communities, but the inherent noise in PCR and DNA sequencing limits differentiation of closely related bacteria. Although many scientific questions can be addressed with broad taxonomic profiles, clinical, food safety, and some ecological applications require higher specificity. Here we introduce a novel sub-operational-taxonomic-unit (sOTU) approach, Deblur, that uses error profiles to obtain putative error-free sequences from Illumina MiSeq and HiSeq sequencing platforms. Deblur substantially reduces computational demands relative to similar sOTU methods and does so with similar or better sensitivity and specificity. Using simulations, mock mixtures, and real data sets, we detected closely related bacterial sequences with single nucleotide differences while removing false positives and maintaining stability in detection, suggesting that Deblur is limited only by read length and diversity within the amplicon sequences. Because Deblur operates on a per-sample level, it scales to modern data sets and meta-analyses. To highlight Deblur's ability to integrate data sets, we include an interactive exploration of its application to multiple distinct sequencing rounds of the American Gut Project. Deblur is open source under the Berkeley Software Distribution (BSD) license, easily installable, and downloadable from https://github.com/biocore/deblur.
Deblur provides a rapid and sensitive means to assess ecological patterns driven by differentiation of closely related taxa. This algorithm provides a solution to the problem of identifying real ecological differences between taxa whose amplicons differ by a single base pair, is applicable in an automated fashion to large-scale sequencing data sets, and can integrate sequencing runs collected over time.
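The idea of using an error profile to strip reads that are likely sequencing errors can be illustrated with a toy sketch. This is not the Deblur implementation: the function name `toy_denoise`, the per-Hamming-distance error rates, and the read counts are all hypothetical, chosen only to show how an abundant sequence can account for low-count neighbors that differ by a single base while a genuinely abundant single-base variant survives.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def toy_denoise(counts, error_rate_by_distance):
    """Toy error-profile subtraction (illustrative only, not Deblur itself).

    counts: dict mapping sequence -> observed read count in one sample.
    error_rate_by_distance: hypothetical upper bound on the fraction of a
        sequence's reads expected to be misread as a neighbor at a given
        Hamming distance, e.g. {1: 0.01}.
    """
    remaining = {seq: float(n) for seq, n in counts.items()}
    # Visit sequences from most to least abundant (by observed count).
    for seq in sorted(counts, key=counts.get, reverse=True):
        parent_reads = remaining[seq]
        if parent_reads <= 0:
            continue  # already fully explained as error of another sequence
        for other in remaining:
            if other == seq or len(other) != len(seq):
                continue
            rate = error_rate_by_distance.get(hamming(seq, other))
            if rate is not None:
                # Subtract the reads that errors from `seq` could explain.
                remaining[other] -= parent_reads * rate
    # Keep only sequences with at least one read left after subtraction.
    return {seq: round(n) for seq, n in remaining.items() if n >= 1}


# Hypothetical per-sample counts: "ACGA" looks like an error of "ACGT",
# while "ACGC" is too abundant to be explained by error alone.
sample = {"ACGT": 1000, "ACGA": 5, "ACGC": 200, "TTTT": 50}
denoised = toy_denoise(sample, {1: 0.01})
```

Because the sketch works on a single sample's counts, it can be run independently per sample, which mirrors the per-sample operation described above.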
In order for human microbiome studies to translate into actionable outcomes for health, meta-analysis of reproducible data from population-scale cohorts is needed. Achieving sufficient reproducibility in microbiome research has proven challenging. We report a baseline investigation of variability in taxonomic profiling for the Microbiome Quality Control (MBQC) project baseline study (MBQC-base). Blinded specimen sets from human stool, chemostats, and artificial microbial communities were sequenced by 15 laboratories and analyzed using nine bioinformatics protocols. Variability depended most on biospecimen type and origin, followed by DNA extraction, sample handling environment, and bioinformatics. Analysis of artificial community specimens revealed differences in extraction efficiency and bioinformatic classification. These results may guide researchers in experimental design choices for gut microbiome studies.
A major bottleneck for metagenomic sequencing is rapid and efficient DNA extraction. Here, we compare the extraction efficiencies of three magnetic bead-based platforms (KingFisher, epMotion, and Tecan) to a standardized column-based extraction platform across a variety of sample types, including feces, oral, skin, soil, and water. Replicate sample plates were extracted and prepared for 16S rRNA gene amplicon sequencing in parallel to assess extraction bias and DNA quality. The data demonstrate that any effect of extraction method on sequencing results was small compared with the variability across samples; however, the KingFisher platform produced the largest number of high-quality reads in the shortest amount of time. Based on these results, we have identified an extraction pipeline that dramatically reduces sample processing time without sacrificing bacterial taxonomic or abundance information.
Immediate freezing at -20°C or below has been considered the gold standard for microbiome preservation, yet this approach is not feasible for many field studies, ranging from anthropology to wildlife conservation. Here we tested five methods for preserving human and dog fecal specimens for periods of up to 8 weeks, including such types of variation as freeze-thaw cycles and the high temperature fluctuations often encountered under field conditions. We found that three of the methods (95% ethanol, FTA cards, and the OMNIgene Gut kit) can preserve samples sufficiently well at ambient temperatures such that differences at 8 weeks are comparable to differences among technical replicates. However, even the worst methods, including those with no fixative, were able to reveal microbiome differences between species at 8 weeks and between individuals after a week, allowing meta-analyses of samples collected using various methods when the effect of interest is expected to be larger than interindividual variation (although use of a single method within a study is strongly recommended to reduce batch effects). Encouragingly for FTA cards, the differences caused by this method are systematic and can be detrended. As in other studies, we strongly caution against the use of 70% ethanol. The results, spanning 15 individuals and over 1,200 samples, provide our most comprehensive view to date of storage effects on stool and provide a paradigm for future studies of other sample types that will be required to provide a global view of microbial diversity and its interaction among humans, animals, and the environment.
Our study, spanning 15 individuals and over 1,200 samples, provides our most comprehensive view to date of storage and stabilization effects on stool. We tested five methods for preserving human and dog fecal specimens for periods of up to 8 weeks, including the types of variation often encountered under field conditions, such as freeze-thaw cycles and high temperature fluctuations. We show that several cost-effective methods provide excellent microbiome stability out to 8 weeks, opening up a range of field studies with humans and wildlife that would otherwise be cost-prohibitive.
Exposure of newborns to the maternal vaginal microbiota is interrupted with cesarean birthing. Babies delivered by cesarean section (C-section) acquire a microbiota that differs from that of vaginally delivered infants, and C-section delivery has been associated with increased risk for immune and metabolic disorders. Here we conducted a pilot study in which infants delivered by C-section were exposed to maternal vaginal fluids at birth. Similarly to vaginally delivered babies, the gut, oral, and skin bacterial communities of these newborns during the first 30 days of life were enriched in vaginal bacteria, which were underrepresented in unexposed C-section-delivered infants, and the similarity of their microbiome to that of vaginally delivered infants was greater in oral and skin samples than in anal samples. Although the long-term health consequences of restoring the microbiota of C-section-delivered infants remain unclear, our results demonstrate that vaginal microbes can be partially restored at birth in C-section-delivered babies.
Data from 16S ribosomal RNA (rRNA) amplicon sequencing present challenges to ecological and statistical interpretation. In particular, library sizes often vary over several orders of magnitude, and the data contain many zeros. Although we are typically interested in comparing relative abundance of taxa in the ecosystem of two or more groups, we can only measure the taxon relative abundance in specimens obtained from the ecosystems. Because the comparison of taxon relative abundance in the specimen is not equivalent to the comparison of taxon relative abundance in the ecosystems, this presents a special challenge. In addition, because the relative abundances of taxa in the specimen (as well as in the ecosystem) sum to 1, these are compositional data. Because compositional data are constrained to the simplex (sum to 1) rather than being unconstrained in Euclidean space, many standard methods of analysis are not applicable. Here, we evaluate how these challenges impact the performance of existing normalization methods and differential abundance analyses.
Effects on normalization: Most normalization methods enable successful clustering of samples according to biological origin when the groups differ substantially in their overall microbial composition. Rarefying more clearly clusters samples according to biological origin than other normalization techniques do for ordination metrics based on presence or absence. Alternate normalization measures are potentially vulnerable to artifacts due to library size.
Effects on differential abundance testing: We build on previous work to evaluate seven proposed statistical methods using rarefied as well as raw data. Our simulation studies suggest that the false discovery rates of many differential abundance-testing methods are not increased by rarefying itself, although rarefying does result in a loss of sensitivity due to elimination of a portion of available data. For groups with large (~10×) differences in the average library size, rarefying lowers the false discovery rate. DESeq2, without addition of a constant, increased sensitivity on smaller datasets (<20 samples per group) but tends towards a higher false discovery rate with more samples, very uneven (~10×) library sizes, and/or compositional effects. For drawing inferences regarding taxon abundance in the ecosystem, analysis of composition of microbiomes (ANCOM) is not only very sensitive (for >20 samples per group) but also, critically, the only method tested that has good control of the false discovery rate.
These findings guide which normalization and differential abundance techniques to use based on the data characteristics of a given study.
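Rarefying, as discussed above, means randomly subsampling each sample's reads without replacement to a common depth and discarding samples that fall below that depth. A minimal sketch with NumPy follows; the function name `rarefy`, the seed, and the example counts are illustrative assumptions and are not taken from the study.

```python
import numpy as np


def rarefy(counts, depth, seed=0):
    """Subsample one sample's taxon counts to `depth` reads without replacement.

    Returns None when the sample has fewer than `depth` reads, which
    corresponds to dropping that sample from the rarefied table.
    """
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        return None
    rng = np.random.default_rng(seed)
    # Expand counts into one taxon index per read, e.g. [2, 1] -> [0, 0, 1].
    reads = np.repeat(np.arange(counts.size), counts)
    kept = rng.choice(reads, size=depth, replace=False)
    # Tally the retained reads back into a count vector of the same length.
    return np.bincount(kept, minlength=counts.size)


# Hypothetical sample with 100 reads across four taxa, rarefied to 40 reads.
original = [50, 30, 20, 0]
rarefied = rarefy(original, depth=40)
too_shallow = rarefy([1, 2], depth=10)  # None: sample would be discarded
```

The discarded reads and dropped samples in this sketch are the source of the sensitivity loss noted above.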
Microbiota-based prediction of chronic infections is promising yet not well established. Early childhood caries (ECC) is the most common infection in children. Here we simultaneously tracked microbiota development in plaque and saliva in 50 4-year-old preschoolers for 2 years; children either stayed healthy, transitioned into cariogenesis, or experienced caries exacerbation. Caries onset delayed microbiota development, which is otherwise correlated with aging in healthy children. Both plaque and saliva microbiota are more correlated with changes in ECC severity (dmfs) during onset than progression. By distinguishing between aging- and disease-associated taxa and exploiting the distinct microbiota dynamics between onset and progression, we developed a model, Microbial Indicators of Caries, to diagnose ECC from healthy samples with 70% accuracy and predict, with 81% accuracy, future ECC onsets for samples clinically perceived as healthy. Thus, caries onset in apparently healthy teeth can be predicted using microbiota, when appropriately de-trended for age.
• Oral microbiota in 50 four-year-old children were tracked for 2 years
• Age-dependent microbiota development is perturbed by early childhood caries (ECC) onset
• Shifts in microbiota precede manifestation of clinical symptoms of ECC
• Microbial Indicators of Caries, when de-trended for age, can predict ECC onset
Teng et al. tracked plaque and saliva microbiota of 50 4-year-old children for 2 years. By distinguishing between aging- and disease-associated taxa and exploiting the distinct microbiota dynamics between disease onset and progression, a predictive model, Microbial Indicators of Caries, is proposed as a method to predict future caries onset.
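One simple way to picture de-trending for age is to fit the age trend on healthy reference samples and keep only each sample's deviation from that trend. The sketch below assumes a linear trend fitted with `numpy.polyfit`; the function name `detrend_for_age` and all values are hypothetical, and the study's actual model may differ.

```python
import numpy as np


def detrend_for_age(ref_age, ref_value, age, value):
    """Remove a linear age trend, estimated on healthy reference samples.

    ref_age, ref_value: ages and a microbiota-derived value (for example a
        taxon's relative abundance) in the healthy reference group.
    age, value: the samples to de-trend.
    Returns the residuals, i.e. the part of `value` not explained by age.
    """
    slope, intercept = np.polyfit(np.asarray(ref_age, dtype=float),
                                  np.asarray(ref_value, dtype=float), deg=1)
    age = np.asarray(age, dtype=float)
    value = np.asarray(value, dtype=float)
    return value - (slope * age + intercept)


# Hypothetical healthy trend: value = 2 * age + 1.
healthy_age = [4.0, 4.5, 5.0, 5.5, 6.0]
healthy_value = [9.0, 10.0, 11.0, 12.0, 13.0]
# A sample at age 5 with value 14 sits 3 units above the healthy trend.
residuals = detrend_for_age(healthy_age, healthy_value, [5.0, 6.0], [14.0, 13.0])
```

Residuals near zero indicate a value that is typical for the child's age, so only the deviations would be passed on to a downstream classifier.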
The coronavirus disease 2019 (COVID-19) has rapidly spread around the world, impacting the lives of many individuals. Growing evidence suggests that the nasopharyngeal and respiratory tract microbiomes are influenced by various health and disease conditions, including the presence and severity of different viral diseases. To evaluate the potential interactions between severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the nasopharyngeal microbiome, the microbial composition of nasopharyngeal swab samples submitted to the clinical microbiology lab for suspected SARS-CoV-2 infections was assessed using 16S amplicon sequencing. The study included a total of 55 nasopharyngeal samples from 33 subjects, with longitudinal sampling available for 12 of the 33 subjects. Twenty-one of the 33 subjects had at least one positive COVID-19 PCR result as determined by the clinical microbiology lab. Interpersonal variation was the strongest factor, explaining >75% of the microbial variation irrespective of SARS-CoV-2 status. No significant effect of SARS-CoV-2 on the nasopharyngeal microbial community was observed using multiple analysis methods. These results indicate that, unlike some other viruses for which an effect on the microbial composition was noted, SARS-CoV-2 does not have a strong effect on the microbial inhabitants of the nasopharynx.
Microbial sequences inferred as belonging to one sample may not have originated from that sample. Such contamination may arise from laboratory or reagent sources or from physical exchange between samples. This study seeks to rigorously assess the behavior of this often-neglected between-sample contamination. Using unique bacteria, each assigned a particular well in a plate, we assess the frequency at which sequences from each source appear in other wells. We evaluate the effects of different DNA extraction methods performed in two laboratories using a consistent plate layout, including blanks and low-biomass and high-biomass samples. Well-to-well contamination occurred primarily during DNA extraction and, to a lesser extent, in library preparation, while barcode leakage was negligible. Laboratories differed in the levels of contamination. Extraction methods differed in their occurrences and levels of well-to-well contamination, with plate methods having more well-to-well contamination and single-tube methods having higher levels of background contaminants. Well-to-well contamination occurred primarily in neighboring samples, with rare events up to 10 wells apart. This effect was greatest in samples with lower biomass and negatively impacted metrics of alpha and beta diversity. Our work emphasizes that sample contamination is a combination of cross talk from nearby wells and background contaminants. To reduce well-to-well effects, samples should be randomized across plates, samples of similar biomasses should be processed together, and manual single-tube extractions or hybrid plate-based cleanups should be employed. Researchers should avoid simplistic removals of taxa or operational taxonomic units (OTUs) appearing in negative controls, as many will be microbes from other samples rather than reagent contaminants.
Microbiome research has uncovered magnificent biological and chemical stories across nearly all areas of life science, at times creating controversy when findings reveal fantastic descriptions of microbes living and even thriving in what were once thought to be sterile environments. Scientists have refuted many of these claims because of contamination, which has led to robust requirements, including the use of controls, for validating accurate portrayals of microbial communities. In this study, we describe a previously undocumented form of contamination, well-to-well contamination, and show that this sort of contamination primarily occurs during DNA extraction rather than PCR, is highest with plate-based methods compared to single-tube extraction, and occurs at a higher frequency in low-biomass samples. This finding has profound importance in the field, as many current techniques to "decontaminate" a data set simply rely on an assumption that microbial reads found in blanks are contaminants from "outside," namely, the reagents or consumables.
The use of sterile swabs is a convenient and common way to collect microbiome samples, and many studies have shown that the effects of room-temperature storage are smaller than physiologically relevant differences between subjects. However, several bacterial taxa, notably members of the class Gammaproteobacteria, grow at room temperature, sometimes confusing microbiome results, particularly when stability is assumed. Although comparative benchmarking has shown that several preservation methods, including the use of 95% ethanol, fecal occult blood test (FOBT) and FTA cards, and Omnigene-GUT kits, reduce changes in taxon abundance during room-temperature storage, these techniques all have drawbacks and cannot be applied retrospectively to samples that have already been collected. Here we performed a meta-analysis using several different microbiome sample storage condition studies, showing consistent trends in which specific bacteria grew (i.e., "bloomed") at room temperature, and introduce a procedure for removing the sequences that most distort analyses. In contrast to similarity-based clustering using operational taxonomic units (OTUs), we use a new technique called "Deblur" to identify the exact sequences corresponding to blooming taxa, greatly reducing false positives and also dramatically decreasing runtime. We show that applying this technique to samples collected for the American Gut Project (AGP), for which participants simply mail samples back without the use of ice packs or other preservatives, yields results consistent with published microbiome studies performed with frozen or otherwise preserved samples.
In many microbiome studies, the need to store samples at room temperature (e.g., in remote fieldwork) and the ability to ship samples without hazardous materials that require special handling training, such as ethanol (e.g., in citizen science efforts), are paramount. However, although room-temperature storage for a few days has been shown not to obscure physiologically relevant microbiome differences between comparison groups, there are still changes in specific bacterial taxa, notably in members of the class Gammaproteobacteria, that can make microbiome profiles difficult to interpret. Here we identify the most problematic taxa and show that removing sequences from just a few fast-growing taxa is sufficient to correct microbiome profiles.
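Removing blooming taxa by exact sequence can be pictured as a simple filter over a per-sample feature table. The sketch below is an assumption-laden illustration: the function name `remove_blooms`, the table layout (sample ID mapped to sequence counts), and the short placeholder sequences are all hypothetical and do not come from the published bloom list.

```python
def remove_blooms(table, bloom_seqs):
    """Drop exact-match bloom sequences from every sample.

    table: dict mapping sample ID -> dict of sequence -> read count.
    bloom_seqs: iterable of exact sequences flagged as room-temperature blooms.
    Returns a new table; the input table is left unmodified.
    """
    blooms = set(bloom_seqs)
    return {
        sample: {seq: n for seq, n in features.items() if seq not in blooms}
        for sample, features in table.items()
    }


# Hypothetical two-sample table with placeholder sequences.
table = {
    "sample_1": {"ACGTAC": 120, "GGGTTT": 900, "TTGACA": 30},
    "sample_2": {"ACGTAC": 80, "TTGACA": 45},
}
filtered = remove_blooms(table, bloom_seqs=["GGGTTT"])
```

Matching on exact sequences rather than on similarity clusters means a closely related non-blooming sequence is left untouched, and the filter can be applied retrospectively to samples that have already been sequenced.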