Identifying differentially abundant microbes is a common goal of microbiome studies. Multiple methods are used interchangeably for this purpose in the literature. Yet, there are few large-scale studies systematically exploring the appropriateness of using these tools interchangeably, and the scale and significance of the differences between them. Here, we compare the performance of 14 differential abundance testing methods on 38 16S rRNA gene datasets with two sample groups. We test for differences in amplicon sequence variants (ASVs) and operational taxonomic units (OTUs) between these groups. Our findings confirm that these tools identify drastically different numbers and sets of significant ASVs, and that results depend on data pre-processing. For many tools, the number of features identified correlates with aspects of the data, such as sample size, sequencing depth, and effect size of community differences. ALDEx2 and ANCOM-II produce the most consistent results across studies and agree best with the intersection of results from different approaches. Nevertheless, we recommend that researchers use a consensus approach based on multiple differential abundance methods to help ensure robust biological interpretations.
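The consensus approach recommended above can be sketched as a simple vote: a feature is retained only if at least a chosen number of differential abundance methods call it significant. This is a minimal illustration, not the authors' code; the function name `consensus_features`, the method names, and the ASV sets are hypothetical, and the vote threshold is an assumption the analyst must choose.

```python
from collections import Counter


def consensus_features(results, min_methods):
    """Return features called significant by at least `min_methods` tools.

    `results` maps a (hypothetical) method name to the set of feature IDs,
    e.g. ASVs, that the method reported as differentially abundant.
    """
    votes = Counter()
    for significant in results.values():
        votes.update(set(significant))  # each method votes once per feature
    return {feature for feature, n in votes.items() if n >= min_methods}


# Hypothetical outputs from three methods run on the same dataset.
results = {
    "method_A": {"ASV1", "ASV2"},
    "method_B": {"ASV1", "ASV3"},
    "method_C": {"ASV1", "ASV2", "ASV4"},
}
print(sorted(consensus_features(results, min_methods=2)))  # ['ASV1', 'ASV2']
```

Setting `min_methods` equal to the number of methods gives the strict intersection; lower values trade stringency for sensitivity.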
Sequence-based approaches to study microbiomes, such as 16S rRNA gene sequencing and metagenomics, are uncovering associations between microbial taxa and a myriad of factors. A drawback of these approaches is that the necessary sequencing library preparation and bioinformatic analyses are complicated and continuously changing, which can be a barrier for researchers new to the field. We present three essential components to conducting a microbiome experiment from start to finish: first, a simplified and step-by-step custom gene sequencing protocol that requires limited lab equipment, is cost-effective, and has been thoroughly tested and utilized on various sample types; second, a series of scripts to integrate various commonly used bioinformatic tools that are available as a standalone installation or as a single downloadable virtual image; and third, a set of bioinformatic workflows and tutorials to provide step-by-step guidance and education for those new to the microbiome field. This resource will provide the foundations for those newly entering the microbiome field and will provide much-needed guidance and best practices to ensure that quality microbiome research is undertaken. All protocols, scripts, workflows, tutorials, and virtual images are freely available through the Microbiome Helper website (https://github.com/mlangill/microbiome_helper/wiki).
As the microbiome field continues to grow, a multitude of researchers are learning how to conduct proper microbiome experiments. We outline here a streamlined and custom approach to processing samples, from detailed sequencing library construction to step-by-step bioinformatic standard operating procedures. This enables rapid and reliable microbiome analysis, allowing researchers to focus more on their experimental design and results. Our sequencing protocols, bioinformatic tutorials, and bundled software are freely available through Microbiome Helper. As the microbiome research field continues to evolve, Microbiome Helper will be updated with new protocols, scripts, and training materials.
High-depth sequencing of universal marker genes such as the 16S rRNA gene is a common strategy to profile microbial communities. Traditionally, sequence reads are clustered into operational taxonomic units (OTUs) at a defined identity threshold to avoid sequencing errors generating spurious taxonomic units. However, numerous bioinformatic packages have recently been released that attempt to correct sequencing errors to determine real biological sequences at single-nucleotide resolution by generating amplicon sequence variants (ASVs). As more researchers begin to use high-resolution ASVs, there is a need for an in-depth and unbiased comparison of these novel "denoising" pipelines. In this study, we conduct a thorough comparison of three of the most widely used denoising packages (DADA2, UNOISE3, and Deblur) as well as an open-reference 97% OTU clustering pipeline on mock, soil, and host-associated communities. We found from the mock community analyses that, although the approaches produced similar microbial compositions based on relative abundance, they identified vastly different numbers of ASVs, which significantly impact alpha diversity metrics. Our analysis of real datasets using the recommended settings for each denoising pipeline also showed that the three packages were consistent in their per-sample compositions, resulting in only minor differences based on weighted UniFrac and Bray-Curtis dissimilarity. DADA2 tended to find more ASVs than the other two denoising pipelines when analyzing both the real soil data and two other host-associated datasets, suggesting that it could be better at finding rare organisms, but at the expense of possible false positives. The open-reference OTU clustering approach identified considerably more OTUs than the number of ASVs from the denoising pipelines in all datasets tested.
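To make the identity-threshold idea concrete, the sketch below performs a simplified greedy de novo clustering at 97% identity. It is an illustration only, not the open-reference pipeline used in the study: it assumes reads are already aligned and of equal length (so identity is just the fraction of matching positions), and the helper names `identity` and `greedy_cluster` are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length, aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads, threshold=0.97):
    """Greedy de novo clustering: each read joins the first centroid it matches
    at >= threshold identity; otherwise it becomes the centroid of a new OTU."""
    centroids, assignments = [], []
    for read in reads:
        for i, centroid in enumerate(centroids):
            if identity(read, centroid) >= threshold:
                assignments.append(i)
                break
        else:  # no centroid was close enough
            centroids.append(read)
            assignments.append(len(centroids) - 1)
    return centroids, assignments


base = "ACGT" * 25                 # 100-bp reference-like read
one_error = "T" + base[1:]         # 1 mismatch -> 99% identity
distant = "CATGC" + base[5:]       # 5 mismatches -> 95% identity
centroids, labels = greedy_cluster([base, one_error, distant])
print(len(centroids), labels)      # 2 [0, 0, 1]
```

The single-nucleotide variant collapses into the first OTU, which is exactly the resolution that denoising pipelines aim to recover as separate ASVs.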
The three denoising approaches differed significantly in their run times, with UNOISE3 running more than 1,200 and 15 times faster than DADA2 and Deblur, respectively. Our findings indicate that, although all pipelines result in similar general community structure, the number of ASVs/OTUs and the resulting alpha-diversity metrics vary considerably and should be considered when attempting to distinguish rare organisms from possible background noise.
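The contrast between similar community structure and divergent alpha diversity can be illustrated with two standard metrics: observed richness and the Shannon index for alpha diversity, and Bray-Curtis dissimilarity between samples. This is a minimal sketch with made-up counts, not the study's analysis; it shows how splitting one feature into two inflates richness and Shannon diversity even when the underlying sample is the same.

```python
import math


def observed_richness(counts):
    """Number of features (ASVs/OTUs) with a nonzero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over nonzero features."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors over the same features."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


# Hypothetical: the same sample summarized with 2 OTUs or with 3 ASVs,
# where one OTU has been resolved into two ASVs (48 + 2 reads).
otu_view = [50, 50]
asv_view = [48, 2, 50]
print(observed_richness(otu_view), observed_richness(asv_view))  # 2 3
print(shannon(asv_view) > shannon(otu_view))                     # True
print(bray_curtis([10, 0, 30], [20, 10, 10]))                    # 0.5
```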
Bacteria were first detected in human tumors more than 100 years ago, but the characterization of the tumor microbiome has remained challenging because of its low biomass. We undertook a comprehensive analysis of the tumor microbiome, studying 1526 tumors and their adjacent normal tissues across seven cancer types: breast, lung, ovary, pancreas, melanoma, bone, and brain tumors. We found that each tumor type has a distinct microbiome composition and that breast cancer has a particularly rich and diverse microbiome. The intratumor bacteria are mostly intracellular and are present in both cancer and immune cells. We also noted correlations of intratumor bacteria or their predicted functions with tumor types and subtypes, patients' smoking status, and the response to immunotherapy.
High-throughput shotgun metagenomics sequencing has enabled the profiling of myriad natural communities. These data are commonly used to identify gene families and pathways that were potentially gained or lost in an environment and which may be involved in microbial adaptation. Despite the widespread interest in these events, there are no established best practices for identifying gene gain and loss in metagenomics data. Horizontal gene transfer (HGT) represents several mechanisms of gene gain that are of particular interest in clinical microbiology due to the rapid spread of antibiotic resistance genes in natural communities. Several additional mechanisms of gene gain and loss, including gene duplication, gene loss-of-function events, and de novo gene birth, are also important to consider in the context of metagenomes but have been less studied. This review is largely focused on detecting HGT in prokaryotic metagenomes, but methods for detecting these other mechanisms are discussed first. For this article to be self-contained, we provide a general background on HGT and the different possible signatures of this process. Lastly, we discuss how improved assembly of genomes from metagenomes would be the most straightforward approach for improving the inference of gene gain and loss events. Several recent technological advances could help improve metagenome assemblies: long-read sequencing, determining the physical proximity of contigs, optical mapping of short sequences along chromosomes, and single-cell metagenomics. The benefits and limitations of these advances are discussed and open questions in this area are highlighted.
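One commonly cited compositional signature of HGT is a gene whose GC content departs markedly from that of the rest of its host genome. The sketch below flags such genes using a z-score on GC content. It is a deliberately simple illustration, not a method from this review: the cutoff of 2 standard deviations, the helper names, and the toy sequences are assumptions, and GC deviation alone is a weak and noisy signal.

```python
from statistics import mean, stdev


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_outliers(genes, z_cutoff=2.0):
    """Flag genes (from one assembled genome or bin) whose GC content deviates
    from the mean across all its genes by more than z_cutoff standard deviations."""
    gcs = {name: gc_content(seq) for name, seq in genes.items()}
    if len(gcs) < 2:
        return set()
    mu, sd = mean(gcs.values()), stdev(gcs.values())
    if sd == 0:
        return set()  # all genes have identical GC content
    return {name for name, gc in gcs.items() if abs(gc - mu) / sd > z_cutoff}


# Hypothetical bin: nine genes at 50% GC and one candidate transfer at 80% GC.
genes = {f"gene{i}": "ATGC" * 25 for i in range(9)}
genes["candidate"] = "GCGCA" * 20
print(gc_outliers(genes))  # {'candidate'}
```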
Marker-gene sequencing is a cost-effective method of taxonomically profiling microbial communities. Unlike metagenomic approaches, marker-gene sequencing does not provide direct information about the functional genes that are present in the genomes of community members. However, by capitalizing on the rapid growth in the number of sequenced genomes, it is possible to infer which functions are likely associated with a marker gene based on its sequence similarity with a reference genome. The PICRUSt tool is based on this idea and can predict functional category abundances from marker-gene data. In brief, this method requires a reference phylogeny with tips corresponding to taxa with reference genomes as well as taxa lacking sequenced genomes. A modified ancestral state reconstruction (ASR) method is then used to infer counts of functional categories for taxa without reference genomes. The predictions are written to pre-calculated files, which can be cross-referenced with other datasets to quickly generate predictions of functional potential for a community. This chapter gives an in-depth description of these methods and describes how PICRUSt should be used.
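The cross-referencing step described above can be sketched as an abundance-weighted sum: each taxon's observed marker-gene count is divided by its predicted 16S copy number (PICRUSt-style workflows perform this copy-number normalization) and multiplied by its pre-calculated functional counts. This is a minimal sketch, not PICRUSt's API or file format; the function name `predict_metagenome`, the dictionary layout, and the gene-family labels are hypothetical.

```python
def predict_metagenome(otu_counts, copy_numbers, gene_table):
    """Predict community gene-family abundances for one sample.

    otu_counts:   taxon -> observed marker-gene read count
    copy_numbers: taxon -> predicted marker-gene (16S) copy number
    gene_table:   taxon -> {gene family -> predicted copies per genome}
    """
    totals = {}
    for taxon, count in otu_counts.items():
        # Normalize so taxa carrying many 16S copies are not over-counted.
        genomes = count / copy_numbers[taxon]
        for family, copies in gene_table[taxon].items():
            totals[family] = totals.get(family, 0.0) + genomes * copies
    return totals


# Hypothetical pre-calculated predictions for two taxa.
otu_counts = {"OTU1": 40, "OTU2": 10}
copy_numbers = {"OTU1": 4, "OTU2": 1}
gene_table = {"OTU1": {"famA": 2, "famB": 1}, "OTU2": {"famA": 1}}
print(predict_metagenome(otu_counts, copy_numbers, gene_table))
# {'famA': 30.0, 'famB': 10.0}
```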
The importance of gut microbiota in human health and pathophysiology is indisputable. Despite the abundance of metagenomics data, the functional dynamics of gut microbiota in human health and disease remain elusive. Urolithin A (UroA), a major microbial metabolite derived from polyphenolics of berries and pomegranate fruits, displays anti-inflammatory, anti-oxidative, and anti-ageing activities. Here, we show that UroA and its potent synthetic analogue (UAS03) significantly enhance gut barrier function and inhibit unwarranted inflammation. We demonstrate that UroA and UAS03 exert their barrier functions through activation of aryl hydrocarbon receptor (AhR)-nuclear factor erythroid 2-related factor 2 (Nrf2)-dependent pathways to upregulate epithelial tight junction proteins. Importantly, treatment with these compounds attenuated colitis in pre-clinical models by remedying barrier dysfunction in addition to exerting anti-inflammatory activities. Collectively, the results highlight how microbial metabolites provide two-pronged beneficial activities at the gut epithelium by enhancing barrier functions and reducing inflammation to protect against colonic diseases.
Significance: Plants have undergone repeated rounds of whole-genome duplication, followed by gene degeneration and loss. Using whole-genome resequencing, we examined the origins of the recent tetraploid Capsella bursa-pastoris and the earliest stages of genome evolution after polyploidization. We conclude that the species had a hybrid origin from two distinct Capsella lineages within the past 100,000–300,000 y. Our analyses suggest the absence of rapid gene loss but provide evidence that the species has large numbers of inactivating mutations, many of which were inherited from the parental species. Our results suggest that genome evolution following polyploidy is determined not only by genome redundancy but also by demography, the mating system, and the evolutionary history of the parental species.
Whole-genome duplication (WGD) events have occurred repeatedly during flowering plant evolution, and there is growing evidence for predictable patterns of gene retention and loss following polyploidization. Despite these important insights, the rate and processes governing the earliest stages of diploidization remain poorly understood, and the relative importance of genetic drift, positive selection, and relaxed purifying selection in the process of gene degeneration and loss is unclear. Here, we conduct whole-genome resequencing in Capsella bursa-pastoris, a recently formed tetraploid with one of the most widespread species distributions of any angiosperm. Whole-genome data provide strong support for recent hybrid origins of the tetraploid species within the past 100,000–300,000 y from two diploid progenitors in the Capsella genus. Major-effect inactivating mutations are frequent, but many were inherited from the parental species and show no evidence of being fixed by positive selection. Despite a lack of large-scale gene loss, we observe a decrease in the efficacy of natural selection genome-wide due to the combined effects of demography, selfing, and genome redundancy from WGD. Our results suggest that the earliest stages of diploidization are associated with quantitative genome-wide decreases in the strength and efficacy of selection rather than rapid gene loss, and that nonfunctionalization can receive a “head start” through a legacy of deleterious variants and differential expression originating in parental diploid populations.
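One simple class of major-effect inactivating mutation is a premature in-frame stop codon, which truncates the encoded protein. The sketch below scans a coding sequence for such a codon. It is a hypothetical illustration, not the authors' pipeline: it assumes the sequence starts in frame at its first base, uses the standard genetic code's stop codons, and ignores other inactivating changes such as frameshift indels or splice-site disruptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def premature_stop(cds):
    """Return the 0-based codon index of the first in-frame stop codon that
    occurs before the final codon, or None if there is none."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    for index, codon in enumerate(codons[:-1]):  # the last codon is the normal stop
        if codon in STOP_CODONS:
            return index
    return None


reference = "ATGAAACCCGGGTAA"  # ATG AAA CCC GGG TAA
mutant = "ATGTAACCCGGGTAA"     # single A->T change turns AAA into TAA
print(premature_stop(reference), premature_stop(mutant))  # None 1
```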