Rapid advancements in sequencing technologies along with falling costs present widespread opportunities for microbiome studies across a vast and diverse array of environments. These impressive ...technological developments have been accompanied by a considerable growth in the number of methodological variables, including sampling, storage, DNA extraction, primer pairs, sequencing technology, chemistry version, read length, insert size, and analysis pipelines, amongst others. This increase in variability threatens to compromise both the reproducibility and the comparability of studies conducted. Here we perform the first reported study comparing both amplicon and shotgun sequencing for the three leading next-generation sequencing technologies. These were applied to six human stool samples using Illumina HiSeq, MiSeq and Ion PGM shotgun sequencing, as well as amplicon sequencing across two variable 16S rRNA gene regions. Notably, we found that the factor responsible for the greatest variance in microbiota composition was the chosen methodology rather than the natural inter-individual variance, which is commonly one of the most significant drivers in microbiome studies. Amplicon sequencing suffered from this to a large extent, and this issue was particularly apparent when the 16S rRNA V1-V2 region amplicons were sequenced with MiSeq. Somewhat surprisingly, the choice of taxonomic binning software for shotgun sequences proved to be of crucial importance with even greater discriminatory power than sequencing technology and choice of amplicon. Optimal N50 assembly values for the HiSeq was obtained for 10 million reads per sample, whereas the applied MiSeq and PGM sequencing depths proved less sufficient for shotgun sequencing of stool samples. The latter technologies, on the other hand, provide a better basis for functional gene categorisation, possibly due to their longer read lengths. Hence, in addition to highlighting methodological biases, this study demonstrates the risks associated with comparing data generated using different strategies. We also recommend that laboratories with particular interests in certain microbes should optimise their protocols to accurately detect these taxa using different techniques.
Variations in the composition of the human intestinal microbiota are linked to diverse health conditions. High-throughput molecular technologies have recently elucidated microbial community structure ...at much higher resolution than was previously possible. Here we compare two such methods, pyrosequencing and a phylogenetic array, and evaluate classifications based on two variable 16S rRNA gene regions.
Over 1.75 million amplicon sequences were generated from the V4 and V6 regions of 16S rRNA genes in bacterial DNA extracted from four fecal samples of elderly individuals. The phylotype richness, for individual samples, was 1,400-1,800 for V4 reads and 12,500 for V6 reads, and 5,200 unique phylotypes when combining V4 reads from all samples. The RDP-classifier was more efficient for the V4 than for the far less conserved and shorter V6 region, but differences in community structure also affected efficiency. Even when analyzing only 20% of the reads, the majority of the microbial diversity was captured in two samples tested. DNA from the four samples was hybridized against the Human Intestinal Tract (HIT) Chip, a phylogenetic microarray for community profiling. Comparison of clustering of genus counts from pyrosequencing and HITChip data revealed highly similar profiles. Furthermore, correlations of sequence abundance and hybridization signal intensities were very high for lower-order ranks, but lower at family-level, which was probably due to ambiguous taxonomic groupings.
The RDP-classifier consistently assigned most V4 sequences from human intestinal samples down to genus-level with good accuracy and speed. This is the deepest sequencing of single gastrointestinal samples reported to date, but microbial richness levels have still not leveled out. A majority of these diversities can also be captured with five times lower sampling-depth. HITChip hybridizations and resulting community profiles correlate well with pyrosequencing-based compositions, especially for lower-order ranks, indicating high robustness of both approaches. However, incompatible grouping schemes make exact comparison difficult.
The human gastrointestinal tract (GIT) represents one of the most densely populated microbial ecosystems studied to date. Although this microbial consortium has been recognized to have a crucial ...impact on human health, its precise composition is still subject to intense investigation. Among the GIT microbiota, bifidobacteria represent an important commensal group, being among the first microbial colonizers of the gut. However, the prevalence and diversity of members of the genus Bifidobacterium in the infant intestinal microbiota has not yet been fully characterized, while some inconsistencies exist in literature regarding the abundance of this genus.
In the current report, we assessed the complexity of the infant intestinal bifidobacterial population by analysis of pyrosequencing data of PCR amplicons derived from two hypervariable regions of the 16 S rRNA gene. Eleven faecal samples were collected from healthy infants of different geographical origins (Italy, Spain or Ireland), feeding type (breast milk or formula) and mode of delivery (vaginal or caesarean delivery), while in four cases, faecal samples of corresponding mothers were also analyzed.
In contrast to several previously published culture-independent studies, our analysis revealed a predominance of bifidobacteria in the infant gut as well as a profile of co-occurrence of bifidobacterial species in the infant's intestine.
The gut microbiome has been implicated in cancer in several ways, as specific microbial signatures are known to promote cancer development and influence safety, tolerability and efficacy of ...therapies. The 'omics' technologies used for microbiome analysis continuously evolve and, although much of the research is still at an early stage, large-scale datasets of ever increasing size and complexity are being produced. However, there are varying levels of difficulty in realizing the full potential of these new tools, which limit our ability to critically analyse much of the available data. In this Perspective, we provide a brief overview on the role of gut microbiome in cancer and focus on the need, role and limitations of a machine learning-driven approach to analyse large amounts of complex health-care information in the era of big data. We also discuss the potential application of microbiome-based big data aimed at promoting precision medicine in cancer.
Next-generation sequencing platforms have revolutionised our ability to investigate the microbiota composition of complex environments, frequently through 16S rRNA gene sequencing of the bacterial ...component of the community. Numerous factors, including DNA extraction method, primer sequences and sequencing platform employed, can affect the accuracy of the results achieved. The aim of this study was to determine the impact of these three factors on 16S rRNA gene sequencing results, using mock communities and mock community DNA.
The use of different primer sequences (V4-V5, V1-V2 and V1-V2 degenerate primers) resulted in differences in the genera and species detected. The V4-V5 primers gave the most comparable results across platforms. The three Ion PGM primer sets detected more of the 20 mock community species than the equivalent MiSeq primer sets. Data generated from DNA extracted using the 2 extraction methods were very similar.
Microbiota compositional data differed depending on the primers and sequencing platform that were used. The results demonstrate the risks in comparing data generated using different sequencing approaches and highlight the merits of choosing a standardised approach for sequencing in situations where a comparison across multiple sequencing runs is required.
The number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material to deeply explore ...host-microbiome associations and their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all information from these biological datasets, taking into account the peculiarities of microbiome data, i.e., compositional, heterogeneous and sparse nature of these datasets. The possibility of predicting host-phenotypes based on taxonomy-informed feature selection to establish an association between microbiome and predict disease states is beneficial for personalized medicine. In this regard, machine learning (ML) provides new insights into the development of models that can be used to predict outputs, such as classification and prediction in microbiology, infer host phenotypes to predict diseases and use microbial communities to stratify patients by their characterization of state-specific microbial signatures. Here we review the state-of-the-art ML methods and respective software applied in human microbiome studies, performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here is more related to the bacterial community, many algorithms could be applied in general, regardless of the feature type. This literature and software review covering this broad topic is aligned with the scoping review methodology. The manual identification of data sources has been complemented with: (1) automated publication search through digital libraries of the three major publishers using natural language processing (NLP) Toolkit, and (2) an automated identification of relevant software repositories on GitHub and ranking of the related research papers relying on learning to rank approach.
High-throughput molecular technologies can profile microbial communities at high resolution even in complex environments like the intestinal microbiota. Recent improvements in next-generation ...sequencing technologies allow for even finer resolution. We compared phylogenetic profiling of both longer (454 Titanium) sequence reads with shorter, but more numerous, paired-end reads (Illumina). For both approaches, we targeted six tandem combinations of 16S rRNA gene variable regions, in microbial DNA extracted from a human faecal sample, in order to investigate their limitations and potentials. In silico evaluations predicted that the V3/V4 and V4/V5 regions would provide the highest classification accuracies for both technologies. However, experimental sequencing of the V3/V4 region revealed significant amplification bias compared to the other regions, emphasising the necessity for experimental validation of primer pairs. The latest developments of 454 and Illumina technologies offered higher resolution compared to their previous versions, and showed relative consistency with each other. However, the majority of the Illumina reads could not be classified down to genus level due to their shorter length and higher error rates beyond 60 nt. Nonetheless, with improved quality and longer reads, the far greater coverage of Illumina promises unparalleled insights into highly diverse and complex environments such as the human gut.
A signature that unifies the colorectal cancer (CRC) microbiota across multiple studies has not been identified. In addition to methodological variance, heterogeneity may be caused by both microbial ...and host response differences, which was addressed in this study.
We prospectively studied the colonic microbiota and the expression of specific host response genes using faecal and mucosal samples ('ON' and 'OFF' the tumour, proximal and distal) from 59 patients undergoing surgery for CRC, 21 individuals with polyps and 56 healthy controls. Microbiota composition was determined by 16S rRNA amplicon sequencing; expression of host genes involved in CRC progression and immune response was quantified by real-time quantitative PCR.
The microbiota of patients with CRC differed from that of controls, but alterations were not restricted to the cancerous tissue. Differences between distal and proximal cancers were detected and faecal microbiota only partially reflected mucosal microbiota in CRC. Patients with CRC can be stratified based on higher level structures of mucosal-associated bacterial co-abundance groups (CAGs) that resemble the previously formulated concept of enterotypes. Of these, Bacteroidetes Cluster 1 and Firmicutes Cluster 1 were in decreased abundance in CRC mucosa, whereas Bacteroidetes Cluster 2, Firmicutes Cluster 2, Pathogen Cluster and Prevotella Cluster showed increased abundance in CRC mucosa. CRC-associated CAGs were differentially correlated with the expression of host immunoinflammatory response genes.
CRC-associated microbiota profiles differ from those in healthy subjects and are linked with distinct mucosal gene-expression profiles. Compositional alterations in the microbiota are not restricted to cancerous tissue and differ between distal and proximal cancers.
Grouping the microbiota of individual subjects into compositional categories, or enterotypes, based on the dominance of certain genera may have oversimplified a complex situation.
Lactobacilli are gram-positive bacteria that are a subdominant element in the human gastrointestinal microbiota, and which are commonly used in the food industry. Some lactobacilli are considered ...probiotic, and have been associated with health benefits. However, there is very little culture-independent information on how consumed probiotic microorganisms might affect the entire intestinal microbiota. We therefore studied the impact of the administration of Lactobacillus salivarius UCC118, a microorganism well characterized for its probiotic properties, on the composition of the intestinal microbiota in two model animals. UCC118 has anti-infective activity due to production of the bacteriocin Abp118, a broad-spectrum class IIb bacteriocin, which we hypothesized could impact the microbiota. Mice and pigs were administered wild-type (WT) L. salivarius UCC118 cells, or a mutant lacking bacteriocin production. The microbiota composition was determined by pyrosequencing of 16S rRNA gene amplicons from faeces. The data show that L. salivarius UCC118 administration had no significant effect on proportions of major phyla comprising the mouse microbiota, whether the strain was producing bacteriocin or not. However, L. salivarius UCC118 WT administration led to a significant decrease in Spirochaetes levels, the third major phylum in the untreated pig microbiota. In both pigs and mice, L. salivarius UCC118 administration had an effect on Firmicutes genus members. This effect was not observed when the mutant strain was administered, and was thus associated with bacteriocin production. Surprisingly, in both models, L. salivarius UCC118 administration and production of Abp118 had an effect on gram-negative microorganisms, even though Abp118 is normally not active in vitro against this group of microorganisms. Thus L. salivarius UCC118 administration has a significant but subtle impact on mouse and pig microbiota, by a mechanism that seems at least partially bacteriocin-dependent.