Despite substantial interest in the species diversity of the human microbiome and its role in disease, the scale of its genetic diversity, which is fundamental to deciphering human-microbe interactions, has not been quantified. Here, we conducted a cross-study meta-analysis of metagenomes from two human body niches, the mouth and gut, covering 3,655 samples from 13 studies. We found staggering genetic heterogeneity in the dataset, identifying a total of 45,666,334 non-redundant genes (23,961,508 oral and 22,254,436 gut) at the 95% identity level. Fifty percent of all genes were “singletons,” or unique to a single metagenomic sample. Singletons were enriched for different functions (compared with non-singletons) and arose from sub-population-specific microbial strains. Overall, these results provide potential bases for the unexplained heterogeneity observed in microbiome-derived human phenotypes. On the basis of these data, we built a resource, which can be accessed at https://microbial-genes.bio.
• Cross-study meta-analysis of metagenomes covering 3,655 samples from two body sites
• Meta-analysis uncovers staggering microbial gene diversity
• 50% of all genes in a metagenomic sample are individual-specific or “singletons”
• Individuals’ microbiomes can be fingerprinted via rare microbial strains
Tierney et al. present a meta-analysis of metagenomes covering 3,655 samples from two body sites. They identify 45,666,334 non-redundant genes in the human oral and gut microbiome and find that half of every person’s microbial gene content is completely unique. These rare genes, denoted singletons, predominantly arise from extremely rare microbial strains.
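The singleton definition above (a gene found in exactly one metagenomic sample) can be illustrated with a minimal sketch. It assumes genes have already been clustered into non-redundant groups upstream (for example at 95% identity). The gene IDs, sample names, and the function name `singleton_fraction` are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter


def singleton_fraction(sample_genes):
    """Return the fraction of non-redundant genes seen in exactly one sample.

    sample_genes: dict mapping a sample ID to the set of gene-cluster IDs
    detected in that sample (clustering is assumed to be done upstream).
    """
    samples_per_gene = Counter()
    for genes in sample_genes.values():
        # set() guards against duplicate gene IDs within one sample
        samples_per_gene.update(set(genes))
    if not samples_per_gene:
        return 0.0
    singletons = sum(1 for n in samples_per_gene.values() if n == 1)
    return singletons / len(samples_per_gene)


# Hypothetical toy catalogue: three samples, six gene clusters.
toy = {
    "sample_A": {"g1", "g2", "g3"},
    "sample_B": {"g1", "g4"},
    "sample_C": {"g1", "g2", "g5", "g6"},
}
print(f"singleton fraction: {singleton_fraction(toy):.2f}")
```

In this toy catalogue, g3, g4, g5, and g6 each occur in a single sample, so four of the six gene clusters count as singletons.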
We propose microbiome disease "architectures": linking >1 million microbial features (species, pathways, and genes) to 7 host phenotypes from 13 cohorts using a pipeline designed to identify associations that are robust to analytical model choice. Here, we quantify conservation and heterogeneity in microbiome-disease associations, using gene-level analysis to identify strain-specific, cross-disease, positive and negative associations. We find that coronary artery disease, inflammatory bowel diseases, and liver cirrhosis share gene-level signatures ascribed to the Streptococcus genus. Type 2 diabetes, by comparison, has a distinct metagenomic signature not linked to any one specific species or genus. We additionally find that, at the species level, the previously reported connection between Solobacterium moorei and colorectal cancer is not consistently identified across models; however, our gene-level analysis unveils a group of robust, strain-specific gene associations. Finally, we validate our findings regarding colorectal cancer and inflammatory bowel diseases in independent cohorts and find that features inversely associated with disease tend to be less reproducible than features enriched in disease. Overall, our work is not only a step towards gene-based, cross-disease microbiome diagnostic indicators, but it also illuminates the nuances of the genetic architecture of the human microbiome, including tension between gene- and species-level associations.
The human gut microbiome is linked to many states of human health and disease. The metabolic repertoire of the gut microbiome is vast, but the health implications of these bacterial pathways are poorly understood. In this study, we identify a link between members of the genus Veillonella and exercise performance. We observed an increase in Veillonella relative abundance in marathon runners postmarathon and isolated a strain of Veillonella atypica from stool samples. Inoculation of this strain into mice significantly increased exhaustive treadmill run time. Veillonella utilize lactate as their sole carbon source, which prompted us to perform a shotgun metagenomic analysis in a cohort of elite athletes, finding that every gene in a major pathway metabolizing lactate to propionate is at higher relative abundance postexercise. Using ¹³C₃-labeled lactate in mice, we demonstrate that serum lactate crosses the epithelial barrier into the lumen of the gut. We also show that intrarectal instillation of propionate is sufficient to reproduce the increased treadmill run time performance observed with V. atypica gavage. Taken together, these studies reveal that V. atypica improves run time via its metabolic conversion of exercise-induced lactate into propionate, thereby identifying a natural, microbiome-encoded enzymatic process that enhances athletic performance.
We analysed a large health insurance dataset to assess the genetic and environmental contributions of 560 disease-related phenotypes in 56,396 twin pairs and 724,513 sibling pairs out of 44,859,462 individuals that live in the United States. We estimated the contribution of environmental risk factors (socioeconomic status (SES), air pollution and climate) in each phenotype. Mean heritability (h² = 0.311) and shared environmental variance (c² = 0.088) were higher than variance attributed to specific environmental factors such as zip-code-level SES (var = 0.002), daily air quality (var = 0.0004), and average temperature (var = 0.001) overall, as well as for individual phenotypes. We found significant heritability and shared environment for a number of comorbidities (h² = 0.433, c² = 0.241) and average monthly cost (h² = 0.290, c² = 0.302). All results are available using our Claims Analysis of Twin Correlation and Heritability (CaTCH) web application.
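The abstract reports heritability (h²) and shared environmental variance (c²) without describing the estimator. As a rough illustration only, the sketch below uses the textbook Falconer decomposition from twin correlations, which may differ from the method actually used in the study. The function name `falconer_ace` and the input correlations are hypothetical.

```python
def falconer_ace(r_mz, r_dz):
    """Classic Falconer estimates from twin-pair phenotype correlations.

    r_mz: correlation between monozygotic twins
    r_dz: correlation between dizygotic twins
    Returns (h2, c2, e2): additive genetic, shared environmental, and
    unique environmental variance fractions.
    """
    h2 = 2.0 * (r_mz - r_dz)   # MZ twins share about twice the segregating variation of DZ twins
    c2 = 2.0 * r_dz - r_mz     # similarity not explained by genetics
    e2 = 1.0 - r_mz            # whatever makes MZ twins differ
    return h2, c2, e2


# Hypothetical correlations, chosen only to show the arithmetic.
h2, c2, e2 = falconer_ace(r_mz=0.5, r_dz=0.3)
print(f"h2={h2:.3f}  c2={c2:.3f}  e2={e2:.3f}")
```

By construction the three components sum to one, so a small h² or c² implies that most of the variance is attributed to the unique environment or to measurement error.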
Hypothesis generation in observational, biomedical data science often starts with computing an association or identifying the statistical relationship between a dependent and an independent variable. However, the outcome of this process depends fundamentally on modeling strategy, with differing strategies generating what can be called “vibration of effects” (VoE). VoE is defined by variation in associations that often leads to contradictory results. Here, we present a computational tool capable of modeling VoE in biomedical data by fitting millions of different models and comparing their output. We execute a VoE analysis on a series of widely reported associations (e.g., carrot intake associated with eyesight) with an additional focus on lifestyle exposures (e.g., physical activity) and components of the Framingham Risk Score for cardiovascular health (e.g., blood pressure). We leveraged our tool for potential confounder identification, investigating which adjusting variables are responsible for conflicting models. We propose modeling VoE as a critical step in navigating discovery in observational data, discerning robust associations, and cataloging adjusting variables that impact model output.
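The idea of fitting many model specifications and comparing the resulting association can be sketched in a few lines. This is a minimal illustration on simulated data, not the published tool: it fits ordinary least squares for every subset of two hypothetical adjusters (`z`, a confounder, and `w`, an irrelevant variable) and records the exposure coefficient from each model. All names and the data-generating process are assumptions made for the example.

```python
import itertools
import random


def ols_coefficients(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * p for x, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def vibration_of_effects(exposure, outcome, adjusters):
    """Fit outcome ~ exposure + S for every subset S of the adjusters.

    Returns a dict mapping each adjuster subset (a tuple of names) to the
    fitted exposure coefficient under that specification.
    """
    names = sorted(adjusters)
    results = {}
    for size in range(len(names) + 1):
        for subset in itertools.combinations(names, size):
            rows = [[1.0, exposure[i]] + [adjusters[n][i] for n in subset]
                    for i in range(len(outcome))]
            results[subset] = ols_coefficients(rows, outcome)[1]
    return results


# Simulated data (hypothetical): z confounds the exposure-outcome relation.
rng = random.Random(0)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
w = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [-0.5 * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

voe = vibration_of_effects(x, y, {"z": z, "w": w})
for spec, beta in voe.items():
    print(f"adjusters={spec or ('none',)}  beta_exposure={beta:+.3f}")
```

In this simulation the exposure coefficient is positive when `z` is left out and negative once `z` is included, which is the kind of sign-flipping behavior a VoE analysis is meant to expose. Enumerating all subsets grows exponentially with the number of adjusters, so a real analysis at the scale described above would need sampling or other shortcuts.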
The microbiome is a new frontier for building predictors of human phenotypes. However, machine learning in the microbiome is fraught with issues of reproducibility, driven in large part by the wide range of analytic models and metagenomic data types available. We aimed to build robust metagenomic predictors of host phenotype by comparing prediction performances and biological interpretation across 8 machine learning methods and 4 different types of metagenomic data. Using 1,570 samples from 300 infants, we fit 7,865 models for 6 host phenotypes. We demonstrate the dependence of accuracy on algorithm choice and feature definition in microbiome data and propose a framework for building microbiome-derived indicators of host phenotype. We additionally identify biological features predictive of age, sex, breastfeeding status, historical antibiotic usage, country of origin, and delivery type. Our complete results can be viewed at http://apps.chiragjpgroup.org/ubiome_predictions/.
Evaluating the relationship between the human gut microbiome and disease requires computing reliable statistical associations. Here, using millions of different association modeling strategies, we evaluated the consistency (or robustness) of microbiome-based disease indicators for 6 prevalent and well-studied phenotypes (across 15 public cohorts and 2,343 individuals). We were able to discriminate between analytically robust and nonrobust results. In many cases, different models yielded contradictory associations for the same taxon-disease pairing, some showing positive correlations and others negative. When querying a subset of 581 microbe-disease associations that have been previously reported in the literature, 1 out of 3 taxa demonstrated substantial inconsistency in association sign. Notably, >90% of published findings for type 1 diabetes (T1D) and type 2 diabetes (T2D) were particularly nonrobust in this regard. We additionally quantified how potential confounders (for example, sequencing depth, glucose levels, cholesterol, and body mass index) influenced associations, analyzing how these variables affect the ostensible correlation between Faecalibacterium prausnitzii abundance and a healthy gut. Overall, we propose our approach as a method to maximize confidence when prioritizing findings that emerge from microbiome association studies.
Bacteriophages are recognized as the most abundant members of microbiomes and therefore have a profound impact on microbial communities through interactions with their bacterial hosts. The International Metagenomics and Metadesign of Subways and Urban Biomes Consortium (MetaSUB) has sampled mass-transit systems in 60 cities over 3 years using metagenomics, shedding light on these hitherto largely unexplored urban environments. MetaSUB focused primarily on the bacterial community. In this work, we explored MetaSUB metagenomic data in order to recover and analyze bacteriophage genomes. We recovered and analyzed 1,714 phage genomes of at least 40 kbp from the class Caudoviricetes, the vast majority of which (80%) are novel. The recovered genomes were predicted to belong to temperate (69%) and lytic (31%) phages. Thirty-three of these genomes exceed 200 kbp, and one of them reaches 572 kbp, placing it among the largest phage genomes ever found. In general, the phages tended to be site-specific or nearly so, but 194 genomes could be identified in every city from which phage genomes were retrieved. We predicted hosts for 48% of the phages and observed general agreement between phage abundance and the abundance of the respective bacterial hosts, which include the most common nosocomial multidrug-resistant pathogens. A small fraction of the phage genomes carry antibiotic resistance genes, and such genomes tended to be particularly abundant in the sites where they were found. We also detected CRISPR-Cas systems in five phage genomes. This study expands the previously reported MetaSUB results and contributes to knowledge of phage diversity, global distribution, and phage genome content.