Abstract
Motivation
Model-based estimates of general deleteriousness, like CADD, DANN or PolyPhen, have become indispensable tools in the interpretation of genetic variants. However, these approaches say little about the tissues in which the effects of deleterious variants will be most meaningful. Tissue-specific annotations have recently been inferred for dozens of tissues/cell types from large collections of cross-tissue epigenomic data, and have demonstrated sensitivity in predicting affected tissues in complex traits. It remains unclear, however, whether including additional genome-scale data specific to the tissue of interest would appreciably improve functional annotations.
Results
Herein, we introduce TiSAn, a tool that integrates multiple genome-scale data sources, defined by expert knowledge. TiSAn uses machine learning to discriminate variants relevant to a tissue from those with no bearing on the function of that tissue. Predictions are made genome-wide, and can be used to contextualize and filter variants of interest in whole genome sequencing or genome-wide association studies. We demonstrate the accuracy and flexibility of TiSAn by producing predictive models for human heart and brain, and detecting tissue-relevant variations in large cohorts for autism spectrum disorder (TiSAn-brain) and coronary artery disease (TiSAn-heart). We find the multiomics TiSAn model is better able to prioritize genetic variants according to their tissue-specific action than the current state-of-the-art method, GenoSkyline.
Availability and implementation
Software and vignettes are available at http://github.com/kevinVervier/TiSAn.
Supplementary information
Supplementary data are available at Bioinformatics online.
Metagenomics characterizes the taxonomic diversity of microbial communities by sequencing DNA directly from an environmental sample. One of the main challenges in metagenomics data analysis is the binning step, where each sequenced read is assigned to a taxonomic clade. Because of the large volume of metagenomics datasets, binning methods need fast and accurate algorithms that can operate with reasonable computing requirements. While standard alignment-based methods provide state-of-the-art performance, compositional approaches that assign a taxonomic class to a DNA read based on the k-mers it contains have the potential to provide faster solutions.
We propose a new rank-flexible machine learning-based compositional approach for taxonomic assignment of metagenomics reads and show that it benefits from increasing the number of fragments sampled from reference genomes to tune its parameters, up to a coverage of about 10, and from increasing the k-mer size to about 12. Tuning the method involves training machine learning models on about 10^8 samples in 10^7 dimensions, which is out of reach of standard software but can be done efficiently with modern implementations for large-scale machine learning. The resulting method is competitive in terms of accuracy with well-established alignment and composition-based tools for problems involving a small to moderate number of candidate species and for reasonable amounts of sequencing errors. We show, however, that machine learning-based compositional approaches are still limited in their ability to deal with problems involving a greater number of species and are more sensitive to sequencing errors. We finally show that the new method outperforms the state-of-the-art in its ability to classify reads from species of lineage absent from the reference database and confirm that compositional approaches achieve faster prediction times, with a gain of 2-17 times with respect to the BWA-MEM short read mapper, depending on the number of candidate species and the level of sequencing noise.
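As an illustration of the compositional representation described above, the following minimal sketch counts the overlapping k-mers of a read and lays them out as a fixed-length vector. It is not the authors' implementation: the function names `kmer_profile` and `to_vector`, and the choice to skip windows containing non-ACGT characters, are assumptions made for the example. Note that for k = 12 the vocabulary has 4^12 = 16,777,216 entries, which is consistent with the roughly 10^7 dimensions mentioned above, so a sparse representation would be needed in practice.

```python
from collections import Counter
from itertools import product


def kmer_profile(read: str, k: int = 4) -> dict:
    """Count overlapping k-mers in a read.

    Windows containing characters other than A, C, G, T are skipped
    (an assumption of this sketch, e.g. to ignore ambiguous 'N' bases).
    """
    read = read.upper()
    alphabet = set("ACGT")
    counts = Counter()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= alphabet:
            counts[kmer] += 1
    return dict(counts)


def to_vector(profile: dict, k: int) -> list:
    """Lay out k-mer counts as a dense vector of length 4**k.

    Only practical for small k; larger k calls for a sparse encoding.
    """
    vocabulary = ("".join(p) for p in product("ACGT", repeat=k))
    return [profile.get(kmer, 0) for kmer in vocabulary]


if __name__ == "__main__":
    profile = kmer_profile("ACGTACGT", k=4)
    print(profile)
    print(len(to_vector(profile, k=4)))
```

A vector of this kind, computed for each read or reference fragment, is the sort of input a linear classifier could be trained on.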
Data and code are available at http://cbio.ensmp.fr/largescalemetagenomics
pierre.mahe@biomerieux.com
Supplementary data are available at Bioinformatics online.
Recent studies have established that single nucleotide polymorphisms are sufficient to build accurate predictive models of gene expression. Gamazon et al. found that gene expression values predicted from cis neighborhood SNPs show statistical association with disease status. In this work, we remove the cis neighborhood constraint during the learning process, and propose a novel predictive approach called SLINGER. We demonstrate that models drawing from a genome-wide set of SNPs are able to predict expression for more genes than the ones built on cis neighborhood only. Results indicate that these new models significantly improve accuracy for a large number of genes. Thanks to a penalized linear model, we also show that the number of features used in our models remains comparable to the cis-only models. Finally, application of SLINGER to seven Wellcome Trust Case-Control Consortium genome-wide association studies demonstrates that, compared to a cis-only approach, our models lead to associations with greater fidelity to actual gene expression values.
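To make the role of the penalty concrete, here is a small sketch of how an L1 (lasso) penalty keeps the number of selected SNPs low even when many candidate features are available. The text above says only "penalized linear model", so the choice of an L1 penalty, the coordinate-descent solver, and the names `soft_threshold` and `lasso_cd` are all assumptions for illustration, not a description of SLINGER itself.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam: float, n_iter: int = 100):
    """Coordinate descent for (1/2n) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of rows (samples x features), y a list of targets.
    No intercept is fitted; inputs are assumed to be centred.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / scale if scale > 0 else 0.0
    return beta


if __name__ == "__main__":
    # Toy genotypes: feature 0 drives expression, feature 1 is unrelated.
    X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
    y = [2, -2, 2, -2]
    print(lasso_cd(X, y, lam=0.5))
```

In the toy example the unrelated feature receives a coefficient of exactly zero, which is the property that lets a genome-wide model stay comparable in size to a cis-only one.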
Human-to-human transmission of symbiotic, anaerobic bacteria is a fundamental evolutionary adaptation essential for membership of the human gut microbiota. However, despite its importance, the genomic and biological adaptations underpinning symbiont transmission remain poorly understood. The Firmicutes are a dominant phylum within the intestinal microbiota that are capable of producing resistant endospores that maintain viability within the environment and germinate within the intestine to facilitate transmission. However, the impact of host transmission on the evolutionary and adaptive processes within the intestinal microbiota remains unknown.
We analyze 1358 genomes of Firmicutes bacteria derived from host and environment-associated habitats. Characterization of genomes as spore-forming based on the presence of sporulation-predictive genes reveals multiple losses of sporulation in many distinct lineages. Loss of sporulation in gut Firmicutes is associated with features of host-adaptation such as genome reduction and specialized metabolic capabilities. Consistent with these data, analysis of 9966 gut metagenomes from adults around the world demonstrates that bacteria now incapable of sporulation are more abundant within individuals but less prevalent in the human population compared to spore-forming bacteria.
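The distinction drawn above between prevalence (how many individuals carry a taxon) and abundance (how much of it carriers have) can be sketched as follows. The function name `prevalence_and_abundance`, the detection threshold, and the toy values are hypothetical and are not taken from the study.

```python
def prevalence_and_abundance(abundances, threshold: float = 0.0):
    """Summarise one taxon across a set of metagenomes.

    abundances: relative abundance of the taxon in each sample.
    Returns (prevalence, mean abundance among samples where it is detected).
    A sample counts as a carrier when its abundance exceeds `threshold`
    (the threshold value is an assumption of this sketch).
    """
    if not abundances:
        raise ValueError("at least one sample is required")
    carriers = [a for a in abundances if a > threshold]
    prevalence = len(carriers) / len(abundances)
    mean_abundance = sum(carriers) / len(carriers) if carriers else 0.0
    return prevalence, mean_abundance


if __name__ == "__main__":
    # Toy data: a taxon found in few people but at high levels,
    # versus one found in most people at low levels.
    rare_but_abundant = [0.0, 0.0, 0.0, 0.30]
    common_but_sparse = [0.01, 0.02, 0.01, 0.0]
    print(prevalence_and_abundance(rare_but_abundant))
    print(prevalence_and_abundance(common_but_sparse))
```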
Our results suggest host adaptation in gut Firmicutes is an evolutionary trade-off between transmission range and colonization abundance. We reveal host transmission as an underappreciated process that shapes the evolution, assembly, and functions of gut Firmicutes.
The gut microbiome is implicated as a marker of response to immune checkpoint inhibitors (ICI) based on preclinical mouse models and preliminary observations in limited patient series. Furthermore, early studies suggest faecal microbial transfer may have therapeutic potential, converting ICI non-responders into responders. So far, identification of specific responsible bacterial taxa has been inconsistent, which limits future application. The MITRE study will explore and validate a microbiome signature in a larger scale prospective study across several different cancer types.
Melanoma, renal cancer and non-small cell lung cancer patients who are scheduled to receive standard immune checkpoint inhibitors are being recruited to the MITRE study. Longitudinal stool samples are collected prior to treatment, then at 6 weeks, 3, 6 and 12 months during treatment, or at disease progression/recurrence (whichever is sooner), as well as after a severe (≥grade 3 CTCAE v5.0) immune-related adverse event. Additionally, whole blood, plasma, buffy coat, RNA and peripheral blood mononuclear cells (PBMCs) are collected at similar time points and will be used for exploratory analyses. Archival tumour tissue, tumour biopsies at progression/relapse, as well as any biopsies from body organs collected after a severe toxicity are collected. The primary outcome measure is the ability of the microbiome signature to predict 1-year progression-free survival (PFS) in patients with advanced disease. Secondary outcomes include microbiome correlations with toxicity and other efficacy end-points. Biosamples will be used to explore immunological and genomic correlates. A sub-study will evaluate both COVID-19 antigen and antibody associations with the microbiome.
There is an urgent need to identify biomarkers that are predictive of treatment response, resistance and toxicity to immunotherapy. The data generated from this study will both help inform patient selection for these drugs and provide information that may allow therapeutic manipulation of the microbiome to improve future patient outcomes.
NCT04107168, ClinicalTrials.gov, registered 09/27/2019. Protocol V3.2 (16/04/2021).
Neurodevelopmental disorders (NDDs) such as autism spectrum disorder (ASD) display a strong male bias. Androgen exposure is profoundly increased in typical male development, but it also varies within the sexes, and previous work has sought to connect morphological proxies of androgen exposure, including digit ratio and facial morphology, to neurodevelopmental outcomes. The results of these studies have been mixed, and the relationships between androgen exposure and behavior remain unclear.
Here, we measured both digit ratio masculinity (DRM) and facial landmark masculinity (FLM) in the same neurodevelopmental cohort (N = 763) and compared these proxies of androgen exposure to clinical and parent-reported features as well as polygenic risk scores.
We found that FLM was significantly associated with NDD diagnosis (ASD, ADHD, ID; all Formula: see text), while DRM was not. When testing for association with parent-reported problems, we found that both FLM and DRM were positively associated with concerns about social behavior (Formula: see text, Formula: see text; Formula: see text, Formula: see text, respectively). Furthermore, we found evidence via polygenic risk scores (PRS) that DRM indexes masculinity via testosterone levels (Formula: see text, Formula: see text), while FLM indexes masculinity through a negative relationship with sex hormone binding globulin (SHBG) levels (Formula: see text, Formula: see text). Finally, using the SPARK cohort (N = 9419) we replicated the observed relationship between polygenic estimates of testosterone, SHBG, and social functioning (Formula: see text, Formula: see text, and Formula: see text, Formula: see text for testosterone and SHBG, respectively). Remarkably, when considered over the extremes of each variable, these quantitative sex effects on social functioning were comparable to the effect of binary sex itself (binary male: Formula: see text; testosterone: Formula: see text from 0.1%-ile to 99.9%-ile; SHBG: Formula: see text from 0.1%-ile to 99.9%-ile).
In the devGenes and SPARK cohorts, our analyses rely on indirect, rather than direct measurement of androgens and related molecules.
These findings and their replication in the large SPARK cohort lend support to the hypothesis that increasing net androgen exposure diminishes capacity for social functioning in both males and females.
Alcohol‐related liver disease is a major public health burden, and the gut microbiota is an important contributor to disease pathogenesis. The aim of the present study is to characterize functional alterations of the gut microbiota and test their performance for short‐term mortality prediction in patients with alcoholic hepatitis. We integrated shotgun metagenomics with untargeted metabolomics to investigate functional alterations of the gut microbiota and host co‐metabolism in a multicenter cohort of patients with alcoholic hepatitis. Profound changes were found in the gut microbial composition, functional metagenome, serum, and fecal metabolomes in patients with alcoholic hepatitis compared with nonalcoholic controls. We demonstrate that, in comparison with single omics alone, the prediction of 30‐day mortality was improved when combining microbial pathways with their respective serum metabolites in patients with alcoholic hepatitis. The area under the receiver operating characteristic curve was higher than 0.85 for the tryptophan, isoleucine, and methionine pathways as predictors of 30‐day mortality, and reached 0.989 for the urea cycle pathway in combination with serum urea, with a bias‐corrected prediction error of 0.083 when using leave‐one‐out cross validation. Conclusion: Our study reveals changes in key microbial metabolic pathways associated with disease severity that predict short‐term mortality in our cohort of patients with alcoholic hepatitis.
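For readers unfamiliar with the metric reported above, the area under the ROC curve can be computed as the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor. The sketch below shows only that computation; the function name `roc_auc` and the toy scores are hypothetical, and the study's leave‐one‐out procedure and bias correction are not reproduced here.

```python
def roc_auc(scores, labels) -> float:
    """Area under the ROC curve via pairwise comparison.

    scores: predicted risk for each patient (higher means higher risk).
    labels: 1 for the event (e.g. death within 30 days), 0 otherwise.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example: one positive is ranked below one negative.
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.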
Functional characterization of genetic variants is a major challenge in whole-genome sequencing-based studies. Recent approaches, such as TiSAn (Vervier et al., 2017) or GenoSkyline (Lu et al., 2016), estimate tissue-specific impact of variations, in particular in human brain tissues. However, such annotations do not provide insights on which brain regions or development time points might be especially vulnerable to a given variant.
In this work, we propose to integrate spatiotemporal gene expression from BrainSpan (BrainSpan, 2014) to estimate what the 'context matrix' of a variant is. Variants found in non-coding regions are represented as a combination of gene expression matrices, where weights are based on associations demonstrated in SLINGER models (Vervier et al., 2016).
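One way to read the "combination of gene expression matrices" described above is as a weighted sum of per-gene region-by-stage matrices. The sketch below follows that reading under stated assumptions: the function name `context_matrix`, the normalisation of weights by their absolute sum, and the toy genes and values are all hypothetical and are not taken from the work itself.

```python
def context_matrix(gene_matrices: dict, weights: dict):
    """Weighted combination of per-gene expression matrices.

    gene_matrices: gene -> matrix (list of rows), e.g. brain regions x
                   developmental stages; all matrices share one shape.
    weights:       gene -> association weight linking the variant to the gene.
    Weights are normalised by the sum of their absolute values
    (an assumption of this sketch).
    """
    total = sum(abs(w) for w in weights.values())
    if total == 0:
        raise ValueError("at least one non-zero weight is required")
    first = next(iter(gene_matrices.values()))
    n_rows, n_cols = len(first), len(first[0])
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for gene, w in weights.items():
        matrix = gene_matrices[gene]
        for r in range(n_rows):
            for c in range(n_cols):
                result[r][c] += (abs(w) / total) * matrix[r][c]
    return result


if __name__ == "__main__":
    # Two hypothetical genes, 2 regions x 2 stages each.
    matrices = {
        "GENE_A": [[4.0, 0.0], [0.0, 4.0]],
        "GENE_B": [[0.0, 4.0], [4.0, 0.0]],
    }
    print(context_matrix(matrices, {"GENE_A": 1.0, "GENE_B": 3.0}))
```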
We validate our approach on psychiatric disorder datasets, and use temporal patterns to discriminate early- and late-onset damaging variants. Brain region-specific variants also help to identify combined mechanisms of action, in complex traits.
Spatio-temporal profiles could also be combined with polygenic risk score approaches, and provide new dimensions to study psychiatric disorders.
Immediately after birth, newborn babies experience rapid colonization by microorganisms from their mothers and the surrounding environment. Diseases in childhood and later in life are potentially mediated by the perturbation of the colonization of the infant gut microbiota. However, the effects of delivery via caesarean section on the earliest stages of the acquisition and development of the gut microbiota, during the neonatal period (≤1 month), remain controversial. Here we report the disrupted transmission of maternal Bacteroides strains, and high-level colonization by opportunistic pathogens associated with the hospital environment (including Enterococcus, Enterobacter and Klebsiella species), in babies delivered by caesarean section. These effects were also seen, to a lesser extent, in vaginally delivered babies whose mothers underwent antibiotic prophylaxis and in babies who were not breastfed during the neonatal period. We applied longitudinal sampling and whole-genome shotgun metagenomic analysis to 1,679 gut microbiota samples (taken at several time points during the neonatal period, and in infancy) from 596 full-term babies born in UK hospitals; for a subset of these babies, we collected additional matched samples from mothers (175 mothers paired with 178 babies). This analysis demonstrates that the mode of delivery is a significant factor that affects the composition of the gut microbiota throughout the neonatal period, and into infancy. Matched large-scale culturing and whole-genome sequencing of over 800 bacterial strains from these babies identified virulence factors and clinically relevant antimicrobial resistance in opportunistic pathogens that may predispose individuals to opportunistic infections. Our findings highlight the critical role of the local environment in establishing the gut microbiota in very early life, and identify colonization with antimicrobial-resistance-containing opportunistic pathogens as a previously underappreciated risk factor in hospital births.