Genes are pleiotropic, and gaining a better understanding of their function requires a comprehensive characterization of their mutants. Here, we generated multi-level data combining phenomic, proteomic and metabolomic acquisitions from plasma and liver tissues of two C57BL/6N mouse models lacking the Lat (linker for activation of T cells) and the Mx2 (MX dynamin-like GTPase 2) genes, respectively. Our dataset consists of 9 assays (1 preclinical, 2 proteomics and 6 metabolomics) generated with a fully non-targeted and standardized approach. The data and processing code are publicly available in the ProMetIS R package to ensure accessibility, interoperability, and reusability. The dataset thus provides unique molecular information about the physiological role of the Lat and Mx2 genes. Furthermore, the protocols described herein can be easily extended to a larger number of individuals and tissues. Finally, this resource will be of great interest for developing new bioinformatic and biostatistical methods for multi-omics data integration.
Adipose tissue (AT) transcriptome studies provide holistic pictures of adaptation to changes in weight and related bioclinical settings.
The objective was to implement AT gene expression profiling and to investigate the link between changes in bioclinical parameters and AT gene expression during 3 steps of a 2-phase dietary intervention (DI).
AT transcriptome profiles were obtained by sequencing 1051 samples, corresponding to 556 distinct individuals enrolled in a weight loss intervention (8-week low-calorie diet (LCD) at 800 kcal/day) followed by a 6-month ad libitum randomized DI. Transcriptome profiles obtained with QuantSeq sequencing were benchmarked against Illumina RNAseq. Reverse transcription quantitative polymerase chain reaction was used to further confirm associations. Cell specificity was assessed using freshly isolated cells and the THP-1 cell line.
During LCD, 5 modules were found, of which 3 included at least 1 bioclinical variable. Change in body mass index (BMI) was connected with changes in mRNA levels of genes with an inflammatory response signature. In this module, change in BMI was negatively associated with changes in expression of genes encoding secreted proteins (GDF15, CCL3, and SPP1). Through all phases of the DI, change in GDF15 was connected to changes in SPP1, CCL3, LIPA and CD68. Further characterization showed that these genes were specific to macrophages (with LIPA, CD68 and GDF15 expressed in anti-inflammatory macrophages) and that GDF15 was also expressed in preadipocytes.
Network analyses identified a novel AT feature: GDF15 was upregulated with calorie restriction-induced weight loss, concomitantly with macrophage markers. In AT, GDF15 was expressed in preadipocytes and macrophages, where it was a hallmark of anti-inflammatory cells.
Advances in new sequencing technologies have enabled clinical studies to produce large and complex datasets. This complexity takes several forms, notably high dimensionality, heterogeneity of the data at the biological level (acquired at different levels of biological organization and at various times during the experiment), heterogeneity of data types, noise in the data (biological heterogeneity or error-prone measurements), and the presence of missing data (for a single value or for an entire individual). Integrating different data is therefore an important challenge for computational biology. This thesis is part of a clinical research project on obesity, DiOGenes, for which we made methodological proposals for data analysis and integration. The project is based on a nutritional intervention conducted in eight European countries and aims to analyze the effects of different diets on weight maintenance and on certain markers of cardiovascular and diabetes risk in obese individuals. Within this project, my work focused on the analysis of transcriptomic data (RNA-Seq) with missing individuals and on the integration of transcriptomic data (new QuantSeq technique) with clinical data. The first part of this thesis is devoted to missing data and to network inference from RNA-Seq expression data. In longitudinal transcriptomic studies, some individuals may not be observed at certain time steps for experimental reasons. We propose a hot-deck multiple imputation method (hd-MI) that makes it possible to integrate external information measured on the same individuals and on other individuals. hd-MI improves the quality of network inference.
The second part concerns an integrative study of clinical and transcriptomic data (measured by QuantSeq) based on a network approach. We show the value of this new technique for acquiring transcriptomic data and analyze the data with a network inference approach in connection with clinical data of interest.
The development of high-throughput sequencing technologies has led to a massive acquisition of high-dimensional and complex datasets. Several features make these datasets hard to analyze: high dimensionality, heterogeneity at the biological level or at the data type level, noise in the data (due to biological heterogeneity or to errors in the data) and the presence of missing data (for given values or for an entire individual). The integration of various data is thus an important challenge for computational biology. This thesis is part of a large clinical research project on obesity, DiOGenes, in which we have developed methods for data analysis and integration. The project is based on a dietary intervention that was conducted in eight European centers. This study investigated the effect of macronutrient composition on weight-loss maintenance and on metabolic and cardiovascular risk factors after a phase of calorie restriction in obese individuals. My work has mainly focused on transcriptomic data analysis (RNA-Seq) with missing individuals and on the integration of transcriptomic (new QuantSeq protocol) and clinical datasets. The first part focuses on missing data and network inference from RNA-Seq datasets. In longitudinal studies, some individuals are not observed at some time steps. To take advantage of external information measured together with the RNA-Seq data, we propose an imputation method, hot-deck multiple imputation (hd-MI), that improves the reliability of network inference. The second part deals with an integrative study of clinical data and transcriptomic data, measured by QuantSeq, based on a network approach. The new protocol is shown to be efficient for transcriptome measurement. We propose an analysis based on network inference that is linked to clinical variables of interest.
Abstract
Motivation
Network inference provides a global view of the relations existing between the expression of genes in a given transcriptomic experiment (often only for a restricted list of chosen genes). However, it is still a challenging problem: even if the cost of sequencing techniques has decreased in recent years, the number of samples in a given experiment is still (very) small compared to the number of genes.
Results
We propose a method to increase the reliability of the inference when RNA-seq expression data have been measured together with an auxiliary dataset that can provide external information on gene expression similarity between samples. Our statistical approach, hd-MI, is based on imputation for samples that have no available RNA-seq data, and are therefore treated as missing, but that are observed in the secondary dataset. hd-MI can improve the reliability of the inference for missing rates up to 30% and provides more stable networks with a smaller number of false positive edges. From a biological point of view, hd-MI was also found relevant for inferring networks from RNA-seq data acquired in adipose tissue during a nutritional intervention in obese individuals. In these networks, novel links between genes were highlighted, as well as an improved comparability between the two steps of the nutritional intervention.
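The general idea of hot-deck multiple imputation can be illustrated with a minimal Python sketch. This is a simplified illustration under stated assumptions, not the hd-MI algorithm as implemented in RNAseqNet: the function name `hot_deck_impute`, the Euclidean distance on the auxiliary data, the fixed-size donor pool (`n_donors`) and the uniform donor draw are all hypothetical choices made here for clarity.

```python
import math
import random


def hot_deck_impute(expr, aux, n_donors=2, n_imputations=5, seed=0):
    """Return a list of completed expression tables, one per imputation.

    expr: dict sample -> list of expression values (observed samples only)
    aux:  dict sample -> list of auxiliary values (all samples)
    All names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    observed = sorted(expr)
    # Samples present in the auxiliary data but lacking expression data.
    missing = sorted(s for s in aux if s not in expr)

    completed = []
    for _ in range(n_imputations):
        table = {s: list(values) for s, values in expr.items()}
        for s in missing:
            # Donor pool: observed samples closest to s in the auxiliary data.
            pool = sorted(observed, key=lambda d: math.dist(aux[s], aux[d]))
            pool = pool[:n_donors]
            # Copy the whole expression profile of one randomly drawn donor.
            donor = rng.choice(pool)
            table[s] = list(expr[donor])
        completed.append(table)
    return completed


# Toy data: sample "s4" has auxiliary measurements but no expression profile.
aux = {"s1": [0.0, 0.0], "s2": [0.1, 0.0], "s3": [5.0, 5.0], "s4": [0.05, 0.0]}
expr = {"s1": [10, 2], "s2": [12, 3], "s3": [100, 50]}
tables = hot_deck_impute(expr, aux, n_donors=2, n_imputations=5, seed=0)
```

In a multiple-imputation workflow, the downstream analysis (here, network inference) would then be run on each completed table and the results combined, so that the variability introduced by the random donor draw is taken into account.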
Availability and implementation
Software and sample data are available as an R package, RNAseqNet, that can be downloaded from the Comprehensive R Archive Network (CRAN).
Supplementary information
Supplementary data are available at Bioinformatics online.