The advent of high-throughput technologies has led to a wealth of publicly available 'omics data from different sources, such as transcriptomics, proteomics and metabolomics. Combining such large-scale biological data sets can lead to the discovery of important biological insights, provided that relevant information can be extracted in a holistic manner. Current statistical approaches have focused on identifying small subsets of molecules (a 'molecular signature') to explain or predict biological conditions, but mainly for a single type of 'omics. In addition, commonly used methods are univariate and consider each biological feature independently. We introduce mixOmics, an R package dedicated to the multivariate analysis of biological data sets with a specific focus on data exploration, dimension reduction and visualisation. By adopting a systems biology approach, the toolkit provides a wide range of methods that statistically integrate several data sets at once to probe relationships between heterogeneous 'omics data sets. Our recent methods extend Projection to Latent Structure (PLS) models for discriminant analysis, for data integration across multiple 'omics data or across independent studies, and for the identification of molecular signatures. We illustrate our latest mixOmics integrative frameworks with multivariate analyses of 'omics data sets available in the package.
Objective
HLA alleles affect susceptibility to more than 100 diseases, but the mechanisms that account for these genotype–disease associations are largely unknown. HLA alleles strongly influence predisposition to ankylosing spondylitis (AS) and rheumatoid arthritis (RA). Both AS and RA patients have discrete intestinal and fecal microbiome signatures. Whether these changes are a cause or a consequence of the diseases themselves is unclear. To distinguish between these possibilities, we examined the effect of HLA–B27 and HLA–DRB1 RA risk alleles on the composition of the intestinal microbiome in healthy individuals.
Methods
Five hundred sixty‐eight stool and biopsy samples from 6 intestinal sites were collected from 107 healthy unrelated subjects, and stool samples were collected from 696 twin pairs from the TwinsUK cohort. Microbiome profiling was performed using sequencing of the 16S ribosomal RNA bacterial marker gene. All subjects were genotyped using the Illumina CoreExome SNP microarray, and HLA genotypes were imputed from these data.
Results
Associations were observed between the overall microbial composition and both the HLA–B27 genotype and the HLA–DRB1 RA risk allele (P = 0.0002 and P = 0.00001, respectively). These associations were replicated using the stool samples from the TwinsUK cohort (P = 0.023 and P = 0.033, respectively).
Conclusion
This study shows that the changes in intestinal microbiome composition seen in AS and RA are at least partially due to effects of HLA–B27 and HLA–DRB1 on the gut microbiome. These findings support the hypothesis that HLA alleles operate to cause or increase the risk of these diseases through interaction with the intestinal microbiome and suggest that therapies targeting the microbiome may be effective in preventing or treating these diseases.
Abstract
Microbial communities have been increasingly studied in recent years to investigate their role in ecological habitats. However, microbiome studies are difficult to reproduce or replicate, as they may suffer from confounding factors that are unavoidable in practice and originate from biological, technical or computational sources. In this review, we define batch effects as unwanted variation introduced by confounding factors that are not related to any factors of interest. Computational and analytical methods are required to remove or account for batch effects. However, inherent characteristics of microbiome data (e.g. sparse, compositional and multivariate) challenge the development and application of methods that either account or correct for batch effects. We present commonly encountered sources of batch effects and illustrate them in several case studies. We discuss the limitations of current methods, which often rely on assumptions that are not met because of the peculiarities of microbiome data. We provide practical guidelines for assessing the effectiveness of these methods based on visual and numerical outputs, and a thorough tutorial to reproduce the analyses conducted in this review.
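To make the compositional issue concrete, here is a minimal Python sketch, assuming a samples-by-taxa count matrix and known batch labels. It applies a centred log-ratio (CLR) transform with a pseudocount (the value 1 is an assumption, needed because sparse counts contain zeros) and then subtracts each batch's per-taxon mean. Batch-mean centering is a deliberately simple baseline swapped in for illustration; it is not one of the methods assessed in the review, and it can remove biological signal when treatment and batch are confounded. The function names and toy counts are hypothetical.

```python
import numpy as np

def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform of a samples x taxa count matrix.

    A pseudocount is added first because log(0) is undefined
    (the default of 1.0 is an assumption, not a recommendation)."""
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    # Subtract each sample's mean log-abundance (the log geometric mean).
    return logx - logx.mean(axis=1, keepdims=True)

def center_by_batch(x, batch):
    """Remove per-batch taxon means (simple batch-mean centering).

    Only corrects additive location shifts between batches; it does not
    model batch/treatment confounding, so treat it as a baseline."""
    x = np.asarray(x, dtype=float).copy()
    batch = np.asarray(batch)
    for b in np.unique(batch):
        rows = batch == b
        x[rows] -= x[rows].mean(axis=0)
    return x

# Toy data (invented): 4 samples x 3 taxa, sequenced in two batches.
counts = np.array([[10, 0, 5],
                   [12, 1, 4],
                   [40, 3, 20],
                   [38, 2, 22]])
batch = np.array(["A", "A", "B", "B"])
z = center_by_batch(clr(counts), batch)
```

Because each CLR-transformed sample sums to zero across taxa, and each subtracted batch-mean vector does too, the corrected values remain on the centred log-ratio scale.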
Variable selection on high-throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), has become essential to select relevant information and thus to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas machine learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits.
A simple extension of the sparse PLS (sPLS) exploratory approach, sparse PLS discriminant analysis (sPLS-DA), is proposed to perform variable selection in a multiclass classification framework.
sPLS-DA achieves classification performance similar to that of other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via informative graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.
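The core idea of sPLS-DA can be sketched in a few lines of Python with NumPy: the class labels are dummy-coded into an indicator matrix, and a loading vector on the predictors is estimated from the cross-product of the centred data, with soft-thresholding so that only `keep_x` variables stay non-zero. This is a simplified one-component sketch under stated assumptions, not the mixOmics implementation: it omits deflation, multiple components and prediction, and the names `spls_da_component`, `soft_threshold_keep` and `keep_x` are hypothetical.

```python
import numpy as np

def soft_threshold_keep(a, keep):
    """Soft-threshold `a` so that (barring ties) only `keep` entries stay non-zero."""
    if keep >= a.size:
        return a
    # Threshold at the (keep+1)-th largest magnitude.
    lam = np.sort(np.abs(a))[::-1][keep]
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def spls_da_component(X, y, keep_x, n_iter=100, tol=1e-10):
    """One sparse PLS-DA component (simplified, hypothetical helper).

    Returns the sparse X loading vector, the component scores and the
    class order used for dummy coding."""
    if keep_x < 1:
        raise ValueError("keep_x must be at least 1")
    classes = np.unique(y)
    # Dummy-code the class labels: one indicator column per class.
    Y = (y[:, None] == classes[None, :]).astype(float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    M = Xc.T @ Yc                      # p x K cross-product matrix
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    u, v = U[:, 0], Vt[0]              # start from the leading singular pair
    for _ in range(n_iter):
        # Alternate: sparse update of X loadings, then update of Y loadings.
        u_new = soft_threshold_keep(M @ v, keep_x)
        u_new /= np.linalg.norm(u_new)
        v = M.T @ u_new
        v /= np.linalg.norm(v)
        converged = np.linalg.norm(u_new - u) < tol
        u = u_new
        if converged:
            break
    return u, Xc @ u, classes

# Simulated example: 30 samples, 50 variables, 3 classes.
rng = np.random.default_rng(0)
y = np.repeat(np.array(["a", "b", "c"]), 10)
X = rng.normal(size=(30, 50))
X[y == "a", 0] += 3.0   # class signal planted on variables 0 and 1
X[y == "b", 1] += 3.0
u, scores, classes = spls_da_component(X, y, keep_x=5)
selected = np.flatnonzero(u)
```

The alternating updates are a sparse power iteration on the cross-product matrix; the soft-threshold level is chosen per iteration so that the user fixes the number of selected variables rather than a penalty value.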
In the continuously expanding omics era, novel computational and statistical strategies are needed for data integration and identification of biomarkers and molecular signatures. We present Data Integration Analysis for Biomarker discovery using Latent cOmponents (DIABLO), a multi-omics integrative method that seeks common information across different data types through the selection of a subset of molecular features, while discriminating between multiple phenotypic groups.
Using simulations and benchmark multi-omics studies, we show that DIABLO identifies features with superior biological relevance compared with existing unsupervised integrative methods, while achieving predictive performance comparable to state-of-the-art supervised approaches. DIABLO is versatile, allowing for modular-based analyses and cross-over study designs. In two case studies, DIABLO identified both known and novel multi-omics biomarkers consisting of mRNAs, miRNAs, CpGs, proteins and metabolites.
DIABLO is implemented in the mixOmics R Bioconductor package, with functions for parameter selection and visualization to assist in the interpretation of the integrative analyses, along with tutorials on http://mixomics.org and in our Bioconductor vignette.
Supplementary data are available at Bioinformatics online.
Dysbiosis of the gut microbiota has been linked to disease pathogenesis in type 1 diabetes, yet the functional consequences to the host of this dysbiosis are unknown. We investigated the functional interactions between the microbiota and the host associated with type 1 diabetes disease risk.
We performed a cross-sectional analysis of stool samples from subjects with recent-onset type 1 diabetes (n = 33), islet autoantibody-positive subjects (n = 17), low-risk autoantibody-negative subjects (n = 29), and healthy subjects (n = 22). Metaproteomic analysis was used to identify gut- and pancreas-derived host and microbial proteins, and these data were integrated with sequencing-based microbiota profiling.
Both human (host-derived) proteins and microbial-derived proteins could be used to differentiate new-onset and islet autoantibody-positive subjects from low-risk subjects. Significant alterations were identified in the prevalence of host proteins associated with exocrine pancreas output, inflammation, and mucosal function. Integrative analysis showed that microbial taxa associated with host proteins involved in maintaining function of the mucous barrier, microvilli adhesion, and exocrine pancreas were depleted in patients with new-onset type 1 diabetes.
These data support the view that patients with type 1 diabetes have increased intestinal inflammation and decreased barrier function. They also confirm that pancreatic exocrine dysfunction occurs in new-onset type 1 diabetes and show for the first time that this dysfunction is present in high-risk individuals before disease onset. The data identify a unique type 1 diabetes-associated signature in stool that may be useful for monitoring disease progression or response to therapies aimed at restoring a healthy microbiota.
Motivation: With the availability of many types of ‘omics’ data, such as transcriptomics, proteomics or metabolomics, the integrative or joint analysis of multiple datasets from different technology platforms is becoming crucial to unravel the relationships between different biological functional levels. However, developing such an analysis is a major computational and technical challenge, as most approaches suffer from high data dimensionality. New methodologies need to be developed and validated. Results: integrOmics efficiently performs integrative analyses of two types of ‘omics’ variables that are measured on the same samples. It includes a regularized version of canonical correlation analysis to highlight correlations between two datasets, and a sparse version of partial least squares (PLS) regression that performs simultaneous variable selection in both datasets. Both approaches have been validated previously and successfully applied in various integrative studies. Availability: integrOmics is freely available from http://CRAN.R-project.org/ or from the companion web site (http://math.univ-toulouse.fr/biostat), which provides full documentation and tutorials. Contact: k.lecao@uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
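A minimal sketch of the first canonical pair of ridge-regularized CCA may help show how regularization copes with high dimensionality: adding ridge penalties λ1 and λ2 to the diagonals of the within-set covariance matrices keeps them invertible even when there are more variables than samples. The weights a and b then maximize a'Cxy b subject to a'(Cxx + λ1 I)a = 1 and b'(Cyy + λ2 I)b = 1. The Python/NumPy sketch below is an illustration under these assumptions rather than the integrOmics code; the function names `rcca_first_pair` and `inv_sqrt` and the penalty values are hypothetical, and choosing the penalties (e.g. by cross-validation) is not shown.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def rcca_first_pair(X, Y, lam1, lam2):
    """First canonical pair of ridge-regularized CCA (hypothetical helper).

    Maximizes a' Cxy b subject to a' (Cxx + lam1 I) a = 1 and
    b' (Cyy + lam2 I) b = 1, via an SVD of the whitened cross-covariance."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Ridge terms keep the within-set covariances invertible when p, q > n.
    Cxx = Xc.T @ Xc / (n - 1) + lam1 * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + lam2 * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    a = Wx @ U[:, 0]   # canonical weights for X
    b = Wy @ Vt[0]     # canonical weights for Y
    return a, b, s[0]  # s[0]: regularized canonical correlation, at most 1

# Simulated example: 20 samples, 50 and 30 variables, one shared signal.
rng = np.random.default_rng(1)
z = rng.normal(size=20)
X = rng.normal(size=(20, 50))
X[:, 0] += 3 * z
Y = rng.normal(size=(20, 30))
Y[:, 0] += 3 * z
a, b, rho = rcca_first_pair(X, Y, lam1=0.5, lam2=0.5)
```

Larger penalties shrink the solution toward a PLS-like covariance criterion, which is one reason the choice of λ1 and λ2 matters in practice.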
Abstract
The microbiome is a complex and dynamic community of microorganisms that co-exist interdependently within an ecosystem and interact with their host or environment. Longitudinal studies can capture temporal variation within the microbiome to gain mechanistic insights into microbial systems; however, current statistical methods are limited because of the complex and inherent features of the data. We have identified three analytical objectives in longitudinal microbial studies: (1) differential abundance over time and between sample groups, demographic factors or clinical variables of interest; (2) clustering of microorganisms evolving concomitantly across time and (3) network modelling to identify temporal relationships between microorganisms. This review explores the strengths and limitations of current methods to fulfill these objectives, compares different methods in simulation and case studies for objectives (1) and (2), and highlights opportunities for further methodological developments. R tutorials are provided to reproduce the analyses conducted in this review.
Global environments are threatened by intense natural variation and continuously increasing human-made disturbances. Assessing global eco-environmental vulnerability (global EV or GEV) caused by both natural and human-induced disturbances plays a key role in providing valuable information about the ecological and environmental background for designing suitable policy measures to improve and restore the environment. We present the first global-scale map of quantified eco-environmental vulnerability by integrating remote sensing, GIS modelling, and global census datasets, employing 16 influential factors across five domains: socioeconomics, land resources, natural hazards, hydrometeorology, and topography. The GEV is classified into six levels: very low, low, medium, medium-high, high, and very high vulnerability. At the global scale, a small fraction of the globe (10.1%) is strongly affected by influential factors (high and very high vulnerability). Among continents, the largest fraction of the very high vulnerability level is attributed to Asia (74.6%), followed by Africa (19.6%). National-scale analysis shows that China and India are the most vulnerable countries in Asia and in the world. Our study quantifies the cumulative impacts of human-made and natural disturbances, which is vital for decision makers to set improvement targets for specific areas at local, regional, and global scales, and to design and adopt new practices that lessen natural and human-made disturbances on the environment, while keeping track of the evolution of other environmental aspects.
• A global eco-environmental vulnerability map is generated using a proposed assessment framework and essential datasets.
• Significant eco-environmental vulnerability is widespread in Asia and Africa (notably China, India and Ethiopia).
• Natural hazards and anthropogenic stress threaten the eco-environment, and these threats are enhanced by climate change.
• Recommended eco-environmental protection zones provide critical information for environmental management and conservation.
Hepatocellular carcinomas (HCCs) exhibit a diversity of molecular phenotypes, raising major challenges in clinical management. HCCs detected by surveillance programs at an early stage are candidates for potentially curative therapies (local ablation, resection, or transplantation). In the long term, transplantation provides the lowest recurrence rates. Treatment allocation is based on tumor number, size, vascular invasion, performance status, functional liver reserve, and the prediction of early (<2 years) recurrence, which reflects the intrinsic aggressiveness of the tumor. Well‐differentiated, potentially low‐aggressiveness tumors form the heterogeneous molecular class of nonproliferative HCCs, characterized by a β‐catenin mutation rate of approximately 50%. To define the clinical, pathological, and molecular features and the outcome of nonproliferative HCCs, we constructed a 1,133‐HCC transcriptomic metadata set and validated findings in a publicly available 210‐HCC RNA sequencing set. We show that nonproliferative HCCs preserve the zonation program that distributes metabolic functions along the portocentral axis in normal liver. More precisely, we identified two well‐differentiated, nonproliferative subclasses, namely periportal‐type (wild‐type β‐catenin) and perivenous‐type (mutant β‐catenin), which expressed negatively correlated gene networks. The new periportal‐type subclass represented 29% of all HCCs. These tumors expressed a hepatocyte nuclear factor 4A–driven gene network, which was down‐regulated in hepatocyte nuclear factor 4A knockout mice; were early‐stage tumors according to the Barcelona Clinic Liver Cancer, Cancer of the Liver Italian Program, and tumor–node–metastasis staging systems; had no macrovascular invasion; and showed the lowest metastasis‐specific gene expression levels and TP53 mutation rates.
Also, we identified an eight‐gene periportal‐type HCC signature, which was independently associated with the highest 2‐year recurrence‐free survival by multivariate analyses in two independent cohorts of 247 and 210 patients. Conclusion: Well‐differentiated HCCs display mutually exclusive periportal or perivenous zonation programs. Among all HCCs, periportal‐type tumors have the lowest intrinsic potential for early recurrence after curative resection. (Hepatology 2017;66:1502–1518).