For many complex diseases, an earlier and more reliable diagnosis is considered a key prerequisite for developing more effective therapies to prevent or delay disease progression. Classical statistical learning approaches for specimen classification using omics data, however, often cannot provide diagnostic models with sufficient accuracy and robustness for heterogeneous diseases like cancers or neurodegenerative disorders. In recent years, new approaches for building multivariate biomarker models on omics data have been proposed, which exploit prior biological knowledge from molecular networks and cellular pathways to address these limitations. This survey provides an overview of these recent developments and compares pathway- and network-based specimen classification approaches in terms of their utility for improving model robustness, accuracy and biological interpretability. Different routes to translate omics-based multifactorial biomarker models into clinical diagnostic tests are discussed, and a previous study is presented as an example.
Aging is considered one of the main factors promoting the risk for Parkinson's disease (PD), and common mechanisms of dopamine neuron degeneration in aging and PD have been proposed in recent years. Here, we use a statistical meta-analysis of human brain transcriptomics data to investigate potential mechanistic relationships between adult brain aging and PD pathogenesis at the pathway and network level. The analyses identify statistically significant shared pathway and network alterations in aging and PD and an enrichment in PD-associated sequence variants from genome-wide association studies among the jointly deregulated genes. We find robust discriminative patterns for groups of functionally related genes with potential applications as combined risk biomarkers to detect aging- and PD-linked oxidative stress, e.g., a consistent over-expression of metallothioneins matching findings in previous independent studies. Interestingly, analyzing the regulatory network and mouse knockout expression data for NR4A2, a transcription factor previously associated with rare mutations in PD and here found as the most significantly under-expressed gene in PD among the jointly altered genes, suggests that aging-related NR4A2 expression changes may increase PD risk via downstream effects similar to disease-linked mutations and to expression changes in sporadic PD. Overall, the analyses suggest mechanistic explanations for the age-dependence of PD risk and reveal significant and robust shared process alterations with potential applications in biomarker development for pre-symptomatic risk assessment or early stage diagnosis.
Changes in the human gastrointestinal microbiome are associated with several diseases. To infer causality, experiments in representative models are essential, but widely used animal models exhibit limitations. Here we present a modular, microfluidics-based model (HuMiX, human-microbial crosstalk), which allows co-culture of human and microbial cells under conditions representative of the gastrointestinal human-microbe interface. We demonstrate the ability of HuMiX to recapitulate in vivo transcriptional, metabolic and immunological responses in human intestinal epithelial cells following their co-culture with the commensal Lactobacillus rhamnosus GG (LGG) grown under anaerobic conditions. In addition, we show that the co-culture of human epithelial cells with the obligate anaerobe Bacteroides caccae and LGG results in a transcriptional response, which is distinct from that of a co-culture solely comprising LGG. HuMiX facilitates investigations of host-microbe molecular interactions and provides insights into a range of fundamental research questions linking the gastrointestinal microbiome to human health and disease.
Assessing functional associations between an experimentally derived gene or protein set of interest and a database of known gene/protein sets is a common task in the analysis of large-scale functional genomics data. For this purpose, a frequently used approach is to apply an over-representation-based enrichment analysis. However, this approach has four drawbacks: (i) it can only score functional associations of overlapping gene/protein sets; (ii) it disregards genes with missing annotations; (iii) it does not take into account the network structure of physical interactions between the gene/protein sets of interest and (iv) tissue-specific gene/protein set associations cannot be recognized.
To address these limitations, we introduce an integrative analysis approach and web-application called EnrichNet. It combines a novel graph-based statistic with an interactive sub-network visualization to accomplish two complementary goals: improving the prioritization of putative functional gene/protein set associations by exploiting information from molecular interaction networks and tissue-specific gene expression data and enabling a direct biological interpretation of the results. By using the approach to analyse sets of genes with known involvement in human diseases, new pathway associations are identified, reflecting a dense sub-network of interactions between their corresponding proteins.
EnrichNet is freely available at http://www.enrichnet.org.
Contact: Natalio.Krasnogor@nottingham.ac.uk, reinhard.schneider@uni.lu or avalencia@cnio.es
Supplementary data are available at Bioinformatics Online.
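The over-representation-based enrichment analysis that the EnrichNet abstract takes as its baseline is commonly implemented as an upper-tail hypergeometric test. The following is a minimal sketch of that baseline only, not of EnrichNet's graph-based statistic. The function name and all counts are hypothetical and chosen purely for illustration.

```python
from math import comb


def hypergeom_enrichment_pvalue(universe_size, pathway_size, query_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: number of annotated genes in the background (assumed)
    pathway_size:  number of background genes belonging to the reference set
    query_size:    number of genes in the experimentally derived set
    overlap:       number of genes shared by both sets
    """
    max_overlap = min(pathway_size, query_size)
    # Number of ways to draw the query set from the background.
    total = comb(universe_size, query_size)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical toy numbers: 20 background genes, a 5-gene pathway,
# a 4-gene query set, and 2 shared genes.
p_shared = hypergeom_enrichment_pvalue(20, 5, 4, 2)   # 1205/4845, about 0.249

# Drawback (i) from the abstract: with no shared genes the test has nothing
# to score, so the p-value is 1 regardless of how closely the two sets are
# connected in an interaction network.
p_disjoint = hypergeom_enrichment_pvalue(20, 5, 4, 0)  # exactly 1.0
```

The second call illustrates why an overlap-only score cannot rank non-overlapping gene sets, which is the gap that network-based scoring aims to close.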
Apart from the definition of the specific scope, objectives, and milestones, this also includes the choice of relevant experimental conditions to study (diseases/subtypes/treatments) or prior data to include (e.g., existing clinical and health record data), the selection of a suitable tissue pool/cell type(s) and measurement platform, the biological sampling design (i.e., how the samples will be collected, if not already available), the blocking design [2], and the measurement design (i.e., the arrangement of samples in the measurement instrument and across different measurement batches [3]). ... to ensure that the study is adequately powered and that biospecimen resources are used efficiently, dedicated sample size determination methods [4] and sample selection and matching methods (e.g., for confounder matching between cases and controls) [5] should be applied. ... a comprehensive and clear documentation of the study design is essential for effective project monitoring. Current data analytical methods have only a limited ability to discriminate between them. ... quality control and filtering analyses, data curation, annotation, and standardization are important initial steps in biomedical data processing pipelines. ... as part of the data curation and standardization, it is advisable to compare and evaluate multiple options to define primary and secondary study endpoints and other key input and outcome variables (e.g., comparing different definitions of tumor grades or disease stages or different disease ontologies [32]).
Objective: To review biomarker discovery studies using omics data for patient stratification that led to clinically validated FDA-cleared tests or laboratory developed tests, in order to identify common characteristics and derive recommendations for future biomarker projects.
Design: Scoping review.
Methods: We searched PubMed, EMBASE and Web of Science to obtain a comprehensive list of articles from the biomedical literature published between January 2000 and July 2021, describing clinically validated biomarker signatures for patient stratification, derived using statistical learning approaches. All documents were screened to retain only peer-reviewed research articles, review articles or opinion articles, covering supervised and unsupervised machine learning applications for omics-based patient stratification. Two reviewers independently confirmed the eligibility. Disagreements were resolved by consensus. We focused the final analysis on omics-based biomarkers which achieved the highest level of validation, that is, clinical approval of the developed molecular signature as a laboratory developed test or an FDA-approved test.
Results: Overall, 352 articles fulfilled the eligibility criteria. The analysis of validated biomarker signatures identified multiple common methodological and practical features that may explain the successful test development and guide future biomarker projects. These include study design choices to ensure sufficient statistical power for model building and external testing, suitable combinations of non-targeted and targeted measurement technologies, the integration of prior biological knowledge, strict filtering and inclusion/exclusion criteria, and the adequacy of statistical and machine learning methods for discovery and validation.
Conclusions: While most clinically validated biomarker models derived from omics data have been developed for personalised oncology, first applications for non-cancer diseases show the potential of multivariate omics biomarker design for other complex disorders. Distinctive characteristics of prior success stories, such as early filtering and robust discovery approaches, continuous improvements in assay design and experimental measurement technology, and rigorous multicohort validation approaches, enable the derivation of specific recommendations for future studies.
Parkinson’s disease (PD) exhibits systemic effects on the human metabolism, with emerging roles for the gut microbiome. Here, we integrate longitudinal metabolome data from 30 drug-naive, de novo PD patients and 30 matched controls with constraint-based modeling of gut microbial communities derived from an independent, drug-naive PD cohort, and prospective data from the general population. Our key results are (1) longitudinal trajectory of metabolites associated with the interconversion of methionine and cysteine via cystathionine differed between PD patients and controls; (2) dopaminergic medication showed strong lipidomic signatures; (3) taurine-conjugated bile acids correlated with the severity of motor symptoms, while low levels of sulfated taurolithocholate were associated with PD incidence in the general population; and (4) computational modeling predicted changes in sulfur metabolism, driven by A. muciniphila and B. wadsworthia, which is consistent with the changed metabolome. The multi-omics integration reveals PD-specific patterns in microbial-host sulfur co-metabolism that may contribute to PD severity.
• Longitudinal metabolomics reveal disturbed transsulfuration in Parkinson’s disease
• Metabolic modeling of gut microbiomes shows altered microbial sulfur metabolism
• Changed microbial sulfur metabolism is linked to B. wadsworthia and A. muciniphila
• Taurine-conjugated bile acids are associated with incident Parkinson’s disease
Hertel et al. demonstrate complex alterations in human and microbial sulfur metabolism in Parkinson’s disease by integrating longitudinal metabolomics and computational modeling of gut microbiomes. Then, potential clinical importance is revealed as secondary taurine-conjugated bile acids are shown to be associated with disease severity and Parkinson’s disease incidence.
Microarray data analysis has been shown to provide an effective tool for studying cancer and genetic diseases. Although classical machine learning techniques have successfully been applied to find informative genes and to predict class labels for new samples, common restrictions of microarray analysis such as small sample sizes, a large attribute space and high noise levels still limit its scientific and clinical applications. Increasing the interpretability of prediction models while retaining a high accuracy would help to exploit the information content in microarray data more effectively. For this purpose, we evaluate our rule-based evolutionary machine learning systems, BioHEL and GAssist, on three public microarray cancer datasets, obtaining simple rule-based models for sample classification. A comparison with other benchmark microarray sample classifiers based on three diverse feature selection algorithms suggests that these evolutionary learning techniques can compete with state-of-the-art methods like support vector machines. The obtained models reach accuracies above 90% in two-level external cross-validation, with the added value of facilitating interpretation by using only combinations of simple if-then-else rules. As a further benefit, a literature mining analysis reveals that prioritizations of informative genes extracted from BioHEL's classification rule sets can outperform gene rankings obtained from a conventional ensemble feature selection in terms of the pointwise mutual information between relevant disease terms and the standardized names of top-ranked genes.
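The pointwise mutual information (PMI) used in the literature mining analysis above compares how often a disease term and a gene name co-occur with how often they would co-occur by chance: PMI(term, gene) = log2( p(term, gene) / (p(term) p(gene)) ). The sketch below estimates PMI from document counts. The function name, the gene labels and every count are hypothetical; the abstract does not state how the probabilities were estimated.

```python
from math import log2


def pointwise_mutual_information(n_joint, n_term, n_gene, n_total):
    """PMI in bits, estimated from document counts.

    n_joint: documents mentioning both the disease term and the gene
    n_term:  documents mentioning the disease term
    n_gene:  documents mentioning the gene
    n_total: documents in the corpus
    """
    if n_term <= 0 or n_gene <= 0 or n_total <= 0:
        raise ValueError("term, gene and corpus counts must be positive")
    if n_joint == 0:
        return float("-inf")  # never observed together
    # p(term, gene) / (p(term) * p(gene)), simplified to avoid tiny floats.
    return log2(n_joint * n_total / (n_term * n_gene))


# Hypothetical counts for a 1000-document corpus in which the disease term
# appears in 100 documents: (joint count, gene count) per gene.
counts = {"GENE_A": (20, 50), "GENE_B": (5, 50), "GENE_C": (1, 40)}
pmi = {
    gene: pointwise_mutual_information(joint, 100, n_gene, 1000)
    for gene, (joint, n_gene) in counts.items()
}
# Rank genes from strongest to weakest association with the disease term.
ranking = sorted(pmi, key=pmi.get, reverse=True)
```

A PMI of 0 means the term and gene co-occur exactly as often as expected under independence; positive values indicate association.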
The diagnosis of Parkinson's disease (PD) often remains a clinical challenge. Molecular neuroimaging can facilitate the diagnostic process. The diagnostic potential of metabolomic signatures has recently been recognized.
We investigated whether the joint data analysis of blood metabolomics and PET imaging by machine learning provides enhanced diagnostic discrimination and gives further pathophysiological insights. Blood plasma samples were collected from 60 PD patients and 15 age- and gender-matched healthy controls. We determined metabolomic profiles by gas chromatography coupled to mass spectrometry (GC–MS). In the same cohort and at the same time we performed FDOPA PET in 44 patients and 14 controls and FDG PET in 51 patients and 16 controls. 18 PD patients were available for a follow-up exam after one year. Both data sets were analysed by two machine learning approaches, applying either linear support vector machines or random forests within a leave-one-out cross-validation scheme and computing receiver operating characteristic (ROC) curves.
In the metabolomics data, the baseline comparison between cases and controls as well as the follow-up assessment of patients pointed to metabolite changes associated with oxidative stress and inflammation. For the FDOPA and FDG PET data, the diagnostic predictive performance (DPP) in the ROC analyses was highest when combining imaging features with metabolomics data (ROC AUC for best FDOPA + metabolomics model: 0.98; AUC for best FDG + metabolomics model: 0.91). DPP was lower when using only PET attributes or only metabolomics signatures.
Integrating blood metabolomics data with PET data considerably enhances the diagnostic discrimination power. Metabolomic signatures also indicate interesting disease-inherent changes in cellular processes, including oxidative stress response and inflammation.
• Metabolomics and PET imaging reveal multifaceted changes in Parkinson's disease (PD)
• Combining the two data sources increases the diagnostic discrimination power for PD
• Metabolite alterations in PD are associated with oxidative stress and inflammation
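The evaluation scheme named in this abstract, leave-one-out cross-validation followed by ROC analysis, can be sketched in a few lines. This is an illustration under stated assumptions: a nearest-centroid scorer stands in for the linear support vector machines and random forests used in the study, and the six two-feature samples are invented toy data, not study measurements.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(rows):
    """Feature-wise mean of a list of equally long feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loocv_scores(X, y):
    """Score each sample with a model fitted on all other samples."""
    scores = []
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        c_case = centroid([x for x, lab in train if lab == 1])
        c_ctrl = centroid([x for x, lab in train if lab == 0])
        # Higher score: held-out sample lies closer to cases than to controls.
        scores.append(sq_dist(X[i], c_ctrl) - sq_dist(X[i], c_case))
    return scores


# Invented toy data: 3 cases (label 1) and 3 controls (label 0).
X = [[1.0, 0.2], [1.2, 0.1], [0.9, 0.3], [0.1, 1.0], [0.2, 1.1], [0.3, 0.9]]
y = [1, 1, 1, 0, 0, 0]
auc = roc_auc(y, loocv_scores(X, y))
```

Because each sample is scored by a model that never saw it, the resulting AUC estimates out-of-sample discrimination; with cohorts as small as those described, leave-one-out makes the most of the available training data at the cost of a higher-variance estimate.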
Seed germination is a complex trait of key ecological and agronomic significance. Few genetic factors regulating germination have been identified, and the means by which their concerted action controls this developmental process remains largely unknown. Using publicly available gene expression data from Arabidopsis thaliana, we generated a condition-dependent network model of global transcriptional interactions (SeedNet) that shows evidence of evolutionary conservation in flowering plants. The topology of the SeedNet graph reflects the biological process, including two state-dependent sets of interactions associated with dormancy or germination. SeedNet highlights interactions between known regulators of this process and predicts the germination-associated function of uncharacterized hub nodes connected to them with 50% accuracy. An intermediate transition region between the dormancy and germination subdomains is enriched with genes involved in cellular phase transitions. The phase transition regulators SERRATE and EARLY FLOWERING IN SHORT DAYS from this region affect seed germination, indicating that conserved mechanisms control transitions in cell identity in plants. The SeedNet dormancy region is strongly associated with vegetative abiotic stress response genes. These data suggest that seed dormancy, an adaptive trait that arose evolutionarily late, evolved by coopting existing genetic pathways regulating cellular phase transition and abiotic stress. SeedNet is available as a community resource (http://vseed.nottingham.ac.uk) to aid dissection of this complex trait and gene function in diverse processes.