The choice of the most appropriate unsupervised machine-learning method for "heterogeneous" or "mixed" data, i.e. with both continuous and categorical variables, can be challenging. Our aim was to examine the performance of various clustering strategies for mixed data using both simulated and real-life data. We conducted a benchmark analysis of "ready-to-use" tools in R comparing 4 model-based (Kamila algorithm, Latent Class Analysis, Latent Class Model [LCM] and Clustering by Mixture Modeling) and 5 distance/dissimilarity-based (Gower distance or Unsupervised Extra Trees dissimilarity followed by hierarchical clustering or Partitioning Around Medoids, K-prototypes) clustering methods. Clustering performance was assessed by the Adjusted Rand Index (ARI) on 1000 generated virtual populations consisting of mixed variables using 7 scenarios with varying population sizes, number of clusters, number of continuous and categorical variables, proportions of relevant (non-noisy) variables and degree of variable relevance (low, mild, high). Clustering methods were then applied to the EPHESUS randomized clinical trial data (a heart failure trial evaluating the effect of eplerenone) to illustrate the differences between clustering techniques. The simulations revealed the dominance of K-prototypes, Kamila and LCM models over all other methods. Overall, methods using dissimilarity matrices in classical algorithms such as Partitioning Around Medoids and Hierarchical Clustering had a lower ARI compared to model-based methods in all scenarios. When applying clustering methods to a real-life clinical dataset, LCM showed promising results with regard to differences in (1) clinical profiles across clusters, (2) prognostic performance (highest C-index) and (3) identification of patient subgroups with substantial treatment benefit.
The present findings suggest key differences in clustering performance between the tested algorithms (limited to tools readily available in R). In most of the tested scenarios, model-based methods (in particular the Kamila and LCM packages) and K-prototypes typically performed best in the setting of heterogeneous data.
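As an illustration of the evaluation metric named above, the following is a minimal sketch of how an Adjusted Rand Index can be computed from two label vectors. It is not the authors' code (their benchmark used R tools); the function name `adjusted_rand_index` and the toy labels are assumptions introduced here for illustration only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two partitions of the same items.

    Hypothetical helper for illustration; follows the standard
    pair-counting (Hubert and Arabie) formulation of the ARI.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement

    # Contingency table cells and its row/column margins.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs that fall together in each cell/row/column.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2        # agreement of identical partitions
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy example: cluster labels are arbitrary, so a relabelled but
# identical partition still scores as perfect agreement.
truth = [0, 0, 1, 1]
print(adjusted_rand_index(truth, [1, 1, 0, 0]))
```

An ARI of 1 indicates identical partitions, values near 0 indicate agreement no better than chance, and negative values indicate less agreement than expected by chance.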
Patients with heart failure (HF) and coronary artery disease (CAD) have a high risk for cardiovascular (CV) events including HF hospitalization, stroke, myocardial infarction (MI) and sudden cardiac death (SCD). The present study evaluated associations of proteomic biomarkers with CV outcome in patients with CAD and HF with reduced ejection fraction (HFrEF), shortly after a worsening HF episode. We performed a case-control study within the COMMANDER HF international, double-blind, randomized placebo-controlled trial investigating the effects of the factor-Xa inhibitor rivaroxaban. Patients with the following first clinical events: HF hospitalization, SCD and the composite of MI or stroke were matched with corresponding controls for age, sex and study drug. Plasma concentrations of 276 proteins with known associations with CV and cardiometabolic mechanisms were analyzed. Results were corrected for multiple testing using false discovery rate (FDR). In 485 cases and 455 controls, 49 proteins were significantly associated with clinical events of which seven had an adjusted FDR < 0.001 (NT-proBNP, BNP, T-cell immunoglobulin and mucin domain containing 4 (TIMD4), fibroblast growth factor 23 (FGF-23), growth differentiation factor-15 (GDF-15), pulmonary surfactant-associated protein D (PSP-D) and Spondin-1 (SPON1)). No significant interactions were identified between the type of clinical event (MI/stroke, SCD or HF hospitalization) and specific biomarkers (all interaction FDR > 0.20). When adding the biomarkers significantly associated with the above outcome to a clinical model (including NT-proBNP), the C-index increase was 0.057 (0.033-0.082), p < 0.0001 and the net reclassification index was 54.9 (42.5 to 67.3), p < 0.0001.
In patients with HFrEF and CAD following HF hospitalization, we found that NT-proBNP, BNP, TIMD4, FGF-23, GDF-15, PSP-D and SPON1, biomarkers broadly associated with inflammation and remodeling mechanistic pathways, were strong but indiscriminate predictors of a variety of individual CV events.
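The abstract states that results were corrected for multiple testing using the false discovery rate but does not name the procedure. As a hedged illustration, the sketch below assumes the widely used Benjamini-Hochberg step-up adjustment; the function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the BH procedure stands in for the unspecified FDR
    method of the abstract. Hypothetical helper for illustration.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values for four hypothetical proteins.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw p = {p:.3f}  adjusted = {q:.3f}")
```

A protein would then be declared significant when its adjusted value falls below the chosen FDR threshold.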
Cereal crops are frequently affected by toxigenic Fusarium species, among which the most common and worrying in Europe are […] and […]. These species are the causal agents of grain contamination with type B trichothecene (TCTB) mycotoxins. To help reduce the use of synthetic fungicides while guaranteeing low mycotoxin levels, there is an urgent need to develop new, efficient and environmentally friendly plant protection solutions. Previously, proteins that could serve as putative targets to block the fungal spread and toxin production were identified and a virtual screening undertaken. Here, two selected compounds, M1 and M2, predicted, respectively, as the top compounds acting on the trichodiene synthase, a key enzyme of TCTB biosynthesis, and the 24-sterol-C-methyltransferase, a protein involved in ergosterol biosynthesis, were submitted for biological tests. Corroborating in silico predictions, M1 was shown to significantly inhibit TCTB yield by a panel of strains. Results were less clear-cut with M2, which induced only a slight reduction in fungal biomass. To go further, seven M1 analogs were assessed, which revealed the physicochemical properties crucial for the anti-mycotoxin activity. Altogether, our results provide the first evidence of the promising potential of computational approaches to discover new anti-mycotoxin solutions.
By using an ensemble-docking strategy, we undertook a large-scale virtual screening campaign in order to identify new putative hits against the MET kinase target. Following a large molecular dynamics sampling of its conformational space, a set of 45 conformers of the kinase was retained as docking targets to take into account the flexibility of the binding site moieties. Our screening funnel started from about 80,000 chemical compounds to be tested in silico for their potential affinities towards the kinase binding site. The top 100 molecules, selected on the basis of the molecular docking results, were further analyzed for their interactions, and 25 of the most promising ligands were tested for their ability to inhibit MET activity in cells. The F0514-4011 compound was the most efficient and impaired the scattering response to HGF (Hepatocyte Growth Factor) with an IC50 of 7.2 μM. Interestingly, careful docking analysis of this molecule with MET suggests a possible conformation halfway between classical type-I and type-II MET inhibitors, with an additional region of interaction. This compound could therefore be an innovative seed to be repositioned from its initial antiviral purpose towards the field of MET inhibitors. Altogether, these results validate our ensemble docking strategy as a cost-effective functional method for drug development.
Abstract
Background
Adverse drug reactions (ADRs) are statistically characterized within randomized clinical trials and postmarketing pharmacovigilance, but their molecular mechanism remains unknown in most cases. This is true even for hepatic or skin toxicities, which are classically monitored during drug design. Aside from clinical trials, many elements of knowledge about drug ingredients are available in open-access knowledge graphs, such as their properties, interactions, or involvements in pathways. In addition, drug classifications that label drugs as either causative or not for several ADRs have been established.
Methods
In this paper, we propose to mine knowledge graphs to identify biomolecular features that may enable automatic reproduction of expert classifications distinguishing drugs that are causative or not for a given type of ADR. From an Explainable AI perspective, we explore simple classification techniques such as Decision Trees and Classification Rules because they provide human-readable models, which explain the classification itself, but may also provide elements of explanation for molecular mechanisms behind ADRs. In summary, (1) we mine a knowledge graph for features; (2) we train classifiers to distinguish, on the basis of extracted features, drugs associated or not with two commonly monitored ADRs: drug-induced liver injuries (DILI) and severe cutaneous adverse reactions (SCAR); (3) we isolate features that are both efficient in reproducing expert classifications and interpretable by experts (i.e., Gene Ontology terms, drug targets, or pathway names); and (4) we manually evaluate in a mini-study how they may be explanatory.
Results
Extracted features reproduce with good fidelity the classifications of drugs as causative or not for DILI and SCAR (Accuracy = 0.74 and 0.81, respectively). Experts fully agreed that 73% and 38% of the most discriminative features are possibly explanatory for DILI and SCAR, respectively; and partially agreed (2/3) for 90% and 77% of them.
Conclusion
Knowledge graphs provide sufficiently diverse features to enable simple and explainable models to distinguish between drugs that are causative or not for ADRs. In addition to explaining classifications, most discriminative features appear to be good candidates for investigating ADR mechanisms further.
Root-knot nematodes (RKN), from the Meloidogyne genus, have a worldwide distribution and cause severe economic damage to many life-sustaining crops. Because of their lack of specificity and danger to the environment, most chemical nematicides have been banned from use. Thus, there is a great need for new and safe compounds to control RKN. Such research involves identifying beforehand the nematode proteins essential to the invasion. Since G protein-coupled receptors (GPCRs) are the target of a large number of drugs, we have focused our research on the identification of putative nematode GPCRs such as those capable of controlling the movement of the parasite towards (or within) its host. A data-mining procedure applied to the genome of […] allowed us to identify a GPCR belonging to the neuropeptide GPCR family that can serve as a target for a virtual screening campaign. We reconstructed a 3D model of this receptor by homology modeling and validated it through extensive molecular dynamics simulations. This model was used for large-scale molecular docking, which produced a filtered, limited set of putative antagonists for this GPCR. Preliminary experiments using these selected molecules allowed the identification of an active compound, namely C260-2124, from the ChemDiv provider, which can serve as a starting point for further investigations.
Background
Hypertension, obesity and diabetes are major and potentially modifiable “risk factors” for cardiovascular diseases. Identification of biomarkers specific to these risk factors may help in understanding the underlying pathophysiological pathways and in developing individual treatment.
Methods
The FIBRO-TARGETS (targeting cardiac fibrosis for heart failure treatment) consortium has merged data from 12 patient cohorts in 1 common database of > 12,000 patients. Three mutually exclusive main phenotypic groups were identified (“cases”): (1) “hypertensive”; (2) “obese”; and (3) “diabetic”; age–sex matched in a 1:2 proportion with “healthy controls” without any of these phenotypes. Proteomic associations were studied using a biostatistical method based on LASSO and confronted with machine-learning and complex network approaches.
Results
The case:control distribution by each cardiovascular phenotype was hypertension (50:100), obesity (50:98), and diabetes (36:72). Of the 86 studied proteins, 4 were found to be independently associated with hypertension: GDF-15, LEP, SORT-1 and FABP-2; 3 with obesity: CEACAM-8, LEP and PRELP; and 4 with diabetes: GDF-15, REN, CXCL-1 and SCF. GDF-15 (hypertension + diabetes) and LEP (hypertension + obesity) are shared by 2 different phenotypes. A machine-learning approach confirmed GDF-15, LEP and SORT-1 as discriminant biomarkers for the hypertension group, and LEP plus PRELP for the obesity group. Complex network analyses provided insight on the mechanisms underlying these disease phenotypes where fibrosis may play a central role.
Conclusion
Patients with “mutually exclusive” phenotypes display distinct bioprofiles that might underpin different biological pathways, potentially leading to fibrosis.
Graphic abstract
Plasma protein biomarkers and their association with mutually exclusive cardiovascular phenotypes: the FIBRO-TARGETS case–control analyses. Patients with “mutually exclusive” phenotypes (blue: obesity, hypertension and diabetes) display distinct protein bioprofiles (green: decreased expression; red: increased expression) that might underpin different biological pathways (orange arrow), potentially leading to fibrosis.
The pathogenicity of phytonematodes relies on secreted virulence factors to rewire host cellular pathways for the benefit of the nematode. In the root-knot nematode (RKN) Meloidogyne incognita, thousands of predicted secreted proteins have been identified and are expected to interact with host proteins at different developmental stages of the parasite. Identifying the host targets will provide compelling evidence about the biological significance and molecular function of the predicted proteins. Here, we have focused on the hub protein CSN5, the fifth subunit of the pleiotropic and eukaryotic conserved COP9 signalosome (CSN), which is a regulatory component of the ubiquitin/proteasome system. We used affinity purification-mass spectrometry (AP-MS) to generate the interaction network of CSN5 in M. incognita-infected roots. We identified the complete CSN complex and other known CSN5 interaction partners in addition to unknown plant and M. incognita proteins. Among these, we described M. incognita PASSE-MURAILLE (MiPM), a small pioneer protein predicted to contain a secretory peptide that is up-regulated mostly in the J2 parasitic stage. We confirmed the CSN5-MiPM interaction, which occurs in the nucleus, by bimolecular fluorescence complementation (BiFC). Using MiPM as bait, a GST pull-down assay coupled with MS revealed some common protein partners between CSN5 and MiPM. We further showed by […] and microscopic analyses that the recombinant purified MiPM protein enters the cells of Arabidopsis root tips in a non-infectious context. In further detail, the supercharged N-terminal tail of MiPM (NTT-MiPM) triggers an unknown host endocytosis pathway to penetrate the cell. The functional meaning of the CSN5-MiPM interaction in M. incognita parasitism is discussed. Moreover, we propose that the cell-penetrating properties of some M. incognita secreted proteins might be a non-negligible mechanism for cell uptake, especially during the steps preceding the sedentary parasitic phase.
Introduction
The SERVE-HF trial included patients with heart failure and reduced ejection fraction (HFrEF) with sleep-disordered breathing, randomly assigned to treatment with Adaptive-Servo Ventilation (ASV) or control. The primary outcome was the first event of death from any cause, lifesaving cardiovascular intervention, or unplanned hospitalization for worsening heart failure. A subgroup analysis of the SERVE-HF trial suggested that patients with Cheyne-Stokes respiration (CSR) < 20% (low CSR) experienced a beneficial effect from ASV, whereas in patients with CSR ≥ 20% ASV might have been harmful. Identifying the proteomic signatures and the underlying mechanistic pathways expressed in patients with CSR could help generate hypotheses for future research.
Methods
Using a large set of circulating protein-biomarkers (n = 276, available in 749 patients; 57% of the SERVE-HF population) we sought to investigate the proteins associated with CSR and to study the underlying mechanisms that these circulating proteins might represent.
Results
The mean age was 69 ± 10 years and > 90% were male. Patients with CSR < 20% (n = 139) had fewer apnoea-hypopnea index (AHI) events per hour and less oxygen desaturation. Patients with CSR < 20% might have experienced a beneficial effect of ASV treatment (primary outcome HR 0.55, 95% CI 0.34–0.88; p = 0.012), whereas those with CSR ≥ 20% might have experienced a detrimental effect of ASV treatment (primary outcome HR 1.39, 95% CI 1.09–1.76; p = 0.008); p for interaction = 0.001. Of the 276 studied biomarkers, 8 were associated with CSR (after adjustment and with a 1% FDR-corrected p value). For example, higher PAR-1 and ITGB2 levels were associated with higher odds of having CSR < 20%, whereas higher LOX-1 levels were associated with higher odds of CSR ≥ 20%. Signalling, metabolic, haemostatic and immunologic pathways underlie the expression of these biomarkers.
Conclusion
We identified proteomic signatures that may represent underlying mechanistic pathways associated with patterns of CSR in HFrEF. These hypothesis-generating findings require further investigation towards better understanding of CSR in HFrEF.
Graphic abstract
Summary of the findings. PAR-1, proteinase-activated receptor 1; ADM, adrenomedullin; HSP-27, heat shock protein-27; ITGB2, integrin beta 2; GLO1, glyoxalase 1; ENRAGE/S100A12, S100 calcium-binding protein A12; LOX-1, lectin-like LDL receptor 1; ADAM-TS13, disintegrin and metalloproteinase with a thrombospondin type 1 motif, member 13, also known as von Willebrand factor-cleaving protease.
Fetal and neonatal exposure to long-chain alkylphenols has been suspected to promote breast developmental disorders and consequently to increase breast cancer risk. However, disease predisposition from developmental exposures remains unclear. In this work, human MCF-10A mammary epithelial cells were exposed in vitro to a low dose of a realistic (4-nonylphenol + 4-tert-octylphenol) mixture. Transcriptome and cell-phenotype analyses combined with functional and signaling network modeling indicated that long-chain alkylphenols triggered enhanced proliferation, migration ability, and apoptosis resistance, and shed light on the underlying molecular mechanisms, which involved the human estrogen receptor alpha 36 (ERα36) variant. A male mouse-inherited transgenerational model of exposure to three environmentally relevant doses of the alkylphenol mix was set up in order to determine whether and how it would affect mammary gland architecture. Mammary glands from F3 progeny obtained after intrabuccal chronic exposure of C57BL/6J P0 pregnant mice followed by F1-F3 male inheritance displayed an altered histology which correlated with the phenotypes observed in vitro in human mammary epithelial cells. Since cellular phenotypes are similar in vivo and in vitro and involve the unique ERα36 human variant, such consequences of alkylphenol exposure could be extrapolated from the mouse model to humans. However, transient alkylphenol treatments combined with ERα36 overexpression in mammary epithelial cells were not sufficient to trigger tumorigenesis in xenografted Nude mice. Therefore, it remains to be determined whether low-dose alkylphenol transgenerational exposure and subsequent abnormal mammary gland development could account for an increased breast cancer susceptibility.