The design of heterogeneous catalysts is challenged by the complexity of materials and processes that govern reactivity and by the fact that the number of good catalysts is very small in comparison to the number of possible materials. Here, we show how the subgroup-discovery (SGD) artificial-intelligence approach can be applied to an experimental plus theoretical data set to identify constraints on key physicochemical parameters, the so-called SG rules, which exclusively describe materials and reaction conditions with outstanding catalytic performance. By using high-throughput experimentation, 120 SiO2-supported catalysts containing ruthenium, tungsten, and phosphorus were synthesized and tested in the catalytic oxidation of propylene. As candidate descriptive parameters, the temperature and 10 parameters related to the composition and chemical nature of the catalyst materials, derived from calculated free-atom properties, were offered. The temperature, the phosphorus content, and the composition-weighted electronegativity are identified as key parameters describing high yields toward the value-added oxygenate products acrolein and acrylic acid. The SG rules not only reflect the underlying processes particularly associated with high performance but also guide the design of more complex catalysts containing up to five elements in their composition.
The electrochemical nitrogen reduction reaction (NRR) is a much sought-after low-energy alternative to Haber–Bosch ammonia synthesis. Single-atom catalysts (SACs) promise to break scaling relations between adsorption energies of key NRR reaction intermediates that severely limit the performance of extended catalysts. Here, we perform a computational screening study of transition metal (TM) SACs supported on vanadium disulfide (VS2) and indeed obtain strongly broken scaling relations. A data-driven analysis by means of outlier detection and subgroup discovery reveals that this breaking is restricted to early TMs, while detailed electronic structure analysis rationalizes it in terms of strong charge transfer to the underlying support. This charge transfer selectively weakens *N and *NH adsorption and leads to promising NRR descriptors for SACs formed of earlier TMs like Ta that would conventionally not be associated with nitrogen reduction.
Subgroup discovery (SD) aims at finding significant subgroups of a given population of individuals characterized by statistically unusual properties of interest. SD on event logs provides insight into particular behaviors of processes, which may be a valuable complement to the traditional process analysis techniques, especially for low-structured processes. This paper proposes a scalable and efficient method to search for significant SD rules on frequent sequences of events, exploiting their multidimensional nature. The method is intended to identify significant subsequences of events in which the distribution of values of some target aspect differs significantly from the same distribution for the entire event log. A publicly available real-life event log of a Dutch hospital is used as a running example to demonstrate the applicability of our method. The proposed approach was applied to a real-life case study based on the public transport of a medium-sized European city (Porto, Portugal), for which the event data consists of 133 million smartcard travel validations from buses, trams and trains. The results include a characterization of mobility flows over multiple aspects, as well as the identification of unexpected behaviors in the flow of commuters (public transport). The generated knowledge provided useful insight into the behavior of travelers, which can be applied at operational, tactical and strategic business levels, enhancing the current view of the transport services for transport authorities and operators.
• Significant subgroup discovery rules on sequences of multidimensional events.
• Process discovery and conformance checking on low-structured processes.
• Real-life case study based on smartcard travel validations of a transport network.
• Characterization of mobility flows over multiple aspects.
• Identification of unexpected behaviors in the flow of commuters.
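The idea of a subgroup whose target distribution differs from that of the whole population can be made concrete with a small sketch. It uses weighted relative accuracy (WRAcc), one common subgroup-discovery quality function; the abstract does not say which measure the authors use, so this choice is an assumption. The `wracc` helper, the `trips` records, and their field names are hypothetical illustrations, not data from the study.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def wracc(rows: List[Row],
          selector: Callable[[Row], bool],
          target: Callable[[Row], bool]) -> float:
    """Weighted relative accuracy of the subgroup picked out by `selector`.

    WRAcc = coverage * (target share in subgroup - target share overall).
    """
    n = len(rows)
    if n == 0:
        return 0.0
    subgroup = [r for r in rows if selector(r)]
    if not subgroup:
        return 0.0
    p_all = sum(target(r) for r in rows) / n
    p_sub = sum(target(r) for r in subgroup) / len(subgroup)
    return (len(subgroup) / n) * (p_sub - p_all)


# Hypothetical toy records (not taken from the paper's event log).
trips = [
    {"mode": "bus", "peak": True, "delayed": True},
    {"mode": "bus", "peak": True, "delayed": True},
    {"mode": "bus", "peak": False, "delayed": False},
    {"mode": "tram", "peak": True, "delayed": False},
    {"mode": "tram", "peak": False, "delayed": False},
    {"mode": "train", "peak": True, "delayed": True},
    {"mode": "train", "peak": False, "delayed": False},
    {"mode": "train", "peak": False, "delayed": False},
]

# Subgroup description: "bus trips during peak hours"; target: delayed.
quality = wracc(
    trips,
    selector=lambda r: r["mode"] == "bus" and r["peak"],
    target=lambda r: r["delayed"],
)
print(quality)
```

A positive value means the target is over-represented in the subgroup relative to the full data set, weighted by how many records the subgroup covers; a value near zero means the description is uninformative.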
There exists a high demand to provide explainability for artificial intelligence systems, decision making models included. This paper focuses on crowd decision making using natural language evaluations from social media with the aim of providing explainability. We present the Explainable Crowd Decision Making based on Subgroup Discovery and Attention Mechanisms (ECDM-SDAM) methodology as an a posteriori explainable process that captures the wisdom of crowds that is naturally provided in social media opinions. It extracts the opinions from social media texts using a deep learning based sentiment analysis approach called Attention based Sentiment Analysis Method. The methodology includes a backward process that provides explanations to justify its sense-making procedure, mainly by applying the attention mechanism to texts and subgroup discovery to opinions. We evaluate the methodology in the real case study of the TripR-2020Large dataset for restaurant choice. The results show that the ECDM-SDAM methodology provides easily understandable explanations that elucidate the key reasons that support the output of the decision process.
• Explainability in decision making is essential to increase its use and understanding.
• Attention mechanisms and subgroup discovery can generate explainable decision making.
• We propose a methodology that offers explanations of its internal decision mechanism.
• The proposed methodology captures the wisdom of crowds from social media.
• Natural language with sentiment analysis and deep learning enriches expert evaluation.
Communities can intuitively be defined as subsets of nodes of a graph with a dense structure in the corresponding subgraph. However, for mining such communities usually only structural aspects are taken into account. Typically, neither a concise nor an easily interpretable community description is provided.
For tackling this issue, this paper focuses on description-oriented community detection using subgroup discovery. In order to provide both structurally valid and interpretable communities we utilize the graph structure as well as additional descriptive features of the graph’s nodes. A descriptive community pattern built upon these features then describes and identifies a community, i.e., a set of nodes, and vice versa. Essentially, we mine patterns in the “description space” characterizing interesting sets of nodes (i.e., subgroups) in the “graph space”; the interestingness of a community is evaluated by a selectable quality measure.
We aim at identifying communities according to standard community quality measures, while providing characteristic descriptions of these communities at the same time. For this task, we propose several optimistic estimates of standard community quality functions to be used for efficient pruning of the search space in an exhaustive branch-and-bound algorithm. We demonstrate our approach in an evaluation using five real-world data sets, obtained from three different social media applications.
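How an optimistic estimate prunes an exhaustive search can be sketched in a few lines. The paper derives estimates for community quality functions, which the abstract does not spell out; the sketch below substitutes a simpler, well-known case instead: WRAcc on a binary target, whose value for any refinement of a pattern is bounded by (positives covered / N) * (1 - overall positive share). The function `best_pattern`, the toy `nodes` table, and its attribute names are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Condition = Tuple[str, object]          # (attribute, required value)
Pattern = Tuple[Condition, ...]         # conjunction of conditions


def best_pattern(rows: List[Row], target_key: str,
                 conditions: List[Condition]) -> Tuple[float, Pattern]:
    """Depth-first branch-and-bound over conjunctions of conditions."""
    n = len(rows)
    p0 = sum(bool(r[target_key]) for r in rows) / n
    best: Tuple[float, Pattern] = (0.0, ())

    def recurse(pattern: Pattern, covered: List[Row], start: int) -> None:
        nonlocal best
        for i in range(start, len(conditions)):
            attr, value = conditions[i]
            sub = [r for r in covered if r[attr] == value]
            if not sub:
                continue  # empty subgroup, nothing to refine
            pos = sum(bool(r[target_key]) for r in sub)
            quality = (len(sub) / n) * (pos / len(sub) - p0)
            refined = pattern + ((attr, value),)
            if quality > best[0]:
                best = (quality, refined)
            # Optimistic estimate: the best conceivable refinement keeps
            # every covered positive and drops every covered negative.
            bound = (pos / n) * (1.0 - p0)
            if bound > best[0]:
                recurse(refined, sub, i + 1)
            # Otherwise no refinement can beat the incumbent: prune.

    recurse((), rows, 0)
    return best


# Hypothetical node table: descriptive features plus a binary target.
nodes = [
    {"mode": "bus", "peak": True, "delayed": True},
    {"mode": "bus", "peak": True, "delayed": True},
    {"mode": "bus", "peak": False, "delayed": False},
    {"mode": "tram", "peak": True, "delayed": False},
    {"mode": "tram", "peak": False, "delayed": False},
    {"mode": "train", "peak": True, "delayed": True},
    {"mode": "train", "peak": False, "delayed": False},
    {"mode": "train", "peak": False, "delayed": False},
]
candidates = [("mode", "bus"), ("mode", "tram"), ("mode", "train"),
              ("peak", True), ("peak", False)]

best_quality, best_description = best_pattern(nodes, "delayed", candidates)
print(best_quality, best_description)
```

The search stays exhaustive in effect because a branch is skipped only when its bound shows that no refinement can improve on the best pattern found so far; a tighter bound prunes more of the description space.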
Summary
Building on Yu and Kumbier's predictability, computability and stability (PCS) framework and for randomised experiments, we introduce a novel methodology for Stable Discovery of Interpretable Subgroups via Calibration (StaDISC), with large heterogeneous treatment effects. StaDISC was developed during our re‐analysis of the 1999–2000 VIGOR study, an 8076‐patient randomised controlled trial that compared the risk of adverse events from a then newly approved drug, rofecoxib (Vioxx), with that from an older drug naproxen. Vioxx was found to, on average and in comparison with naproxen, reduce the risk of gastrointestinal events but increase the risk of thrombotic cardiovascular events. Applying StaDISC, we fit 18 popular conditional average treatment effect (CATE) estimators for both outcomes and use calibration to demonstrate their poor global performance. However, they are locally well‐calibrated and stable, enabling the identification of patient groups with larger than (estimated) average treatment effects. In fact, StaDISC discovers three clinically interpretable subgroups each for the gastrointestinal outcome (totalling 29.4% of the study size) and the thrombotic cardiovascular outcome (totalling 11.0%). Complementary analyses of the found subgroups using the 2001–2004 APPROVe study, a separate independently conducted randomised controlled trial with 2587 patients, provide further supporting evidence for the promise of StaDISC.
On GNN explainability with activation rules
Veyrin-Forrer, Luca; Kamal, Ataollah; Duffner, Stefan ...
Data Mining and Knowledge Discovery, 09/2024, Volume 38, Issue 5
Journal Article, Peer reviewed, Open access
GNNs are powerful models based on node representation learning that perform particularly well in many machine learning problems related to graphs. The major obstacle to the deployment of GNNs is mostly a problem of societal acceptability and trustworthiness, properties which require making explicit the internal functioning of such models. Here, we propose to mine activation rules in the hidden layers to understand how the GNNs perceive the world. The problem is not to discover activation rules that are individually highly discriminating for an output of the model. Instead, the challenge is to provide a small set of rules that cover all input graphs. To this end, we introduce the subjective activation pattern domain. We define an effective and principled algorithm to enumerate activation rules in each hidden layer. The proposed approach for quantifying the interest of these rules is rooted in information theory and is able to account for background knowledge on the input graph data. The activation rules can then be redescribed thanks to pattern languages involving interpretable features. We show that the activation rules provide insights on the characteristics used by the GNN to classify the graphs. In particular, this makes it possible to identify the hidden features built by the GNN through its different layers. Also, these rules can subsequently be used for explaining GNN decisions. Experiments on both synthetic and real-life datasets show highly competitive performance, with up to 200% improvement in fidelity on explaining graph classification over the SOTA methods.
► We present a complete analysis of web usage mining in the website OrOliveSur.com.
► Clustering, association rule and subgroup discovery techniques have been applied.
► Results give the webmaster team interesting conclusions for improving the design.
Web usage mining is the process of extracting useful information from users' history databases associated with an e-commerce website. The extraction is usually performed by data mining techniques applied to server log data or data obtained from specific tools such as Google Analytics. This paper presents the methodology used for an e-commerce website selling extra virgin olive oil, www.OrOliveSur.com. We describe the set of phases carried out, including data collection, data preprocessing, and extraction and analysis of knowledge. The knowledge is extracted using unsupervised and supervised data mining algorithms through descriptive tasks such as clustering, association and subgroup discovery, applying classical and recent approaches. The results obtained are discussed with particular attention to the interests of the website's design team, providing some guidelines for improving its usability and user satisfaction.