Subgroup discovery (SD) aims at finding significant subgroups of a given population of individuals that are characterized by statistically unusual properties of interest. SD on event logs provides insight into particular behaviors of processes, which may be a valuable complement to traditional process analysis techniques, especially for low-structured processes. This paper proposes a scalable and efficient method to search for significant SD rules on frequent sequences of events, exploiting their multidimensional nature. The method is intended to identify significant subsequences of events for which the distribution of values of some target aspect differs significantly from the same distribution over the entire event log. A publicly available real-life event log of a Dutch hospital is used as a running example to demonstrate the applicability of our method. The proposed approach was also applied to a real-life case study of the public transport of a medium-sized European city (Porto, Portugal), where the event data consist of 133 million smartcard travel validations from buses, trams and trains. The results include a characterization of mobility flows over multiple aspects, as well as the identification of unexpected behaviors in the flow of public transport commuters. The generated knowledge provided useful insight into the behavior of travelers, which can be applied at the operational, tactical and strategic business levels, enhancing the current view of the transport services for transport authorities and operators.
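To make the central test concrete, here is a minimal sketch that compares the target distribution of the traces containing a given event subsequence with the distribution over the whole log. Several things are assumptions and do not come from the paper. The log is a list of hypothetical `(trace, target_value)` pairs. A pattern matches when its events occur in order, with gaps allowed. The chi-square goodness-of-fit statistic stands in for whatever significance test the method actually uses. The helpers `is_subsequence` and `distribution_shift` are illustrative names. The sketch also ignores the multidimensional event attributes, and turning the statistic into a p-value (for example with SciPy's chi-square distribution) is left out.

```python
from collections import Counter


def is_subsequence(pattern, trace):
    """True if the events of `pattern` occur in `trace` in order (gaps allowed)."""
    remaining = iter(trace)
    # `in` advances the iterator, so each event must appear after the previous match.
    return all(event in remaining for event in pattern)


def distribution_shift(log, pattern):
    """Compare the target distribution of traces containing `pattern` with the whole log.

    `log` is a list of (trace, target_value) pairs (hypothetical format). Returns the
    subgroup size and a chi-square statistic that uses the overall proportions as
    the expectation; this test is an assumption, not the paper's method.
    """
    overall = Counter(target for _, target in log)
    n_total = sum(overall.values())
    subgroup = Counter(target for trace, target in log if is_subsequence(pattern, trace))
    n_sub = sum(subgroup.values())
    if n_sub == 0:
        return 0, 0.0
    chi2 = 0.0
    for value, count in overall.items():
        expected = n_sub * count / n_total  # count expected if the subgroup mirrored the log
        chi2 += (subgroup[value] - expected) ** 2 / expected
    return n_sub, chi2


# Hypothetical toy log: each trace is labeled with a target value (length of stay).
log = [
    (["register", "triage", "xray", "discharge"], "long"),
    (["register", "triage", "xray", "surgery", "discharge"], "long"),
    (["register", "triage", "discharge"], "short"),
    (["register", "discharge"], "short"),
]
print(distribution_shift(log, ("triage", "xray")))  # (2, 2.0)
```

Both traces containing `triage` followed by `xray` are "long", while the whole log is split evenly, so the statistic is non-zero. A real search would enumerate frequent subsequences and rank or filter them by such a score.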
• Significant subgroup discovery rules on sequences of multidimensional events.
• Process discovery and conformance checking on low-structured processes.
• Real-life case study based on smartcard travel validations of a transport network.
• Characterization of mobility flows over multiple aspects.
• Identification of unexpected behaviors in the flow of commuters.
Communities can intuitively be defined as subsets of nodes of a graph with a dense structure in the corresponding subgraph. However, mining such communities usually takes only structural aspects into account. Typically, no concise or easily interpretable community description is provided.
To tackle this issue, this paper focuses on description-oriented community detection using subgroup discovery. In order to provide communities that are both structurally valid and interpretable, we utilize the graph structure as well as additional descriptive features of the graph's nodes. A descriptive community pattern built upon these features then describes and identifies a community, i.e., a set of nodes, and vice versa. Essentially, we mine patterns in the "description space" that characterize interesting sets of nodes (i.e., subgroups) in the "graph space"; the interestingness of a community is evaluated by a selectable quality measure.
We aim at identifying communities according to standard community quality measures, while providing characteristic descriptions of these communities at the same time. For this task, we propose several optimistic estimates of standard community quality functions to be used for efficient pruning of the search space in an exhaustive branch-and-bound algorithm. We demonstrate our approach in an evaluation using five real-world data sets, obtained from three different social media applications.
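Exhaustive branch-and-bound search with optimistic estimates can be illustrated with the following minimal sketch. It is not the paper's algorithm, and several parts are assumptions. Patterns are conjunctions of hypothetical `(attribute, value)` selectors over node attributes. The quality function is average internal degree of the induced subgraph, a stand-in for the paper's standard community quality functions, whose optimistic estimates are not reproduced here. The optimistic estimate is the maximum internal degree. That bound is valid because every refinement covers a subset of the current nodes, and in that subset no node's internal degree can exceed its current value. All function names are illustrative. `min_size` is assumed to be at least 1.

```python
def cover(pattern, attrs):
    """Nodes whose attributes satisfy every (attribute, value) selector in `pattern`."""
    return {v for v, a in attrs.items() if all(a.get(k) == val for k, val in pattern)}


def internal_degrees(nodes, edges):
    """Degree of each node within the subgraph induced by `nodes`."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        if u in deg and v in deg and u != v:
            deg[u] += 1
            deg[v] += 1
    return deg


def quality(nodes, edges):
    """Hypothetical stand-in quality: average internal degree (assumes nodes is non-empty)."""
    deg = internal_degrees(nodes, edges)
    return sum(deg.values()) / len(deg)


def optimistic_estimate(nodes, edges):
    """Upper bound on the quality of any refinement, which covers a subset of `nodes`."""
    return max(internal_degrees(nodes, edges).values())


def branch_and_bound(attrs, edges, selectors, min_size=2):
    """Depth-first search over selector conjunctions, returning the best (quality, pattern)."""
    best = (float("-inf"), None)

    def search(pattern, start):
        nonlocal best
        nodes = cover(pattern, attrs)
        if len(nodes) < min_size:  # refinements only shrink the cover, so prune
            return
        q = quality(nodes, edges)
        if q > best[0]:
            best = (q, pattern)
        if optimistic_estimate(nodes, edges) <= best[0]:  # no refinement can do better
            return
        for i in range(start, len(selectors)):  # canonical order avoids duplicate patterns
            search(pattern + (selectors[i],), i + 1)

    search((), 0)
    return best


# Hypothetical toy graph: club A forms a triangle, club B is sparsely connected.
attrs = {1: {"club": "A"}, 2: {"club": "A"}, 3: {"club": "A"},
         4: {"club": "B"}, 5: {"club": "B"}, 6: {"club": "B"}}
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]
selectors = [("club", "A"), ("club", "B")]
print(branch_and_bound(attrs, edges, selectors))  # (2.0, (('club', 'A'),))
```

Once the description `club = A` reaches quality 2.0, both subtrees are pruned, because their estimates (2 and 1) cannot exceed it. The tighter the optimistic estimate, the more of the description space such a search can skip.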
There is a high demand for explainability in artificial intelligence systems, including those that embed decision-making models. This paper focuses on crowd decision making using natural language evaluations from social media, with the aim of providing explainability. We present the Explainable Crowd Decision Making based on Subgroup Discovery and Attention Mechanisms (ECDM-SDAM) methodology, an a posteriori explainable process that captures the wisdom of crowds naturally expressed in social media opinions. It extracts the opinions from social media texts using a deep learning-based sentiment analysis approach called Attention based Sentiment Analysis Method. The methodology includes a backward process that provides explanations justifying its sense-making procedure, mainly by applying the attention mechanism to texts and subgroup discovery to opinions. We evaluate the methodology on a real case study, the TripR-2020Large dataset for restaurant choice. The results show that ECDM-SDAM provides easily understandable explanations that elucidate the key reasons supporting the output of the decision process.
• Explainability in decision making is essential to increase its use and understanding.
• Attention mechanisms and subgroup discovery can generate explainable decision making.
• We propose a methodology that offers explanations of its internal decision mechanism.
• The proposed methodology captures the wisdom of crowds from social media.
• Natural language with sentiment analysis and deep learning enriches expert evaluation.
Summary
Building on Yu and Kumbier's predictability, computability and stability (PCS) framework, we introduce, for randomised experiments, a novel methodology for Stable Discovery of Interpretable Subgroups via Calibration (StaDISC), aimed at subgroups with large heterogeneous treatment effects. StaDISC was developed during our re-analysis of the 1999–2000 VIGOR study, an 8076-patient randomised controlled trial that compared the risk of adverse events from a then newly approved drug, rofecoxib (Vioxx), with that from an older drug, naproxen. On average and in comparison with naproxen, Vioxx was found to reduce the risk of gastrointestinal events but increase the risk of thrombotic cardiovascular events. Applying StaDISC, we fit 18 popular conditional average treatment effect (CATE) estimators for both outcomes and use calibration to demonstrate their poor global performance. However, they are locally well-calibrated and stable, enabling the identification of patient groups with larger than (estimated) average treatment effects. In fact, StaDISC discovers three clinically interpretable subgroups each for the gastrointestinal outcome (totalling 29.4% of the study size) and the thrombotic cardiovascular outcome (totalling 11.0%). Complementary analyses of the found subgroups using the 2001–2004 APPROVe study, a separate, independently conducted randomised controlled trial with 2587 patients, provide further supporting evidence for the promise of StaDISC.
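What calibration means for a CATE estimator can be shown with a minimal sketch: units are grouped by their predicted effect, and each group's mean prediction is compared with the treated-minus-control difference in mean outcomes. This is a generic binned calibration check and is not StaDISC's exact procedure. The function name `cate_calibration`, the quantile-style binning and the toy data are assumptions. The observed difference in means is a sensible per-group effect estimate only under randomisation, and only when the predictions were not fitted on the same outcomes.

```python
def cate_calibration(pred, treated, outcome, n_bins=2):
    """Group units by predicted CATE; return (mean prediction, observed effect) per group.

    The observed effect is the treated-minus-control difference in mean outcomes,
    which relies on randomised treatment assignment (assumption: predictions come
    from a model fitted on separate data).
    """
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    rows = []
    for k in range(n_bins):
        group = order[k * len(order) // n_bins:(k + 1) * len(order) // n_bins]
        t = [outcome[i] for i in group if treated[i]]
        c = [outcome[i] for i in group if not treated[i]]
        if not t or not c:  # effect not estimable without both arms
            continue
        mean_pred = sum(pred[i] for i in group) / len(group)
        observed = sum(t) / len(t) - sum(c) / len(c)
        rows.append((mean_pred, observed))
    return rows


# Hypothetical toy trial with alternating assignment to treatment and control.
pred = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
treated = [1, 0, 1, 0, 1, 0, 1, 0]
outcome = [1, 1, 0, 0, 2, 1, 2, 1]
print(cate_calibration(pred, treated, outcome))  # [(0.0, 0.0), (1.0, 1.0)]
```

In this toy case predicted and observed effects agree in both groups, so the estimator is perfectly calibrated. The abstract's observation is that an estimator can fail such a check globally yet pass it within particular groups.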
► We present a complete analysis of web usage mining in the website OrOliveSur.com.
► Clustering, association rule and subgroup discovery techniques have been applied.
► The results offer the webmaster team interesting conclusions for improving the design.
Web usage mining is the process of extracting useful information from the user history databases associated with an e-commerce website. The extraction is usually performed by data mining techniques applied to server log data or to data obtained from specific tools such as Google Analytics. This paper presents the methodology used in www.OrOliveSur.com, an e-commerce website selling extra virgin olive oil. We describe the set of phases carried out, including data collection, data preprocessing, and knowledge extraction and analysis. The knowledge is extracted using unsupervised and supervised data mining algorithms through descriptive tasks such as clustering, association and subgroup discovery, applying both classical and recent approaches. The results are discussed with particular attention to the interests of the website's design team, providing some guidelines for improving its usability and user satisfaction.
Opinion summarisation is concerned with generating structured summaries of multiple opinions in order to provide insightful knowledge to end users. We present the Aspect Discovery for OPinion Summarisation (ADOPS) methodology, which is aimed at generating explainable and structured opinion summaries. ADOPS is built upon deep learning-based aspect-based sentiment analysis methods and Subgroup Discovery techniques. The resulting opinion summaries are presented as interesting rules, which summarise, in terms humans can interpret, the state of opinion about the aspects of a specific entity. For assessing the ADOPS methodology, we annotate and release a new dataset of opinions about a single entity in the restaurant review domain, which we call ORCo. The results show that ADOPS is able to generate interesting rules with high values of support and confidence, which provide explainable and insightful knowledge about the state of opinion about a certain entity.
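The support and confidence of such rules can be computed as in the minimal sketch below. It uses the common association-rule definitions. Support is the share of all records that match both sides of the rule. Confidence is the share of records matching the left-hand side that also match the right-hand side. ADOPS's exact rule attributes and metric definitions are not given in the abstract. The opinion records, the attribute names `aspect` and `polarity`, and the helper names are hypothetical.

```python
def matches(record, conditions):
    """True if the record has every attribute value listed in `conditions`."""
    return all(record.get(k) == v for k, v in conditions.items())


def support_confidence(records, antecedent, consequent):
    """Support: share of all records matching antecedent and consequent.
    Confidence: share of antecedent-matching records that also match the consequent."""
    covered = [r for r in records if matches(r, antecedent)]
    both = [r for r in covered if matches(r, consequent)]
    support = len(both) / len(records) if records else 0.0
    confidence = len(both) / len(covered) if covered else 0.0
    return support, confidence


# Hypothetical aspect-level opinions extracted from restaurant reviews.
opinions = [
    {"aspect": "food", "polarity": "positive"},
    {"aspect": "food", "polarity": "positive"},
    {"aspect": "food", "polarity": "negative"},
    {"aspect": "service", "polarity": "negative"},
]
support, confidence = support_confidence(opinions, {"aspect": "food"}, {"polarity": "positive"})
print(support, round(confidence, 3))  # 0.5 0.667
```

The rule "aspect = food → polarity = positive" covers half of all opinions and holds for two of the three food opinions, which is the kind of human-readable summary statement the abstract describes.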
• We present a novel methodology for aspect-based opinion summarisation.
• Our methodology combines deep learning and subgroup discovery methods.
• We categorise the aspects of restaurant reviews and classify their opinion values.
• The summaries are presented in explainable terms for humans as interesting rules.
• We release a new dataset for assessing opinion summarisation models.