•Innovative use of autoencoders to reconstruct missing values in event logs.
•Focus on anomalous and missing information at the level of event log attributes.
•Methods tested on real-life and artificial event logs.
•Qualitative evaluation of the impact on process discovery is also presented.
Low quality of business process event logs, as determined by anomalous and missing values, is often unavoidable in practical contexts. The output of process analysis that uses event logs with missing and anomalous values is also likely to be of low quality, thus decreasing the quality of any decisions based on it. While previous work has focused on reconstructing missing events in an event log or removing anomalous traces, in this paper we focus on detecting anomalous values and reconstructing missing values at the level of attributes in event logs. We propose methods based on autoencoders, a class of neural networks that reconstruct their own input and are therefore particularly suitable for learning a model of the complex relationships among attribute values in an event log. These methods do not rely on any a priori knowledge about the business process that generated an event log and are evaluated using real-world and artificially generated event logs. The paper also presents a qualitative analysis of the impact of event log cleaning and reconstruction on the output of process discovery. The proposed approach shows strong performance on activity labels and timestamps in artificial event logs. Performance on real-world event logs, in particular for timestamp anomaly detection, is lower, which may be due to the high variability of attribute values in the chosen event logs. Process models discovered from reconstructed event logs are characterised by lower variability of allowed behaviour and are therefore more usable in practice.
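The reconstruction idea described above lends itself to a compact illustration. Below is a minimal PyTorch sketch of an autoencoder trained on encoded event-level attribute vectors, used to flag anomalous events by reconstruction error and to impute missing attributes from the decoder output. The encoding scheme, architecture and threshold rule are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch: train an autoencoder on event-attribute vectors, then
# flag events with high reconstruction error as anomalous. Columns are
# assumed to be one-hot activity labels plus a normalised timestamp.
import torch
import torch.nn as nn

class EventAutoencoder(nn.Module):
    def __init__(self, n_features: int, bottleneck: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(),
                                     nn.Linear(16, bottleneck), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 16), nn.ReLU(),
                                     nn.Linear(16, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train(model, x, epochs=200, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), x)   # reconstruct the model's own input
        loss.backward()
        opt.step()
    return model

x = torch.rand(500, 9)                # placeholder for an encoded event log
model = train(EventAutoencoder(n_features=9), x)

with torch.no_grad():
    recon = model(x)
    err = ((recon - x) ** 2).mean(dim=1)
    anomalous = err > err.mean() + 3 * err.std()   # simple thresholding rule
    # For events with a missing attribute, `recon` supplies the imputed value.
```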
Process-oriented data mining (process mining) uses algorithms and data (in the form of event logs) to construct models that aim to provide insights into organisational processes. The quality of the data (both form and content) presented to the modelling algorithms is critical to the success of the process mining exercise. Cleaning event logs to address quality issues prior to conducting a process mining analysis is a necessary, but generally tedious and ad hoc task. In this paper we describe a set of data quality issues, distilled from our experiences in conducting process mining analyses, commonly found in process mining event logs or encountered while preparing event logs from raw data sources. We show that patterns are used in a variety of domains as a means for describing commonly encountered problems and solutions. The main contributions of this article are in showing that a patterns-based approach is applicable to documenting commonly encountered event log quality issues, the formulation of a set of components for describing event log quality issues as patterns, and the description of a collection of 11 event log imperfection patterns distilled from our experiences in preparing event logs. We postulate that a systematic approach to using such a pattern repository to identify and repair event log quality issues benefits both the process of preparing an event log and the quality of the resulting event log. The relevance of the pattern-based approach is illustrated via application of the patterns in a case study and through an evaluation by researchers and practitioners in the field.
•A set of data imperfection patterns for event logs is introduced.
•The data imperfection patterns enable a systematic approach to data cleaning.
•The usefulness of the patterns is demonstrated in a process mining case study.
•The patterns have been evaluated with process mining experts and practitioners.
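The abstract above describes a set of components for documenting quality issues as patterns. A hedged sketch of what such a pattern template could look like as a data structure follows; the field names are illustrative assumptions, not the paper's exact template, though "form-based event capture" is one of its documented patterns.

```python
# A sketch of an imperfection-pattern record. The fields below are
# illustrative; the paper defines its own set of pattern components.
from dataclasses import dataclass, field

@dataclass
class ImperfectionPattern:
    name: str                       # short, memorable pattern name
    description: str                # how the issue manifests in a log
    affected_elements: list = field(default_factory=list)  # e.g. ["timestamp"]
    detection: str = ""             # how to spot the pattern in a log
    remedy: str = ""                # how to repair the affected log
    side_effects: str = ""          # risks introduced by the remedy

form_based = ImperfectionPattern(
    name="Form-based event capture",
    description="Several events share one timestamp because a form "
                "recorded them in a single save action.",
    affected_elements=["timestamp", "activity"],
    detection="Groups of events per case with identical timestamps.",
    remedy="Reorder using domain knowledge of the form's field order.",
    side_effects="The imposed order may not reflect actual execution.")
```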
Process mining can provide valuable insights in business processes using an event log containing process execution data. Despite the significant potential of process mining to support the analysis and improvement of processes, the reliability of process mining outcomes depends on the quality of the event log. Real-life logs typically suffer from various data quality issues. Consequently, thorough event log quality assessment is required before applying process mining algorithms. This paper introduces DaQAPO, the first R-package which supports flexible and fine-grained event log quality assessment. It provides a rich set of tests to identify a wide range of event log quality issues, while having sufficient flexibility to allow the detection of context-specific quality issues.
•Process mining draws insights in business processes from event logs.
•Reliability of process mining outcomes depends on the quality of event logs.
•DaQAPO supports flexible and fine-grained event log quality assessment.
•DaQAPO is developed as an R-package providing a rich set of assessment tests.
•DaQAPO is integrated in bupaR, the open-source framework for process mining in R.
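To ground what "fine-grained quality assessment tests" means in practice, here is a minimal pandas sketch of the kinds of checks such a tool automates. This illustrates the concept only; it is not DaQAPO's API (DaQAPO itself is an R-package built on bupaR), and the column names are assumptions.

```python
# Illustrative event log quality checks over a pandas DataFrame event log.
import pandas as pd

log = pd.DataFrame({
    "case_id":  ["c1", "c1", "c2", "c2"],
    "activity": ["register", "treat", "register", None],
    "start":    pd.to_datetime(["2024-01-01 09:00", "2024-01-01 10:00",
                                "2024-01-02 08:00", "2024-01-02 07:00"]),
    "end":      pd.to_datetime(["2024-01-01 09:30", "2024-01-01 09:45",
                                "2024-01-02 08:15", "2024-01-02 07:30"]),
})

# Test 1: missing values per attribute.
missing = log.isna().sum()

# Test 2: events whose recorded end precedes their start (negative duration).
negative_duration = log[log["end"] < log["start"]]

# Test 3: a context-specific check, e.g. cases that never record "register".
no_register = set(log["case_id"]) - set(
    log.loc[log["activity"] == "register", "case_id"])
```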
Process mining can be viewed as the missing link between model-based process analysis and data-oriented analysis techniques. The lion's share of process mining research has focused on process discovery (creating process models from raw data) and replay techniques to check conformance and analyze bottlenecks. These techniques have helped organizations to address compliance and performance problems. However, for a more refined analysis, it is essential to correlate different process characteristics. For example, do deviations from the normative process cause additional delays and costs? Are rejected cases handled differently in the initial phases of the process? What is the influence of a doctor's experience on the treatment process? These and other questions may involve process characteristics related to different perspectives (control-flow, data-flow, time, organization, cost, compliance, etc.). Specific questions (e.g., predicting the remaining processing time) have been investigated before, but a generic approach was missing thus far. The proposed framework unifies a number of approaches for correlation analysis proposed in the literature, offering a general solution that can perform those analyses and many more. The approach has been implemented in ProM and combines process and data mining techniques. In this paper, we also demonstrate the applicability using a case study conducted with the UWV (Employee Insurance Agency), one of the largest "administrative factories" in The Netherlands.
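A hedged sketch of the kind of correlation question the framework targets, such as "do deviations cause additional delays?": case-level characteristics are derived from the event log and then correlated. The column and feature names below are illustrative assumptions, not the framework's implementation.

```python
# Derive case-level features (duration, deviation flag) and relate them.
import pandas as pd

events = pd.DataFrame({
    "case_id":   ["c1", "c1", "c2", "c2", "c3", "c3"],
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-02",
                                 "2024-01-03", "2024-01-04", "2024-01-12"]),
    "deviating": [False, True, False, False, False, True],  # conformance flag
})

cases = events.groupby("case_id").agg(
    duration=("timestamp", lambda t: (t.max() - t.min()).days),
    deviated=("deviating", "any"),
)

# Correlate the dependent characteristic (duration) with the independent one
# (deviation); the framework generalises this to arbitrary perspectives.
print(cases.groupby("deviated")["duration"].mean())
```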
Process mining, as with any form of data analysis, relies heavily on the quality of input data to generate accurate and reliable results. A fit-for-purpose event log nearly always requires time-consuming, manual pre-processing to extract events from source data, with data quality dependent on the analyst's domain knowledge and skills. Despite much being written about data quality in general, a generalisable framework for analysing event data quality issues when extracting logs for process mining remains unrealised. Following the Design Science Research (DSR) paradigm, we present RDB2Log, a quality-aware, semi-automated approach for extracting event logs from relational data. We validated RDB2Log's design against design objectives extracted from literature and competing artifacts, evaluated its design and performance with process mining experts, implemented a prototype with a defined set of quality metrics, and applied it in laboratory settings and in a real-world case study. The evaluation shows that RDB2Log is understandable, of relevance in current research, and supports process mining in practice.
•Quality-informed event log generation from relational source data
•Provides a measurement approach for fitness-for-use of relational source data for process mining
•Develops the concept of event constructors as a mapping between source data and event log attributes
•Uses Design Science Research methodology & evaluation frameworks to validate RDB2Log
•Implemented software prototype evaluated as useful and applicable in real-world settings by both practitioner and research groups
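The "event constructor" concept named in the highlights can be illustrated with a small sketch: a mapping that tells the extractor which relational rows become events and which columns map onto event log attributes. Table names, column names and the mapping format below are illustrative assumptions, not the RDB2Log implementation.

```python
# Build a flat event log from relational tables via simple event constructors.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders(id INTEGER, created_at TEXT);
    CREATE TABLE shipments(order_id INTEGER, shipped_at TEXT);
    INSERT INTO orders VALUES (1, '2024-01-01'), (2, '2024-01-02');
    INSERT INTO shipments VALUES (1, '2024-01-03');
""")

# Each constructor: (activity label, SQL yielding a case id and a timestamp).
constructors = [
    ("Order created", "SELECT id, created_at FROM orders"),
    ("Order shipped", "SELECT order_id, shipped_at FROM shipments"),
]

event_log = sorted(
    ((case_id, activity, ts)
     for activity, sql in constructors
     for case_id, ts in conn.execute(sql)),
    key=lambda e: (e[0], e[2]))   # order events by case, then timestamp
```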
Resources can organise their work in batches, i.e. perform activities on multiple cases simultaneously or concurrently, or intentionally defer activity execution to handle multiple cases (quasi-)sequentially. As batching behaviour influences process performance, efforts to gain insight into this matter are valuable. In this respect, this paper uses event logs, data files containing process execution information, as an information source. More specifically, this work (i) identifies and formalises three batch processing types, (ii) presents a resource-activity centred approach to identify batching behaviour in an event log and (iii) introduces batch processing metrics to acquire knowledge on batch characteristics and their influence on process execution. These contributions are integrated in the Batch Organisation of Work Identification algorithm (BOWI), which is evaluated on both artificial and real-life data.
•First paper to systematically analyse batching behaviour in an event log
•Three types of batch processing are distinguished and formalised
•New algorithm to detect all batch processing types in an event log
•Specification and calculation of relevant batch processing metrics
•Extensive evaluation on artificial and real-life event logs
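A simplified sketch of detecting one of the three batch types (simultaneous batching, where a resource starts the same activity for several cases at the same moment). The full BOWI algorithm also covers concurrent and sequential batching; this illustrates only the resource-activity centred grouping, and the column names are assumptions.

```python
# Detect simultaneous batches: same resource, activity and start time,
# covering more than one case.
import pandas as pd

log = pd.DataFrame({
    "case_id":  ["c1", "c2", "c3", "c4"],
    "activity": ["approve", "approve", "approve", "approve"],
    "resource": ["r1", "r1", "r1", "r2"],
    "start":    pd.to_datetime(["2024-01-01 09:00"] * 3 + ["2024-01-01 10:00"]),
})

batches = (log.groupby(["resource", "activity", "start"])["case_id"]
              .apply(list)
              .loc[lambda s: s.str.len() > 1])
print(batches)   # r1 processed c1, c2, c3 as one simultaneous batch
```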
The problem of automated discovery of process models from event logs has been intensively researched in the past two decades. Despite a rich field of proposals, state-of-the-art automated process discovery methods suffer from two recurrent deficiencies when applied to real-life logs: (i) they produce large and spaghetti-like models; and (ii) they produce models that either poorly fit the event log (low fitness) or over-generalize it (low precision). Striking a trade-off between these quality dimensions in a robust and scalable manner has proved elusive. This paper presents an automated process discovery method, namely Split Miner, which produces simple process models with low branching complexity and consistently high and balanced fitness and precision, while achieving considerably faster execution times than state-of-the-art methods, measured on a benchmark covering twelve real-life event logs. Split Miner combines a novel approach to filter the directly-follows graph induced by an event log, with an approach to identify combinations of split gateways that accurately capture the concurrency, conflict and causal relations between neighbors in the directly-follows graph. Split Miner is also the first automated process discovery method that is guaranteed to produce deadlock-free process models with concurrency, while not being restricted to producing block-structured process models.
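A minimal sketch of the structure Split Miner starts from: the directly-follows graph (DFG) induced by an event log, with arc frequencies that a filtering step can then prune. The relative-threshold filter shown here is a generic illustration, not Split Miner's actual filtering algorithm.

```python
# Build the directly-follows graph of an event log and prune rare arcs.
from collections import Counter

traces = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "c"], ["a", "d"]]

dfg = Counter()
for trace in traces:
    for src, dst in zip(trace, trace[1:]):
        dfg[(src, dst)] += 1          # count directly-follows occurrences

threshold = 0.3 * max(dfg.values())   # illustrative relative threshold
filtered = {arc: f for arc, f in dfg.items() if f >= threshold}
```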
Enhancing the website usage using process mining
Choudhary, Chetna; Mehrotra, Deepti; Shrivastava, Avinash K.
The International Journal of Quality & Reliability Management, 06/2023
Journal Article, Peer-reviewed
Purpose
As the number of web applications increases day by day, web mining acts as an important tool to extract useful information from weblogs, analyse it according to its attributes and predict the usage of a website. The main aim of this paper is to inspect how process mining can be used to predict the web usability of hotel booking sites based on the number of users on each page and the time of stay of each user. Through this paper, the authors analyse the web usability of a website through process mining by computing web usability metrics. This work proposes an approach to determining the usage of a website using the attributes available in the weblog, which predicts the actual footfall on the website.
Design/methodology/approach
ProM, a process mining tool, is used for the analysis of the event log of a hotel booking site. The authors use a case study to apply ProM to pre-process the event log dataset for analysis, discovering better-structured process maps than without pre-processing.
Findings
This article first provided an overview of process mining, then focused on web mining and later discussed process mining techniques. It also described different target languages, such as system nets (i.e. Petri nets with an initial and a final state), the Inductive Miner and Heuristic Miner discovery algorithms, and graphs showing the change in behaviour of the dataset and predicting the outcome, that is, the webpage having the maximum number of hits.
Originality/value
The originality of this work lies in using a case study to apply ProM to pre-process the event log dataset for analysis, discovering better-structured process maps than without pre-processing.
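A hedged sketch of the usability metrics this kind of study derives from a weblog: hits per page (footfall) and time of stay per user, computed from an event log with pandas. The column names and the dwell-time rule (gap until the user's next click) are illustrative assumptions.

```python
# Compute page hits and average time of stay from a web server event log.
import pandas as pd

weblog = pd.DataFrame({
    "user":      ["u1", "u1", "u2", "u2", "u3"],
    "page":      ["home", "booking", "home", "home", "booking"],
    "timestamp": pd.to_datetime(["2024-06-01 10:00", "2024-06-01 10:05",
                                 "2024-06-01 11:00", "2024-06-01 11:02",
                                 "2024-06-01 12:00"]),
})

hits_per_page = weblog["page"].value_counts()          # footfall per page

# Time of stay: gap until the user's next click (last click has no gap).
weblog = weblog.sort_values(["user", "timestamp"])
weblog["stay"] = (weblog.groupby("user")["timestamp"].shift(-1)
                  - weblog["timestamp"])
avg_stay = weblog.groupby("page")["stay"].mean()
```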
This study aims to enhance the analysis of healthcare processes by introducing Object-Centric Process Mining (OCPM). By offering a holistic perspective that accounts for the interactions among various objects, OCPM transcends the constraints of conventional patient-centric process mining approaches, ensuring a more detailed and inclusive understanding of healthcare dynamics.
We develop a novel method to transform the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) into Object-Centric Event Logs (OCELs). First, an OMOP CDM4PM subset is created from the standard OMOP CDM, focusing on data relevant to generating OCELs and addressing healthcare data's heterogeneity and standardization challenges. Second, this subset is transformed into OCEL based on specified healthcare criteria, including the identification of various object types, clinical activities, and their relationships. The methodology is tested on the MIMIC-IV database to evaluate its effectiveness and utility.
Our proposed method effectively produces OCELs when applied to the MIMIC-IV dataset, allowing for the implementation of OCPM in the healthcare industry. We rigorously evaluate the comprehensiveness and level of abstraction to validate our approach’s effectiveness. Additionally, we create diverse object-centric process models intricately designed to navigate the complexities inherent in healthcare processes.
Our approach introduces a novel perspective by integrating multiple viewpoints simultaneously. To the best of our knowledge, this is the inaugural application of OCPM within the healthcare sector, marking a significant advancement in the field.
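A hedged sketch of the transformation direction described above: rows from OMOP CDM clinical tables become object-centric events that reference several object types at once (here: a patient and a visit). The field names follow OMOP and OCEL conventions, but the mapping itself is an illustrative simplification, not the paper's full method.

```python
# Map OMOP-style clinical rows to OCEL-style events referencing
# multiple objects per event.
omop_procedures = [
    {"person_id": 7, "visit_occurrence_id": 42,
     "procedure": "X-ray", "datetime": "2024-03-01T09:00"},
    {"person_id": 7, "visit_occurrence_id": 42,
     "procedure": "Discharge", "datetime": "2024-03-01T15:00"},
]

ocel_events = [
    {
        "ocel:activity":  row["procedure"],
        "ocel:timestamp": row["datetime"],
        # One event relates to multiple objects of different types, which is
        # what distinguishes OCEL from a single-case (patient-centric) log.
        "ocel:omap": [f"patient:{row['person_id']}",
                      f"visit:{row['visit_occurrence_id']}"],
    }
    for row in omop_procedures
]
```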
A byproduct of the transition to electronic health records (EHRs) is the associated observational data that capture EHR users' granular interactions with the medical record. Often referred to as audit log data or event log data, these datasets capture and timestamp user activity while they are logged in to the EHR. These data, alone and in combination with other datasets, offer a new source of insights, which cannot be gleaned from claims data or clinical data, to support health services research and those studying healthcare processes and outcomes. In this commentary, we seek to promote broader awareness of EHR audit log data and to stimulate their use in many contexts. We do so by describing EHR audit log data and offering a framework for their potential uses in quality domains (as defined by the National Academy of Medicine). The framework is illustrated with select examples in the safety and efficiency domains, along with their accompanying methodologies, which serve as a proof of concept. This article also discusses insights and challenges from working with EHR audit log data. Ensuring that researchers are aware of such data, and the new opportunities they offer, is one way to assure that our healthcare system benefits from the digital revolution.