Because of the enormous number of rules that can be produced by data mining algorithms, knowledge post-processing is a difficult stage in the association rule discovery process. In order to find relevant knowledge for decision making, the user (a decision maker specialized in the data studied) needs to rummage through the rules. To assist him/her in this task, we propose the rule-focusing methodology, an interactive methodology for the visual post-processing of association rules. It allows the user to explore large sets of rules freely by focusing his/her attention on limited subsets. This new approach relies on rule interestingness measures, on a visual representation, and on interactive navigation among the rules. We have implemented the rule-focusing methodology in a prototype system called ARVis. It exploits the user's focus to guide the generation of the rules by means of a specific constraint-based rule-mining algorithm.
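As a concrete illustration of the kind of rule interestingness measures such a methodology can rely on, the sketch below computes support, confidence, and lift for an association rule over a toy transaction set (the data, items, and rule are illustrative assumptions, not taken from ARVis):

```python
# Toy transaction database (hypothetical; not from the ARVis paper)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence normalised by the consequent's baseline support."""
    return confidence(antecedent, consequent) / support(consequent)

rule = (frozenset({"bread"}), frozenset({"milk"}))
print(support(rule[0] | rule[1]))   # 0.5
print(confidence(*rule))            # ≈ 0.667
print(lift(*rule))                  # ≈ 0.889 (below 1: a weakly negative association)
```

A lift below 1 indicates the antecedent and consequent co-occur less often than independence would predict, which is exactly the kind of signal a user would filter on while focusing on a rule subset.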
KDDML (KDD Markup Language) is a middleware language and system designed to support the development of final applications or higher-level systems that deploy a mixture of data access, data preprocessing, and extraction and deployment of data mining models.
We present our three years' experience in the development of KDDML. The design principles are motivated by requirements derived from recurring patterns in the KDD process.
The KDDML language is XML-based, both for query syntax and for data/model representation. A KDDML query is an XML document in which XML tags correspond to operations on data/models, XML attributes correspond to parameters of those operations, and XML sub-elements define arguments passed to the operators. We present the operators for data access and preprocessing, model extraction and deployment, and control flow.
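To make the query structure concrete, the fragment below sketches what such a nested query could look like. All element and attribute names here are illustrative assumptions for exposition only, not the actual KDDML vocabulary; the point is the structural convention: tags as operators, attributes as parameters, sub-elements as arguments.

```xml
<!-- Hypothetical sketch of a KDDML-style query; tag and attribute
     names are invented for illustration, not real KDDML operators. -->
<KDD_QUERY name="example_query">
  <RULE_MINER min_support="0.1" min_confidence="0.8">
    <!-- The sub-element is the argument: the preprocessed table
         flows into the mining operator that encloses it. -->
    <DISCRETIZE bins="5">
      <TABLE_LOADER source="customers.arff"/>
    </DISCRETIZE>
  </RULE_MINER>
</KDD_QUERY>
```

This operator-tree reading of an XML document is what allows the interpreter to evaluate a query bottom-up, passing each sub-element's result to its enclosing operator.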
The core of the KDDML system is a KDDML language interpreter, with modularity and extensibility as the main design goals. Additional data sources, preprocessing algorithms, and mining algorithms can easily be plugged into the system.
In this paper, knowledge discovery in databases and data mining (KDD/DM), one of the data-based decision support technologies, is applied to help target customers in the insurance industry. Most KDD/DM applications require several major tasks, including data preparation, data preprocessing, data mining, interpretation, application, and evaluation. A case study is presented in which KDD/DM is used to explore decision rules for a leading insurance company. The decision rules can be used to identify potential customers for an existing or new insurance product. The research first constructed the application framework, then defined and conducted each required task, and finally obtained feedback from the case company. Discussions and implications with respect to this research are also presented.
In this work, data mining tools are used to develop new models for the estimation of the rock deformation modulus and the Rock Mass Rating (RMR). A database published by Chun et al. (Int J Rock Mech Min Sci 46:649–658, 2008) was used to develop these models. The parameters of the database were the depth; the weightings of the RMR system related to the uniaxial compressive strength, the rock quality designation, the joint spacing, the joint condition, the groundwater condition, and the discontinuity orientation adjustment; the RMR; and the deformation modulus. The R program environment was used as the modelling tool to apply these advanced techniques. Several algorithms were tested and analysed using different sets of input parameters. It was possible to develop new models that predict the rock deformation modulus and the RMR with improved accuracy and, additionally, provide insight into the importance of the different input parameters.
We describe and evaluate an information-theoretic algorithm for data-driven induction of classification models based on a minimal subset of the available features. The relationship between the input (predictive) features and the target (classification) attribute is modeled by a tree-like structure termed an information network (IN). Unlike other decision-tree models, the information network uses the same input attribute across all nodes of a given layer (level). The input attributes are selected incrementally by the algorithm to maximize the global decrease in the conditional entropy of the target attribute. We use a pre-pruning approach: when no attribute causes a statistically significant decrease in the entropy, network construction is stopped. The algorithm is shown empirically to produce much more compact models than other methods of decision-tree learning while preserving nearly the same level of classification accuracy.
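The core selection criterion described above, picking the input attribute that maximizes the decrease in the conditional entropy of the target, can be sketched as follows (the toy data and attribute names are illustrative, not from the paper; the significance test used for pre-pruning is omitted):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) of a label sequence, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(values, labels):
    """H(Y | X): entropy of the target after partitioning on attribute values."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())

# Toy data: choose the attribute with the largest entropy decrease
# H(Y) - H(Y | X), as the IN algorithm does once per layer.
target = ["yes", "yes", "no", "no"]
attrs = {
    "outlook": ["sun", "sun", "rain", "rain"],
    "windy":   ["t", "f", "t", "f"],
}
gains = {a: entropy(target) - conditional_entropy(v, target)
         for a, v in attrs.items()}
best = max(gains, key=gains.get)
print(best, gains[best])   # outlook 1.0
```

Because the chosen attribute is shared by every node of the layer, a single such selection per layer is what makes the resulting network more compact than a conventional decision tree, where each node may split on a different attribute.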
We describe the different stages in the data mining process and discuss some pitfalls and guidelines to circumvent them. Despite the predominant attention on analysis, data selection and ...pre-processing are the most time-consuming activities, and have a substantial influence on ultimate success. Successful data mining projects require the involvement of expertise in data mining, company data, and the subject area concerned. Despite the attractive suggestion of ‘fully automatic’ data analysis, knowledge of the processes behind the data remains indispensable in avoiding the many pitfalls of data mining.
Big Data. Schmarzo, Bill. 2013. eBook.
Leverage big data to add value to your business Social media analytics, web-tracking, and other technologies help companies acquire and handle massive amounts of data to better understand their customers, products, competition, and markets. Armed with the insights from big data, companies can improve customer experience and products, add value, and increase return on investment. The tricky part for busy IT professionals and executives is how to get this done, and that's where this practical book comes in. Big Data: Understanding How Data Powers Big Business is a complete how-to guide to leveraging big data to drive business value. Full of practical techniques, real-world examples, and hands-on exercises, this book explores the technologies involved, as well as how to find areas of the organization that can take full advantage of big data. * Shows how to decompose current business strategies in order to link big data initiatives to the organization's value creation processes * Explores different value creation processes and models * Explains issues surrounding operationalizing big data, including organizational structures, education challenges, and new big data-related roles * Provides methodology worksheets and exercises so readers can apply techniques * Includes real-world examples from a variety of organizations leveraging big data Big Data: Understanding How Data Powers Big Business is written by one of Big Data's preeminent experts, William Schmarzo. Don't miss his invaluable insights and advice.
The collaborative emergency call-taking information system in the Czech Republic forms a network of cooperating emergency call centres processing emergency calls to the European 112 emergency number. Large amounts of various incident records are stored in its databases. The data can be used for mining spatial and temporal anomalies, as well as for the monitoring and analysis of the performance of the emergency call-taking system. In this paper we describe a method for knowledge discovery and visualisation targeted at the performance analysis of the system with respect to the organisation of the emergency call-taking information system and its data characteristics. The method is based on the Kohonen Self-Organising Map (SOM) algorithm and its extension, the Growing Grid algorithm.
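A minimal sketch of the underlying SOM idea (a plain one-dimensional Kohonen map, not the Growing Grid extension used in the paper; the data, grid size, and training parameters are all illustrative):

```python
import math

def train_som(data, grid_size=4, epochs=200, lr0=0.5, radius0=2.0):
    """Minimal 1-D Kohonen SOM: nodes on a line, each holding a weight
    vector. Weights start spread along the diagonal so the run is
    deterministic; real SOMs usually initialise randomly."""
    dim = len(data[0])
    weights = [[i / (grid_size - 1)] * dim for i in range(grid_size)]
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)           # learning rate decays to 0
        radius = radius0 * (1 - frac)   # neighbourhood shrinks to the BMU alone
        for x in data:
            # Best-matching unit: the node whose weights are closest to x.
            bmu = min(range(grid_size),
                      key=lambda i: sum((w - v) ** 2
                                        for w, v in zip(weights[i], x)))
            for i in range(grid_size):
                d = abs(i - bmu)        # distance measured on the 1-D grid
                if d <= radius or i == bmu:
                    h = math.exp(-d * d / (2 * (radius + 1e-9) ** 2))
                    for j in range(dim):
                        weights[i][j] += lr * h * (x[j] - weights[i][j])
    return weights

# Two well-separated clusters: after training, the map's end nodes
# settle near the two cluster means.
data = [(0.1, 0.1), (0.15, 0.05), (0.85, 0.95), (0.9, 0.9)]
w = train_som(data)
```

Because neighbouring grid nodes are updated together, nearby inputs map to nearby nodes, which is what makes the trained map usable for visualising spatial and temporal structure in incident records.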
The clausal discovery engine CLAUDIEN is presented. CLAUDIEN is an inductive logic programming engine that fits in the descriptive data mining paradigm. CLAUDIEN addresses characteristic induction from interpretations, a task related to existing formalisations of induction in logic. In characteristic induction from interpretations, the regularities are represented by clausal theories and the data by Herbrand interpretations. Because CLAUDIEN uses clausal logic to represent hypotheses, the regularities induced typically involve multiple relations or predicates. CLAUDIEN also employs a novel declarative bias mechanism to define the set of clauses that may appear in a hypothesis.