One of the critical issues in the data mining process, especially for organizations, is evaluating the success level of completed data mining projects. The purpose of this research is to design a fuzzy expert system that evaluates the success level of data mining projects based on the quality of the phases of CRISP-DM, one of the best-known data mining methodologies. The CRISP-DM phases serve as the inputs of a Fuzzy Inference System (FIS) model, and the output is the success level of the data mining project. The system was designed in MATLAB and applied to a data mining project in an Iranian bank as an empirical study.
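As a rough illustration of what such a fuzzy inference system does (the paper's MATLAB implementation is not available here), the sketch below maps hypothetical phase-quality scores to a success level. The triangular membership functions, the zero-order Sugeno-style rule consequents, and the averaging step are all invented for the example:

```python
# Toy fuzzy inference sketch: fuzzify each CRISP-DM phase's quality score
# (0-10) into low/medium/high, fire one rule per label, and defuzzify to a
# crisp success level. All membership shapes and consequents are made up;
# this is not the paper's actual FIS.

def tri(x, a, b, c):
    """Triangular membership function with feet at a and c, peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def memberships(score):
    """Fuzzify one phase-quality score into low/medium/high degrees."""
    return {
        "low": tri(score, -0.1, 0.0, 5.0),
        "medium": tri(score, 2.5, 5.0, 7.5),
        "high": tri(score, 5.0, 10.0, 10.1),
    }

# Each label's rule maps to a crisp success consequent (zero-order Sugeno style).
SUCCESS = {"low": 20.0, "medium": 60.0, "high": 95.0}

def success_level(phase_scores):
    """Fire each label to its mean membership over all phases, then take the
    firing-strength-weighted average of the rule consequents."""
    firing = {
        label: sum(memberships(s)[label] for s in phase_scores) / len(phase_scores)
        for label in SUCCESS
    }
    den = sum(firing.values())
    if den == 0:
        return 50.0  # neutral default when no rule fires at all
    return sum(firing[l] * SUCCESS[l] for l in SUCCESS) / den

# Example: quality scores for six CRISP-DM phases, mostly good work.
print(round(success_level([8, 9, 7, 8.5, 9, 8]), 1))
```

Consistently high phase scores push the output toward the "high" consequent, while uniformly poor scores pull it toward "low".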
A "smart component" data model in PLM
Yunpeng Li; Roy, Utpal; Seung-Jun Shin ...
2015 IEEE International Conference on Big Data (Big Data),
10/2015
Conference Proceeding
Physical products are becoming smarter because of their increased number of embedded sensors and their real-time information-processing capabilities. Data analytics, particularly predictive analytics, is one of the most important of these capabilities because it uses statistical or machine-learning techniques to determine causal relations between input and output parameters. Many researchers have addressed the challenges in creating and evaluating predictive models. Few, however, have discussed how to employ such models effectively throughout a product's life cycle. In this paper, we address this issue by extending Product Lifecycle Management (PLM) systems to include "Smart Component" data models that incorporate predictive models as "parts" or "services" of products in their PLM master records. These smart-component data models can be modularized, composed, reused, traced, maintained, and replaced on demand. We describe a prototype system that demonstrates the feasibility of the proposed data models using an open-source PLM platform.
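A minimal sketch of the core idea is to treat a predictive model as just another replaceable entry in a component's master record. All class and field names below are invented for illustration, not the paper's PLM schema:

```python
# Sketch of a "smart component" record: predictive models live alongside
# physical parts in the master record, so they can be versioned, traced,
# and swapped on demand like any other part. Names are illustrative only.

from dataclasses import dataclass, field

@dataclass
class PredictiveModel:
    name: str
    version: str
    inputs: list     # sensor parameters the model consumes
    output: str      # the quantity the model predicts

@dataclass
class SmartComponent:
    part_number: str
    physical_parts: list = field(default_factory=list)
    models: list = field(default_factory=list)  # models managed as "parts"

    def replace_model(self, new_model):
        """Swap a predictive model on demand, keeping the same record."""
        self.models = [m for m in self.models if m.name != new_model.name]
        self.models.append(new_model)

bearing = SmartComponent("BRG-1001", physical_parts=["race", "cage", "balls"])
bearing.replace_model(
    PredictiveModel("rul", "1.0", ["vibration", "temp"], "remaining_useful_life"))
# A retrained model replaces the old one without touching the rest of the record.
bearing.replace_model(
    PredictiveModel("rul", "2.0", ["vibration", "temp", "load"], "remaining_useful_life"))
print(len(bearing.models), bearing.models[0].version)
```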
Poker has been gradually gaining the attention of the scientific community, mostly among researchers in Artificial Intelligence, mainly because Poker poses great challenges to research in the area. Unlike many other games, Poker is a stochastic game of imperfect information, which yields a huge number of possibilities at every state of the game. This work follows a different line of thought: it tries to create an agent capable of reproducing the way a professional human Poker player plays at all stages of a Texas Hold'em game. For this purpose, a high-level data model able to capture the information relevant to every state of the game was built and loaded, using Talend Data Integration, with data from a database containing millions of plays made by professional poker players. The Weka software package was used to execute the data mining techniques. The final results show that it is possible to create a virtual poker player that makes decisions very similar to those of a professional poker player.
Data mining (DM) has a wide range of applications in the health care field. DM can be used to discover hidden patterns among different diagnoses or to predict a patient's disease based on a certain number of symptoms. It can also be used to analyze the success rate of a given treatment for a group of patients based on a number of available characteristics and parameters. This paper demonstrates the ability of DM to develop a prediction model for a presumptive diagnosis of two familiar urinary diseases: acute inflammation of the urinary bladder and nephritis of the renal pelvis. The dataset used in this work includes a number of characteristics that are important in diagnosing a patient with either condition. This research evaluates the supervised machine learning algorithms Ridor, OneR, and J48 in terms of performance and accuracy to determine the best classification algorithm for building an accurate prediction model. The decision tree (J48) shows strong accuracy and predictive capability and has been used to classify the patients' data into the proper acute inflammation diseases. The dataset was evaluated using 10-fold cross-validation, and the decision trees for acute bladder inflammation and nephritis were generated.
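To make the evaluation procedure concrete, here is a from-scratch sketch of OneR, the simplest of the three algorithms, scored with plain 10-fold cross-validation. The tiny symptom table is synthetic; the paper itself uses Weka and a real diagnostic dataset:

```python
# OneR picks the single attribute whose value -> majority-class rule makes
# the fewest training errors, then classifies by that one rule. The data
# below is invented: (fever, micturition_pain) -> diagnosis.

def one_r_train(rows, labels):
    """Return (attribute index, value->class rule) with fewest errors."""
    best = None
    for a in range(len(rows[0])):
        rule = {}
        for v in set(r[a] for r in rows):
            counts = {}
            for r, y in zip(rows, labels):
                if r[a] == v:
                    counts[y] = counts.get(y, 0) + 1
            rule[v] = max(counts, key=counts.get)  # majority class for value v
        errors = sum(rule[r[a]] != y for r, y in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best[0], best[1]

def one_r_predict(model, row, default="unknown"):
    attr, rule = model
    return rule.get(row[attr], default)

def cross_val_accuracy(rows, labels, k=10):
    """Plain k-fold split (no shuffling) and overall accuracy."""
    n, correct = len(rows), 0
    for i in range(k):
        test_idx = set(range(i * n // k, (i + 1) * n // k))
        tr_rows = [r for j, r in enumerate(rows) if j not in test_idx]
        tr_lab = [y for j, y in enumerate(labels) if j not in test_idx]
        model = one_r_train(tr_rows, tr_lab)
        correct += sum(one_r_predict(model, rows[j]) == labels[j]
                       for j in test_idx)
    return correct / n

# Synthetic records: diagnosis depends entirely on micturition pain.
rows = [("yes", "yes"), ("no", "yes"), ("yes", "no"), ("no", "no")] * 5
labels = ["inflammation" if r[1] == "yes" else "healthy" for r in rows]
print(cross_val_accuracy(rows, labels))
```

Because one attribute predicts the label perfectly here, OneR selects it in every fold and the cross-validated accuracy is 1.0; on real symptom data the folds would disagree and the score would drop.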
The massive use of Information and Communication Technology in education has made it possible to collect and store a huge amount of varied data about all educational aspects. The analysis of these raw data could lead to new, unexpected, but valuable knowledge, useful for teachers and students as well as for faculty and university managers. This paper presents a knowledge discovery in databases process applied to data collected mainly from a Learning Management System implemented at "Stefan cel Mare" University of Suceava.
This article describes the design, comparison, and evaluation of predictive models built from monitoring data on the basis of CRISP-DM and RapidMiner, for the purpose of improving IT processes in a company. The models were created from a sample of real monitoring data from the IT systems of one of the largest companies in Slovakia. The article gives a detailed description and evaluation of the created models, which represents only one phase of this research; the other phases (understanding and editing the data, visualization and statistical analysis, and the modeling process itself) are only mentioned. The predictive models generated by linear regression and by ARIMA models are described in detail. The models achieved during the research are of substantial benefit to companies, since they can predict the future values of individual transactions and thus allow the necessary measures and right decisions to be taken to improve the quality of services.
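As an illustration of the simpler of the two model families (the article's models are built in RapidMiner on real monitoring data, not reproduced here), an ordinary least-squares line fit is short enough to write out directly; the ARIMA models would need a dedicated time-series library. The transaction series below is invented:

```python
# Ordinary least squares for y = a + b*x in plain Python, using the
# closed-form solution: b = cov(x, y) / var(x), a = mean_y - b * mean_x.

def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    b = cov / var
    a = my - b * mx
    return a, b

def predict(a, b, x):
    return a + b * x

# Hours since midnight vs. transactions per minute (synthetic, roughly linear).
hours = [0, 1, 2, 3, 4, 5]
tx = [10, 12, 15, 15, 18, 20]
a, b = fit_line(hours, tx)
print(round(predict(a, b, 6), 2))  # extrapolated load for the next hour
```

This is the sense in which such models "predict the future value of individual transactions": fit on the observed history, then evaluate the fitted function one step ahead.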
If the full CRISP-DM life cycle is to be implemented, there needs to be a means by which business logic, understanding, and aims can be directly related to the DM and KDD modelling process, and then to deployment. Several graphical ways of representing data and models are considered: the E-R diagram, linked data and model ontologies, and graphical-model/Bayesian-net dependency diagrams. It is suggested that providing graphical tools for the domain expert to express their prior knowledge, understanding, and aims is the best way of linking these to the DM and KDD process and the subsequent deployment of discovered knowledge.
Forecasting fog is an important issue for air traffic safety, because adverse visibility conditions are one of the major causes of traffic delays and of the economic losses associated with them. In this context, the present work illustrates a data mining application for short-range fog forecasting (1 hour, 2 hours, and 3 hours) at Paris Charles de Gaulle airport. Three predictive models have been built by applying a BayesNet algorithm to a historical dataset of 17 years of fog observations and other relevant meteorological parameters collected in the SYNOP message. The performance evaluation shows that the best model is the one-hour forecast, with 97% correctly classified instances and a true positive rate of 88%. The other models show slightly worse performance, with about 96% and 95% correctly classified instances and true positive rates of 80% and 71%, respectively. The work was carried out according to the standard CRISP-DM process for knowledge discovery in meteorological databases.
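Naive Bayes, the simplest member of the Bayesian-network family the study applies, is compact enough to sketch from scratch. The feature table below is invented and is not the SYNOP data; it only shows the mechanics of class-conditional probability estimates:

```python
# From-scratch naive Bayes on an invented weather table. Each feature is
# assumed conditionally independent given the class (the "naive" assumption,
# the simplest possible Bayesian-network structure).

from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count P(class) and P(feature=value | class) statistics."""
    priors = Counter(labels)
    cond = defaultdict(Counter)   # (attr_index, class) -> value counts
    values = defaultdict(set)     # attr_index -> observed values
    for r, y in zip(rows, labels):
        for i, v in enumerate(r):
            cond[(i, y)][v] += 1
            values[i].add(v)
    return priors, cond, values, len(labels)

def predict_nb(model, row):
    """Pick the class maximizing P(class) * prod_i P(value_i | class)."""
    priors, cond, values, n = model
    best_cls, best_p = None, -1.0
    for cls, pc in priors.items():
        p = pc / n
        for i, v in enumerate(row):
            # Laplace smoothing so unseen values never zero out the product.
            p *= (cond[(i, cls)][v] + 1) / (pc + len(values[i]))
        if p > best_p:
            best_cls, best_p = cls, p
    return best_cls

# (humidity, wind): fog tends to occur with high humidity and calm wind.
rows = [("high", "calm"), ("high", "calm"), ("high", "windy"),
        ("low", "calm"), ("low", "windy"), ("low", "windy")]
labels = ["fog", "fog", "no_fog", "no_fog", "no_fog", "no_fog"]
model = train_nb(rows, labels)
print(predict_nb(model, ("high", "calm")))
```

A full BayesNet learner would additionally search for dependency arcs between the meteorological variables instead of assuming independence; the scoring step (comparing class posteriors) is the same.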
Data Mining Driven Fishbone (DMDF), a wholly new term, is an enhancement of the abstract concept of the multidimensional data flow of a fishbone diagram applied to data processing, intended to optimize the process and structure of data management and data mining. CDM-BSC (CRISP-DM applied with the Balanced Scorecard) is developed from the combination of a traditional data processing methodology and the BSC performance measurement system. An end-to-end DMDF diagram includes complex data flows, different processing components, and improvements for numerous aspects at multiple levels. Applying the Balanced Scorecard to CRISP-DM is a new methodology for improving the performance of information and data processing. CDM-BSC-based DMDF provides an integrated platform and a mixed methodology to support the whole data processing life cycle. Data preprocessing, data classification, association rule mining, and prediction are the foundation and linkage of that life cycle. DMDF supports the combination of different mining components from the strategy and tactical levels down to the abstract level, and then re-engineers the data mining process into an execution system with a reasonable architecture. CDM-BSC-based DMDF is a new direction for structuring large-scale information and data processing.