This paper introduces KEEL, a software tool to assess evolutionary algorithms for data mining problems of various kinds, including regression, classification and unsupervised learning. It includes evolutionary learning algorithms based on different approaches: Pittsburgh, Michigan and IRL, as well as the integration of evolutionary learning techniques with different pre-processing techniques, allowing it to perform a complete analysis of any learning model in comparison to existing software tools. Moreover, KEEL has been designed with a double goal: research and education.
About one half of adults with acute lymphoblastic leukemia are not cured of the disease and ultimately die. The objective of this study was to explore the factors influencing the outcome of adult patients with relapsed acute lymphoblastic leukemia.
We analyzed the characteristics, the outcome and the prognostic factors for survival after first relapse in a series of 263 adult patients with acute lymphoblastic leukemia (excluding those with mature B-cell acute lymphoblastic leukemia) prospectively enrolled in four consecutive risk-adapted PETHEMA trials.
The median overall survival after relapse was 4.5 months (95% CI, 4-5 months) with a 5-year overall survival of 10% (95% CI, 8%-12%); 45% of patients receiving intensive second-line treatment achieved a second complete remission and 22% (95% CI, 14%-30%) of them remained disease free at 5 years. Factors predicting a good outcome after rescue therapy were age less than 30 years (2-year overall survival of 21% versus 10% for those over 30 years old; P<0.022) and a first remission lasting more than 2 years (2-year overall survival of 36% versus 17% among those with a shorter first remission; P<0.001). Patients under 30 years old whose first complete remission lasted longer than 2 years had a 5-year overall survival of 38% (95% CI, 23%-53%) and a 5-year disease-free survival of 53% (95% CI, 34%-72%).
The prognosis of adult patients with acute lymphoblastic leukemia who relapse is poor. Those aged less than 30 years with a first complete remission lasting longer than 2 years have a reasonable chance of becoming long-term survivors, while patients over this age, or those who relapse early, cannot be successfully rescued using the therapies currently available.
Tuning fuzzy rule-based systems for linguistic fuzzy modeling is an interesting and widely developed task. It involves adjusting some of the components of the knowledge base without completely redefining it. This contribution introduces a genetic tuning process for jointly fitting the symbolic representations of the fuzzy rules and the meaning of the involved membership functions. To adjust the former component, we propose the use of linguistic hedges to perform slight modifications while keeping good interpretability. To alter the latter component, two different approaches are proposed: changing their basic parameters and using nonlinear scaling factors. As the experimental study shows, the good performance of our proposal mainly lies in performing this tuning at two different levels of significance. The paper also analyzes the interaction of the proposed tuning method with a fuzzy rule set reduction process. A good interpretability-accuracy tradeoff is obtained by combining both processes in a sequential scheme: first reducing the rule set and subsequently tuning the model.
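The linguistic-hedge level of tuning can be illustrated with the classical, textbook hedge definitions. This is a minimal sketch, not the paper's tuning algorithm: the triangular membership function and the power-based hedges ("very" as concentration, "more or less" as dilation) are the standard forms, and the numeric values are arbitrary.

```python
import math

def triangular(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def apply_hedge(mu, hedge):
    """Classical linguistic hedges expressed as powers of the membership degree."""
    if hedge == "very":          # concentration: sharpens the label
        return mu ** 2
    if hedge == "more_or_less":  # dilation: relaxes the label
        return math.sqrt(mu)
    return mu                    # identity: no hedge applied

mu = triangular(4.0, 0.0, 5.0, 10.0)          # membership 0.8 in the base label
very_mu = apply_hedge(mu, "very")             # 0.64, a slight concentration
relaxed_mu = apply_hedge(mu, "more_or_less")  # ~0.894, a slight dilation
```

Applying a hedge only reshapes the degree returned by the original membership function, which is why this kind of modification preserves the interpretability of the rule's labels.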
Emerging pattern mining is a data mining task that aims to discover discriminative patterns, which can describe emerging behavior with respect to a property of interest. In recent years, the description of datasets has become an interesting field due to the easy acquisition of knowledge by experts. In this review, we focus on the descriptive point of view of the task. We collect the existing approaches proposed in the literature and group them in a taxonomy in order to obtain a general vision of the task. A complete empirical study demonstrates the suitability of the approaches presented. This review also presents future trends and emerging prospects within pattern mining, and the benefits of the knowledge extracted from emerging patterns. WIREs Data Mining Knowl Discov 2018, 8:e1231. doi: 10.1002/widm.1231
This article is categorized under:
Fundamental Concepts of Data and Knowledge > Knowledge Representation
Fundamental Concepts of Data and Knowledge > Motivation and Emergence of Data Mining
The main objective of this paper is to review emerging pattern mining from the perspective of descriptive induction. It provides a taxonomy of the different approaches proposed, categorising them according to the objectives of descriptive knowledge.
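The discriminative power of an emerging pattern is conventionally measured by its growth rate, the ratio of the pattern's support in the target class to its support in the background class. A minimal sketch of that computation follows; the toy transactions are invented for illustration and are not from any dataset used in the review.

```python
def support(pattern, dataset):
    """Fraction of transactions in `dataset` that contain all items of `pattern`."""
    items = set(pattern)
    hits = sum(1 for transaction in dataset if items <= set(transaction))
    return hits / len(dataset)

def growth_rate(pattern, background, target):
    """GR(P) = supp_target(P) / supp_background(P); infinite for jumping emerging patterns."""
    s1 = support(pattern, background)
    s2 = support(pattern, target)
    if s1 == 0:
        return float("inf") if s2 > 0 else 0.0
    return s2 / s1

background = [["a"], ["a", "b"], ["c"], ["b"]]
target = [["a", "b"], ["a", "b", "c"], ["a", "b"], ["c"]]
gr = growth_rate(["a", "b"], background, target)  # support rises 0.25 -> 0.75, GR = 3.0
```

A pattern is usually reported as emerging when its growth rate exceeds a user-set threshold greater than 1.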
Supervised descriptive rule discovery represents a set of data mining techniques whose objective is to describe data with respect to a property of interest. This concept encompasses different techniques such as subgroup discovery, emerging patterns and contrast sets. Supervised learning is used to obtain rules for descriptive purposes, but with different quality measures. Although their origins lie in different data mining tasks, our hypothesis is that subgroup discovery, emerging patterns and contrast sets are compatible thanks to the common use of a weighted relative accuracy quality measure. A complete analysis shows this relationship between the different tasks. The analysis is supported by an empirical study with the most representative algorithms for each technique.
The paper shows how the use of the weighted relative accuracy allows the experts to distinguish between interesting subgroups, emerging and/or contrasting rules thanks to the relation between the quality measures employed in the search process for different models. In addition, this relationship enables us to analyse the main differences and/or similarities between the different techniques within supervised descriptive rule discovery. This scenario opens up new challenges for the supervised descriptive rule learning models in analysing and developing descriptive models with a new perspective.
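The weighted relative accuracy measure mentioned above has the standard form WRAcc(Cond → Class) = p(Cond) · (p(Class | Cond) − p(Class)): coverage times the gain in precision over the class prior. A minimal sketch of its computation from the usual contingency counts (the example counts are invented):

```python
def wracc(n, n_cond, n_class, n_cond_class):
    """
    Weighted relative accuracy of a rule Cond -> Class.

    n            -- total number of examples
    n_cond       -- examples covered by the rule antecedent
    n_class      -- examples belonging to the target class
    n_cond_class -- covered examples that belong to the target class
    """
    if n_cond == 0:
        return 0.0  # a rule covering nothing has no descriptive value
    coverage = n_cond / n
    precision_gain = n_cond_class / n_cond - n_class / n
    return coverage * precision_gain

# A rule covering 20 of 100 examples, 15 of them in a class of 40 overall:
quality = wracc(100, 20, 40, 15)  # 0.2 * (0.75 - 0.40) = 0.07
```

Positive values mean the rule's covered set is denser in the target class than the dataset as a whole, which is the shared notion of interestingness across the three techniques.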
Nowadays it is increasingly important in many applications to understand how different factors influence a variable of interest in a predictive modeling process. This task becomes particularly important in the context of Explainable Artificial Intelligence. Knowing the relative impact of each variable on the output allows us to acquire more information about the problem and about the output provided by a model.
This paper proposes a new methodology, XAIRE, that determines the relative importance of input variables in a prediction environment, considering multiple prediction models in order to increase generality and avoid bias inherent in a particular learning algorithm. Concretely, we present an ensemble-based methodology that promotes the aggregation of results from several prediction methods to obtain a relative importance ranking. Also, statistical tests are considered in the methodology in order to reveal significant differences between the relative importance of the predictor variables. As a case study, XAIRE is applied to the arrival of patients in a Hospital Emergency Department, which has resulted in one of the largest sets of different predictor variables in the literature. Results show the extracted knowledge related to the relative importance of the predictors involved in the case study.
• XAIRE: a new methodology to determine the relative importance of the predictor variables.
• Relative importance of predictors of the arrivals at an Emergency Department.
• Design and use of a large set of exogenous or predictor variables.
• XAI knowledge obtained from the ED arrivals prediction process.
• Most important variables in the ED: calendar and arrival (certain lags) variables.
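Not XAIRE itself, but the rank-aggregation idea behind an ensemble-based importance ranking can be sketched as follows. The models, feature names (`lag_1`, `weekday`, `temp`) and scores are hypothetical, and average-rank aggregation is just one possible combination rule; the actual methodology additionally applies statistical tests to the per-model rankings.

```python
from statistics import mean

def aggregate_rankings(importances):
    """
    Combine per-model feature-importance dicts into one ranking by averaging
    each feature's rank (1 = most important) across all models.
    """
    ranks = {}
    for scores in importances:
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, feature in enumerate(ordered, start=1):
            ranks.setdefault(feature, []).append(rank)
    return sorted(ranks, key=lambda feature: mean(ranks[feature]))

# Importance scores from three hypothetical prediction models:
models = [
    {"lag_1": 0.5, "weekday": 0.3, "temp": 0.2},
    {"lag_1": 0.4, "weekday": 0.4, "temp": 0.2},
    {"weekday": 0.6, "lag_1": 0.3, "temp": 0.1},
]
ranking = aggregate_rankings(models)  # ['lag_1', 'weekday', 'temp']
```

Averaging ranks rather than raw scores avoids mixing importance scales that differ between learning algorithms, which is the motivation for ensembling here.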
► We present a complete analysis of web usage mining for the website OrOliveSur.com.
► Clustering, association rule and subgroup discovery techniques have been applied.
► Results provide the webmaster team with interesting conclusions for improving the design.
Web usage mining is the process of extracting useful information from the user history databases associated with an e-commerce website. The extraction is usually performed by data mining techniques applied to server log data or to data obtained from specific tools such as Google Analytics. This paper presents the methodology used for www.OrOliveSur.com, an e-commerce website dedicated to the sale of extra virgin olive oil. We describe the set of phases carried out, including data collection, data preprocessing, and the extraction and analysis of knowledge. The knowledge is extracted using unsupervised and supervised data mining algorithms through descriptive tasks such as clustering, association and subgroup discovery, applying both classical and recent approaches. The results obtained are discussed with particular attention to the interests of the website's design team, providing some guidelines for improving its usability and user satisfaction.
This paper proposes a novel algorithm for the subgroup discovery task based on genetic programming and fuzzy logic, called Fuzzy Genetic Programming-based Subgroup Discovery (FuGePSD). Genetic programming allows the learning of compact expressions, with the main objective of obtaining rules that describe simple, interesting and interpretable subgroups. The algorithm incorporates specific operators in the search process to promote diversity among the individuals. The evolutionary scheme of FuGePSD is codified through the genetic cooperative-competitive approach, promoting competition and cooperation between the individuals of the population in order to find optimal solutions for the SD task.
FuGePSD displays its potential with high-quality results in a wide experimental study performed against other evolutionary algorithms for subgroup discovery. Moreover, the proposal is applied to a case study related to acute sore throat problems.
Introduction
Some local protocols suggest using intermediate or therapeutic doses of anticoagulants for thromboprophylaxis in hospitalized patients with coronavirus disease 2019 (COVID‐19). However, the incidence of bleeding, the predictors of major bleeding, and the association between bleeding and mortality remain largely unknown.
Methods
We performed a cohort study of patients hospitalized for COVID‐19 that received intermediate or therapeutic doses of anticoagulants from March 25 to July 22, 2020, to identify those at increased risk for major bleeding. We used bivariate and multivariable logistic regression to explore the risk factors associated with major bleeding.
Results
During the study period, 1965 patients were enrolled. Of them, 1347 (69%) received intermediate‐ and 618 (31%) therapeutic‐dose anticoagulation, with a median duration of 12 days in both groups. During the hospital stay, 112 patients (5.7%) developed major bleeding and 132 (6.7%) had non‐major bleeding. The 30‐day all‐cause mortality rate was 45% (95% confidence interval [CI]: 36%‐54%) for major bleeding and 32% (95% CI: 24%‐40%) for non‐major bleeding. Multivariable analysis showed an increased risk for in‐hospital major bleeding associated with D‐dimer levels >10 times the upper normal range (hazard ratio [HR], 2.23; 95% CI, 1.38–3.59), ferritin levels >500 ng/ml (HR, 2.01; 95% CI, 1.02–3.95), critical illness (HR, 1.91; 95% CI, 1.14–3.18), and therapeutic‐intensity anticoagulation (HR, 1.43; 95% CI, 1.01–1.97).
Conclusions
Among patients hospitalized with COVID‐19 receiving intermediate‐ or therapeutic‐intensity anticoagulation, a major bleeding event occurred in 5.7%. Use of therapeutic‐intensity anticoagulation, critical illness, and elevated D‐dimer or ferritin levels at admission were associated with increased risk for major bleeding.
► We propose a new set of variables to characterize a Concentrating Photovoltaic module.
► We propose an evolutionary method, CO2RBFN, to design an RBFN for this problem.
► The model designed by CO2RBFN outperforms the results obtained by other methods.
► It is suitable for this problem due to its accurate behavior and according to CIEMAT criteria.
► It can be used to work out the maximum power and to analyze the performance of the CPV module.
Concentrating Photovoltaic (CPV) technology attempts to optimize the efficiency of solar energy production systems. Like conventional Photovoltaic (PV) technology, it suffers from variability in its production and needs models for determining exact module performance. Analyzing CPV system performance with traditional techniques raises several problems due to the absence of standardization. In this sense, models that allow module performance to be predicted from initial atmospheric conditions are especially important for the emerging CPV technology. In this paper, a CPV module is studied by means of atmospheric conditions obtained using an automatic test and measuring system developed by the authors. The characterization of the CPV module is carried out considering incident normal irradiance, ambient temperature, spectral irradiance distribution and wind speed. CO2RBFN, a cooperative-competitive algorithm for the design of radial basis function networks, is adapted and applied to these data, obtaining a model with a good level of accuracy on test data and improving the results obtained by the other methods considered in the experimental comparison. These results are promising, and the obtained model could be used to work out the maximum power at the CPV reporting conditions and to analyze the performance of the module under any conditions and at any moment.
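The prediction side of a radial basis function network, the kind of model CO2RBFN designs, can be sketched with the standard Gaussian-unit formulation y = bias + Σᵢ wᵢ · exp(−‖x − cᵢ‖² / (2σᵢ²)). The centers, widths and weights below are arbitrary illustrative values, not a trained CPV model.

```python
import math

def rbfn_predict(x, centers, widths, weights, bias=0.0):
    """
    Output of a radial basis function network with Gaussian units:
    y = bias + sum_i w_i * exp(-||x - c_i||^2 / (2 * sigma_i^2))
    """
    y = bias
    for center, sigma, weight in zip(centers, widths, weights):
        dist_sq = sum((xj - cj) ** 2 for xj, cj in zip(x, center))
        y += weight * math.exp(-dist_sq / (2.0 * sigma ** 2))
    return y

# Two Gaussian units over 2-d inputs (e.g. normalized irradiance, ambient
# temperature); all numeric values here are invented for illustration.
centers = [(0.8, 25.0), (0.5, 15.0)]
widths = [1.0, 2.0]
weights = [3.0, -1.0]
power = rbfn_predict((0.8, 25.0), centers, widths, weights)  # ~3.0 at the first center
```

An evolutionary designer such as CO2RBFN searches over the number of units and their centers, widths and weights; the forward pass above is what the resulting model evaluates at prediction time.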