•Two new feature selection methods are proposed based on joint mutual information.
•The methods use joint mutual information with the 'maximum of the minimum' criterion.
•The methods address the problem of selection of redundant and irrelevant features.
•The methods are evaluated using eleven public data sets and five competing methods.
•The proposed JMIM method outperforms five competing methods in terms of accuracy.
Feature selection is used in many application areas relevant to expert and intelligent systems, such as data mining and machine learning, image processing, anomaly detection, bioinformatics and natural language processing. Feature selection based on information theory is a popular approach due to its computational efficiency, scalability in terms of the dataset dimensionality, and independence from the classifier. Common drawbacks of this approach are the lack of information about the interaction between the features and the classifier, and the selection of redundant and irrelevant features. The latter is due to the limitations of the employed goal functions, which lead to overestimation of the feature significance.
To address this problem, this article introduces two new nonlinear feature selection methods, namely Joint Mutual Information Maximisation (JMIM) and Normalised Joint Mutual Information Maximisation (NJMIM); both methods use mutual information and the 'maximum of the minimum' criterion, which alleviates the problem of overestimation of the feature significance, as demonstrated both theoretically and experimentally. The proposed methods are compared with five competing methods using eleven publicly available datasets. The results demonstrate that the JMIM method outperforms the other methods on most tested public datasets, reducing the relative average classification error by almost 6% in comparison to the next best performing method. The statistical significance of the results is confirmed by the ANOVA test. Moreover, this method produces the best trade-off between accuracy and stability.
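The 'maximum of the minimum' rule can be sketched as a greedy loop: after seeding with the individually most informative feature, each step adds the candidate whose worst-case joint mutual information with any already-selected feature is largest. The sketch below assumes discretised (non-negative integer-coded) features and uses a simple plug-in MI estimator; both are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def mutual_info(x, y):
    """Plug-in estimate of discrete mutual information I(x; y) in nats."""
    xy = np.stack([x, y])
    _, joint = np.unique(xy, axis=1, return_counts=True)
    p_xy = joint / joint.sum()
    _, cx = np.unique(x, return_counts=True)
    _, cy = np.unique(y, return_counts=True)
    px, py = cx / cx.sum(), cy / cy.sum()
    # I(x; y) = H(x) + H(y) - H(x, y)
    return (-np.sum(px * np.log(px)) - np.sum(py * np.log(py))
            + np.sum(p_xy * np.log(p_xy)))

def jmim(X, y, k):
    """Greedy JMIM: maximise the minimum joint MI I(f, s; y) over selected s."""
    def joint_mi(f, s):
        # encode the pair (f, s) as a single discrete variable
        pair = X[:, f] * (X[:, s].max() + 1) + X[:, s]
        return mutual_info(pair, y)
    relevance = [mutual_info(X[:, j], y) for j in range(X.shape[1])]
    selected = [int(np.argmax(relevance))]       # seed: most relevant feature
    while len(selected) < k:
        rest = [j for j in range(X.shape[1]) if j not in selected]
        scores = [min(joint_mi(j, s) for s in selected) for j in rest]
        selected.append(rest[int(np.argmax(scores))])
    return selected
```

The pair encoding assumes per-feature non-negative integer codes; real-valued features would first be discretised.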
Streaming feature selection (SFS) is the task of selecting the most informative features when dealing with high-dimensional or incrementally growing problems. Several SFS algorithms have been proposed in the literature. However, they do not consider all feature subsets at the redundancy analysis step due to computational concerns. Moreover, they do not reconsider previously removed features, which leads to the loss of useful information. In this paper, the redundancy analysis step is defined as a binary optimization problem. Then, a binary bat algorithm (BBA) is adopted to find the minimal informative subsets. In this way, a large number of feature subsets can be considered effectively at the redundancy analysis step. In addition, an effective priority list is used to maintain previously removed redundant features. Such a list allows the re-examination of informative features. As a result, it is possible to consider the mutual information between features that are not streamed within a small time interval. Experimental studies on fifteen different types of datasets show that our approach is superior to state-of-the-art online and offline streaming feature selection methods in terms of classification accuracy.
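The priority-list idea, demoting rather than discarding features that currently look redundant, can be sketched independently of the bat-algorithm search. The relevance and redundancy measures below (absolute Pearson correlation, with hypothetical thresholds rel_t and red_t) are simple stand-ins for the paper's mutual-information machinery.

```python
import numpy as np

def corr(a, b):
    """Absolute Pearson correlation as an assumed relevance/redundancy proxy."""
    return abs(np.corrcoef(a, b)[0, 1])

class StreamingSelector:
    """Sketch of streaming FS with a priority list for removed features.

    Features arrive one at a time. A new feature is kept if relevant to the
    label; a selected feature it makes redundant is moved to the priority
    list rather than discarded, so it can later be re-examined.
    """
    def __init__(self, y, rel_t=0.3, red_t=0.9):
        self.y, self.rel_t, self.red_t = y, rel_t, red_t
        self.selected = {}   # name -> column, current subset
        self.priority = {}   # previously removed features kept for re-examination

    def offer(self, name, col):
        if corr(col, self.y) < self.rel_t:
            return  # irrelevant: reject outright
        for s_name, s_col in list(self.selected.items()):
            if corr(col, s_col) > self.red_t:
                # keep the more relevant of the redundant pair, demote the other
                if corr(col, self.y) > corr(s_col, self.y):
                    self.priority[s_name] = self.selected.pop(s_name)
                else:
                    self.priority[name] = col
                    return
        self.selected[name] = col
```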
Cardiovascular diseases (CVD) are among the most common serious illnesses affecting human health. CVDs may be prevented or mitigated by early diagnosis, which may reduce mortality rates. Identifying risk factors using machine learning models is a promising approach. We propose a model that incorporates different methods to achieve effective prediction of heart disease. For the proposed model to be successful, we have used efficient Data Collection, Data Pre-processing and Data Transformation methods to create accurate information for the training model. We have used a combined dataset (Cleveland, Long Beach VA, Switzerland, Hungarian and Statlog). Suitable features are selected using the Relief and Least Absolute Shrinkage and Selection Operator (LASSO) techniques. New hybrid classifiers, namely the Decision Tree Bagging Method (DTBM), Random Forest Bagging Method (RFBM), K-Nearest Neighbors Bagging Method (KNNBM), AdaBoost Boosting Method (ABBM), and Gradient Boosting Boosting Method (GBBM), are developed by integrating the traditional classifiers with bagging and boosting methods, and are used in the training process. We have also implemented machine learning routines to calculate the Accuracy (ACC), Sensitivity (SEN), Error Rate, Precision (PRE) and F1 Score (F1) of our model, along with the Negative Predictive Value (NPR), False Positive Rate (FPR), and False Negative Rate (FNR). The results are shown separately to provide comparisons. Based on the result analysis, we conclude that our proposed model produces the highest accuracy (99.05%) when using RFBM with the Relief feature selection method.
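The hybrid classifiers are, in effect, standard base learners wrapped in scikit-learn's bagging and boosting ensembles. A sketch under assumed hyperparameters, with a synthetic stand-in for the combined heart-disease dataset:

```python
# Sketch of the "hybrid" classifiers: base learners wrapped in bagging /
# boosting. Dataset and hyperparameters are illustrative assumptions,
# not the paper's exact configuration.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# stand-in for the combined (Cleveland, Long Beach VA, ...) dataset
X, y = make_classification(n_samples=400, n_features=13, random_state=0)

models = {
    "DTBM": BaggingClassifier(DecisionTreeClassifier(), n_estimators=20,
                              random_state=0),
    "RFBM": BaggingClassifier(RandomForestClassifier(n_estimators=20),
                              n_estimators=5, random_state=0),
    "KNNBM": BaggingClassifier(KNeighborsClassifier(), n_estimators=20,
                               random_state=0),
    "ABBM": AdaBoostClassifier(n_estimators=50, random_state=0),
    "GBBM": GradientBoostingClassifier(random_state=0),
}

# 5-fold cross-validated accuracy per hybrid model
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in models.items()}
```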
A review of unsupervised feature selection methods
Saúl Solorio-Fernández; J. Ariel Carrasco-Ochoa; José Fco. Martínez-Trinidad
Artificial Intelligence Review, 02/2020, Volume 53, Issue 2
Journal Article, Peer-reviewed
In recent years, unsupervised feature selection methods have raised considerable interest in many research areas; this is mainly due to their ability to identify and select relevant features without needing class label information. In this paper, we provide a comprehensive and structured review of the most relevant and recent unsupervised feature selection methods reported in the literature. We present a taxonomy of these methods and describe the main characteristics and the fundamental ideas they are based on. Additionally, we summarize the advantages and disadvantages of the general lines into which we have categorized the methods analyzed in this review. Moreover, an experimental comparison among the most representative methods of each approach is also presented. Finally, we discuss some important open challenges in this research area.
Because of the overgrowth of data, especially in text format, the value and importance of multi-label text classification have increased. Preprocessing, and particularly intelligent feature selection (FS), is the most important step in classification. Each FS method finds the best features based on its own approach, but we use a multi-strategy approach to find more useful features. Evaluating and comparing feature importance and relevance makes using multiple strategies and methods more suitable than conventional approaches, because each feature is measured from several perspectives. Ensemble FS merges the final results of various methods to take advantage of their different strengths and classify better. In this article, we propose, for the first time, an ensemble FS method for multi-label text data (MLTD) based on order statistics, called EMFS. We utilize four multi-label FS (MLFS) algorithms with varied individual performances to achieve a good result. Order statistics are used to aggregate the ranks produced by the different algorithms, which is robust against noise and against redundant and inessential features. Finally, the performance of EMFS is evaluated on six MLTDs according to six performance criteria (ranking-based and classification-based). The proposed method was more accurate than the others on all tested MLTDs, improving on them by 1.5% on average across the six evaluation criteria and all tested datasets.
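Rank aggregation with order statistics can be sketched as: normalise each method's ranks to [0, 1], then score every feature by an order statistic of its rank vector. The sketch below uses only the first order statistic (each feature's best normalised rank); the paper's full order-statistics model, which weighs all order statistics jointly, is richer than this.

```python
import numpy as np

def aggregate_ranks(rank_lists):
    """Aggregate several per-method feature rankings.

    rank_lists: list of 1-D arrays; rank_lists[m][j] is the rank that
    method m assigns to feature j (0 = best). Each feature is scored by
    the first order statistic of its normalised ranks (i.e. its best
    rank across methods) as a simple stand-in for the full model.
    """
    R = np.asarray(rank_lists, dtype=float)
    R /= R.shape[1] - 1             # normalise ranks to [0, 1]
    scores = np.sort(R, axis=0)[0]  # first order statistic per feature
    return np.argsort(scores)       # aggregated ranking, best first
```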
With the rapid increase in data size, there is a growing demand for selecting features by exploiting both labeled and unlabeled data. In this paper, we propose a novel semi-supervised embedded feature selection method. The new method extends the least square regression model by rescaling the regression coefficients with a set of scale factors, which are used for evaluating the importance of features. An iterative algorithm is proposed to optimize the new model. It is proved that solving the new model is equivalent to solving a sparse model with a flexible and adaptable ℓ2,p-norm regularization. Moreover, the optimal solution of the scale factors provides a theoretical explanation for why we can use {||w_1||_2, ..., ||w_d||_2} to evaluate the importance of features. Experimental results on eight benchmark data sets show the superior performance of the proposed method.
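The ||w_i||_2 importance score can be illustrated with a plain ridge-regularised least squares regression onto one-hot class indicators. The fixed regulariser lam is an assumption of this sketch; the paper instead derives an adaptive ℓ2,p penalty through the scale factors.

```python
import numpy as np

def feature_importance_lsr(X, y, lam=1e-2):
    """Row-norm feature importance from ridge least squares regression
    onto one-hot class indicators, illustrating the ||w_i||_2 scores the
    paper motivates (the fixed ridge penalty is a simplifying assumption)."""
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)  # one-hot labels, n x c
    d = X.shape[1]
    # W solves (X^T X + lam I) W = X^T Y, giving a d x c coefficient matrix
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
    return np.linalg.norm(W, axis=1)  # ||w_i||_2 per feature
```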
Machine Learning and Data Mining algorithms are used extensively to enhance the performance of Intrusion Detection Systems. The number of training instances and the dimensionality of data are crucial factors affecting the performance of the model built during the training of any supervised learning algorithm. A sufficient proportion of instances having relevant features from all classes of attacks and normal traffic is considered most desirable when building the classification model that classifies network traffic into attack and normal. This paper proposes a methodology to improve the accuracy of the model by giving importance to the relevant features that contribute to model building. Feature selection using correlation-based and information gain-based techniques during training and testing contributes much to the detection of stealthier attacks and minority attacks. Then, as a second filtering phase, the features of the less-detected attacks are identified and used to improve performance. The relevant features of stealthy attacks are identified based on the correlation between the corresponding features of attack and normal data, as attacks are made stealthy mostly by making them resemble normal traffic. Finally, the attacks that are rarely found in the training data are oversampled to improve their detection. The CICIDS 2017 data set is employed as it comprises stealthier attacks generated using modern tools. The NSL-KDD data set is also used for evaluation to compare the proposed work with the existing literature, as it is used in most available studies. The results show superior performance, with an accuracy of 99.8%, a false positive rate of 0.2%, and a detection rate of 99.8%.
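The first filtering phase, ranking by information gain with the label and then discarding features highly correlated with one already kept, can be sketched as follows. The thresholds and the synthetic dataset are assumptions; the paper works on CICIDS 2017 and NSL-KDD.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# synthetic stand-in for network-traffic features
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

ig = mutual_info_classif(X, y, random_state=0)  # information gain per feature
ranked = np.argsort(ig)[::-1]                   # best-first by information gain

selected = []
for j in ranked:
    # drop a feature highly correlated with an already-selected one
    if all(abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) < 0.95 for s in selected):
        selected.append(j)
selected = selected[:10]  # keep the top 10 surviving features (assumed budget)
```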
Feature selection refers to the problem of finding the optimal subset of features by removing irrelevant and redundant features to improve classification accuracy. The determination of the most effective distance measures for evaluating the relevance and redundancy of features has not been investigated precisely to date. Moreover, the relation between relevancy and redundancy is still uncertain. This paper presents a novel distance-based relevancy-redundancy measurement that applies the idea of the mRMR criterion to an unsupervised method. In addition, a supervised method is proposed, in which the features are ranked by the distance between each pair of samples in different classes of the feature vector. Then an ensemble of the proposed supervised and unsupervised methods is applied to choose the most relevant feature subset. This study investigates and compares the effects of 24 distance measures, selected from five major families of distance functions, on the performance of the proposed feature selection methods. The highest-ranked features are selected using an empirically determined threshold. To evaluate the selected features, three classifiers, i.e., Decision Tree, Support Vector Machine and Naive Bayes, were applied to biomedical datasets representing binary problems from the UCI data repository. The experimental results demonstrate the superiority of the proposed methods over both state-of-the-art and classical feature selection methods in terms of improving stability, classification accuracy, Recall (Sensitivity), Precision, F-measure, and Specificity.
•Proposing formulas for distance-based supervised and unsupervised feature selection
•Applying an ensemble of the proposed methods to select the best subset of features
•Evaluating the effectiveness of 24 distance measures on the proposed methods
•Maximizing relevancy and minimizing redundancy of features in unsupervised process
•Investigating high-dimensional datasets proves the efficiency of the proposed methods
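A minimal sketch of distance-based mRMR-style selection: the relevance of a feature is the mean distance, on that (z-scored) feature, between samples of different classes, and redundancy is an inverse-distance similarity between feature columns. Both measures, and the 1-D Euclidean choice, are illustrative assumptions; the paper compares 24 distance functions.

```python
import numpy as np

def relevance(f, y):
    """Mean absolute distance, on z-scored feature f, between samples of
    different classes (1-D Euclidean; an assumed choice of measure)."""
    fz = (f - f.mean()) / (f.std() + 1e-12)
    a, b = fz[y == 0], fz[y == 1]
    return np.abs(a[:, None] - b[None, :]).mean()

def redundancy(f, g):
    """Inverse normalised Euclidean distance between z-scored features:
    identical columns score 1, unrelated columns score near 0."""
    fz = (f - f.mean()) / (f.std() + 1e-12)
    gz = (g - g.mean()) / (g.std() + 1e-12)
    d = np.linalg.norm(fz - gz) / np.sqrt(len(f))
    return 1.0 / (1.0 + d)

def mrmr_distance(X, y, k):
    """Greedy mRMR-style selection with distance-based terms."""
    rel = [relevance(X[:, j], y) for j in range(X.shape[1])]
    selected = [int(np.argmax(rel))]
    while len(selected) < k:
        rest = [j for j in range(X.shape[1]) if j not in selected]
        scores = [rel[j] - np.mean([redundancy(X[:, j], X[:, s])
                                    for s in selected])
                  for j in rest]
        selected.append(rest[int(np.argmax(scores))])
    return selected
```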
Online streaming feature selection, a new approach that deals with feature streams in an online manner, has attracted much attention in recent years and plays a critical role in dealing with high-dimensional problems. However, most existing online streaming feature selection methods require domain information before learning and need their parameters specified in advance. It is hence a challenge to select unified and optimal parameters before learning for all different types of data sets. In this paper, we define a new Neighborhood Rough Set relation with adapted neighbors, named the Gap relation, and propose a new online streaming feature selection method based on this relation, named OFS-A3M. OFS-A3M does not require any domain knowledge and does not need any parameters specified in advance. With the "maximal-dependency, maximal-relevance and maximal-significance" evaluation criteria, OFS-A3M can select features with high correlation, high dependency and low redundancy. Experimental studies on fifteen different types of data sets show that OFS-A3M is superior both to traditional feature selection methods using the same numbers of features and to state-of-the-art online streaming feature selection algorithms operating in an online manner.
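The "maximal-dependency" criterion builds on the neighborhood rough set dependency degree, sketched below with a fixed global radius delta. OFS-A3M's Gap relation instead adapts the neighborhood per sample, so the fixed delta here is a simplifying assumption.

```python
import numpy as np

def neighborhood_dependency(X, y, delta=0.2):
    """Dependency degree of the label on feature set X under a fixed-radius
    neighborhood relation (a simplification of OFS-A3M's adaptive Gap
    relation).

    A sample lies in the positive region if every neighbor within delta
    shares its class; the dependency degree is the positive-region fraction.
    """
    n = len(X)
    pos = 0
    for i in range(n):
        dist = np.linalg.norm(X - X[i], axis=1)   # distances to sample i
        neigh = y[dist <= delta]                  # labels in its neighborhood
        pos += int(np.all(neigh == y[i]))         # neighborhood is class-pure
    return pos / n
```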
Nowadays, with the emergence of multi-label datasets, multi-label learning has attracted interest and is increasingly applied to different fields. In such learning processes, unlike single-label learning, instances have more than one class label simultaneously. Multi-label learning also suffers from the curse of dimensionality, and thus feature selection becomes a difficult task. In this paper, we propose, for the first time, a novel multi-label relevance-redundancy feature selection method based on Ant Colony Optimization (ACO), called MLACO. By introducing two heuristic functions, one unsupervised and one supervised, MLACO searches the feature space over several iterations to find the most promising features with the lowest redundancy (unsupervised) and the highest relevance to the class labels (supervised). To speed up the convergence of the algorithm, the normalized cosine similarity between features and class labels is used as the initial pheromone of each ant. The proposed method does not rely on any learning algorithm and can be classified as a filter-based method. We compare the performance of MLACO against five well-known and state-of-the-art feature selection methods using the ML-KNN classifier. Experimental results on several frequently used datasets show the superiority of MLACO in terms of different multi-label evaluation criteria and runtime.
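The pheromone initialisation can be sketched directly: each feature's initial pheromone is its cosine similarity to the label columns. Taking the maximum over labels and normalising the pheromone vector to sum to one are assumptions about details the abstract leaves open.

```python
import numpy as np

def initial_pheromone(X, Y):
    """Initial pheromone per feature from normalised cosine similarity
    between the feature column and the multi-label columns (the max-over-
    labels aggregation is an assumed detail)."""
    def cos(a, b):
        return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    tau = np.array([max(cos(X[:, j], Y[:, l]) for l in range(Y.shape[1]))
                    for j in range(X.shape[1])])
    return tau / tau.sum()  # normalise so the pheromone sums to 1
```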