Peer reviewed
  • MGFS: A multi-label graph-b...
    Hashemi, Amin; Dowlatshahi, Mohammad Bagher; Nezamabadi-pour, Hossein

    Expert systems with applications, 03/2020, Volume: 142
    Journal Article

    • We propose a fast algorithm for feature selection on multi-label data.
    • Features that discriminate classes are linked to form an undirected weighted graph.
    • Feature relationships are defined based on correlation distance with labels.
    • The PageRank algorithm ranks the features according to their importance in the weighted graph.
    • The proposed multi-label graph-based method outperforms competitive methods.

    In multi-label data, each instance corresponds to a set of labels rather than a single label: in the column for a given label, instances that belong to the label are assigned 1 and instances that do not are assigned 0. Such data is usually high-dimensional, so many machine learning methods seek the best subset of features to reduce the dimensionality of the data and then build an acceptable classification model. In this paper, we design a fast algorithm for feature selection on multi-label data using the PageRank algorithm, an effective method for calculating the importance of web pages on the Internet. The algorithm, called multi-label graph-based feature selection (MGFS), first constructs an M × L matrix, called the Correlation Distance Matrix (CDM), where M is the number of features and L is the number of class labels. MGFS then creates a complete weighted graph, called the Feature-Label Graph (FLG), in which each feature is a vertex and the weight of the edge between two vertices (features) is their Euclidean distance in the CDM. Finally, the importance of each vertex (feature) is estimated via the PageRank algorithm. The number of features to select can be set by the user. To evaluate the proposed algorithm, we compare it with several multi-label feature selection methods on several multi-label datasets of different dimensions. The results show the superiority of the proposed method in classification criteria and run-time.
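
    The three steps named in the abstract (CDM, FLG, PageRank) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes that "correlation distance" means 1 minus the Pearson correlation, that PageRank uses a damping factor of 0.85, and that features with higher PageRank scores are selected first. The function names are hypothetical, and the pairwise distance step builds an M × M × L array, so it suits only a modest number of features.

```python
import numpy as np

def correlation_distance_matrix(X, Y):
    """CDM of shape (M, L); entry (i, j) is 1 - Pearson r (assumed definition)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    x_norm = np.linalg.norm(Xc, axis=0)
    y_norm = np.linalg.norm(Yc, axis=0)
    # Constant columns have zero norm; treat their correlation as 0.
    x_norm[x_norm == 0] = 1.0
    y_norm[y_norm == 0] = 1.0
    corr = (Xc.T @ Yc) / np.outer(x_norm, y_norm)
    return 1.0 - corr

def feature_label_graph(cdm):
    """Complete weighted graph: Euclidean distance between every pair of CDM rows."""
    diff = cdm[:, None, :] - cdm[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def weighted_pagerank(W, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a symmetric non-negative weight matrix."""
    n = W.shape[0]
    col_sums = W.sum(axis=0)
    safe_sums = np.where(col_sums > 0, col_sums, 1.0)
    # Column-stochastic transition matrix; isolated vertices jump uniformly.
    P = np.where(col_sums > 0, W / safe_sums, 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = damping * (P @ rank) + (1.0 - damping) / n
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank

def mgfs_sketch(X, Y, k):
    """Return indices of the k top-ranked features and all PageRank scores."""
    cdm = correlation_distance_matrix(X, Y)
    W = feature_label_graph(cdm)
    scores = weighted_pagerank(W)
    # Assumption: a higher PageRank score means a more important feature.
    return np.argsort(-scores)[:k], scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 8))              # 100 instances, M = 8 features
    Y = (rng.random((100, 3)) < 0.5) * 1.0     # L = 3 binary labels
    selected, scores = mgfs_sketch(X, Y, k=4)
    print("selected feature indices:", selected)
```

    Here `k` plays the role of the user-chosen number of features mentioned in the abstract; the synthetic `X` and `Y` only demonstrate the expected array shapes.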