Peer reviewed · Open access
  • Toward Optimal Feature Sele...
    Tang, Bo; Kay, Steven; He, Haibo

    IEEE Transactions on Knowledge and Data Engineering, 2016-09-01, Volume 28, Issue 9
    Journal Article

    Automated feature selection is important for text categorization to reduce feature size and to speed up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on information theory, which aims to rank features by their discriminative capacity for classification. We first revisit two information measures, Kullback-Leibler divergence and Jeffreys divergence, for binary hypothesis testing, and analyze their asymptotic properties relating to type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH-divergence, we develop two efficient feature selection methods, termed maximum discrimination (MD) and MD-χ² methods, for text categorization. The promising results of extensive experiments demonstrate the effectiveness of the proposed approaches.
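
    The sketch below is an illustrative reading of the idea in the abstract, not the authors' exact MD or MD-χ² algorithm: it scores each binary term-presence feature by the classical Jeffreys divergence (symmetrized Kullback-Leibler divergence) between its class-conditional distributions and ranks features by that score. The function names (kl_divergence, jeffreys_divergence, rank_features), the two-class restriction, and the toy data are assumptions for illustration; the paper's JMH-divergence generalizes this two-hypothesis measure to the multi-class case.

    # Hypothetical sketch: rank binary features by the Jeffreys divergence
    # between their empirical distributions under two classes.
    import numpy as np

    def kl_divergence(p, q, eps=1e-12):
        """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
        p = np.asarray(p, dtype=float) + eps
        q = np.asarray(q, dtype=float) + eps
        p, q = p / p.sum(), q / q.sum()
        return float(np.sum(p * np.log(p / q)))

    def jeffreys_divergence(p, q):
        """Jeffreys divergence: the symmetrized KL divergence."""
        return kl_divergence(p, q) + kl_divergence(q, p)

    def rank_features(X, y):
        """Score each binary feature by the Jeffreys divergence between its
        histograms under class 0 and class 1; higher = more discriminative."""
        X, y = np.asarray(X), np.asarray(y)
        scores = []
        for j in range(X.shape[1]):
            p = np.bincount(X[y == 0, j], minlength=2)  # feature counts, class 0
            q = np.bincount(X[y == 1, j], minlength=2)  # feature counts, class 1
            scores.append(jeffreys_divergence(p, q))
        return np.argsort(scores)[::-1]  # feature indices, most discriminative first

    # Toy usage: 6 documents, 3 binary term-presence features, 2 classes.
    X = np.array([[1, 0, 1], [1, 0, 0], [1, 1, 1],
                  [0, 1, 0], [0, 1, 1], [0, 1, 0]])
    y = np.array([0, 0, 0, 1, 1, 1])
    print(rank_features(X, y))  # features 0 and 1 outrank the less informative feature 2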