Peer reviewed · Open access
  • Toward Optimal Feature Sele...
    Tang, Bo; Kay, Steven; He, Haibo

    IEEE Transactions on Knowledge and Data Engineering, 2016-09-01, Volume 28, Issue 9
    Journal Article

    Automated feature selection is important for text categorization to reduce feature size and to speed up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on information theory, which aims to rank features by their discriminative capacity for classification. We first revisit two information measures, Kullback-Leibler divergence and Jeffreys divergence, for binary hypothesis testing, and analyze their asymptotic properties relating to type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH-divergence, we develop two efficient feature selection methods, termed maximum discrimination (MD) and MD-χ² methods, for text categorization. The promising results of extensive experiments demonstrate the effectiveness of the proposed approaches.
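
    The sketch below is an illustrative reading of the idea in the abstract, not the authors' exact MD or MD-χ² algorithm: it scores each binary term-presence feature by the classical Jeffreys divergence (symmetrized Kullback-Leibler divergence) between its class-conditional distributions and ranks features by that score. The function names (kl_divergence, jeffreys_divergence, rank_features), the two-class restriction, and the toy data are assumptions for illustration; the paper's JMH-divergence generalizes this two-hypothesis measure to the multi-class case.

    # Hypothetical sketch: rank binary features by the Jeffreys divergence
    # between their empirical distributions under two classes.
    import numpy as np

    def kl_divergence(p, q, eps=1e-12):
        """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
        p = np.asarray(p, dtype=float) + eps
        q = np.asarray(q, dtype=float) + eps
        p, q = p / p.sum(), q / q.sum()
        return float(np.sum(p * np.log(p / q)))

    def jeffreys_divergence(p, q):
        """Jeffreys divergence: the symmetrized KL divergence."""
        return kl_divergence(p, q) + kl_divergence(q, p)

    def rank_features(X, y):
        """Score each binary feature by the Jeffreys divergence between its
        histograms under class 0 and class 1; higher = more discriminative."""
        X, y = np.asarray(X), np.asarray(y)
        scores = []
        for j in range(X.shape[1]):
            p = np.bincount(X[y == 0, j], minlength=2)  # feature counts, class 0
            q = np.bincount(X[y == 1, j], minlength=2)  # feature counts, class 1
            scores.append(jeffreys_divergence(p, q))
        return np.argsort(scores)[::-1]  # feature indices, most discriminative first

    # Toy usage: 6 documents, 3 binary term-presence features, 2 classes.
    X = np.array([[1, 0, 1], [1, 0, 0], [1, 1, 1],
                  [0, 1, 0], [0, 1, 1], [0, 1, 0]])
    y = np.array([0, 0, 0, 1, 1, 1])
    print(rank_features(X, y))  # features 0 and 1 outrank the less informative feature 2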