Full text
Peer reviewed
  • ACOSampling: An ant colony ...
    Yu, Hualong; Ni, Jun; Zhao, Jing

    Neurocomputing (Amsterdam), 02/2013, Volume: 101
    Journal Article

    Class imbalance occurs frequently in DNA microarray data and leads to poor prediction performance for minority classes. Other characteristics of these data, such as high dimensionality, small sample size, and high noise, make the problem worse. In this study, we propose ACOSampling, a novel undersampling method based on ant colony optimization (ACO), to address this problem. The algorithm first applies feature selection to eliminate noisy genes. The original training set is then randomly and repeatedly divided into two groups: a training set and a validation set. In each division, a modified ACO algorithm, a variant of our previous work, filters out less informative majority samples and searches for the corresponding optimal training sample subset. Finally, the results from all locally optimal training sample subsets are summarized as a frequency list, where each frequency indicates the importance of the corresponding majority sample. Only the high-frequency majority samples are extracted and combined with all minority samples to construct the final balanced training set. We evaluated the method on four benchmark skewed DNA microarray datasets using a support vector machine (SVM) classifier; the proposed method outperforms many other sampling approaches.

    ► ACO algorithm is modified for undersampling skewed DNA microarray data.
    ► The significance of each majority sample is estimated by a ranked frequency list.
    ► ACOSampling increases classification performance but takes more time.
    ► Selecting a few feature genes helps to improve classification performance.
    ► Some classification tasks are harmful and the others are unharmful.
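
    The procedure described in the abstract (repeated random splits, a per-split search for an informative majority subset, a frequency list, and a final balanced set) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the function names, the parameters, label 0 as the majority class and label 1 as the minority class, a nearest-centroid classifier standing in for the SVM, the geometric mean of per-class recall as the fitness measure, a simplified keep-or-drop pheromone scheme standing in for the modified ACO, and keeping as many majority samples as there are minority samples. The feature selection step is omitted.

```python
import random
from collections import Counter

def centroid(rows):
    """Mean vector of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def g_mean(train, valid):
    """Geometric mean of per-class recall for a nearest-centroid classifier.
    (Assumed stand-in for the SVM-based evaluation in the abstract.)"""
    cents = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        if not rows:
            return 0.0
        cents[label] = centroid(rows)
    hits, totals = Counter(), Counter()
    for x, y in valid:
        pred = min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))
        totals[y] += 1
        hits[y] += int(pred == y)
    if not totals[0] or not totals[1]:
        return 0.0
    return (hits[0] / totals[0] * hits[1] / totals[1]) ** 0.5

def aco_select(data, maj_idx, min_idx, valid_idx, rng, n_ants=8, n_iters=10, rho=0.2):
    """Simplified binary ACO (hypothetical): each ant keeps or drops each
    majority sample with probability given by that sample's pheromone."""
    tau = {i: 0.5 for i in maj_idx}
    valid = [data[i] for i in valid_idx]
    best_subset, best_fit = set(maj_idx), -1.0
    for _ in range(n_iters):
        for _ in range(n_ants):
            subset = {i for i in maj_idx if rng.random() < tau[i]}
            train = [data[i] for i in subset] + [data[i] for i in min_idx]
            fit = g_mean(train, valid)
            if fit > best_fit:
                best_subset, best_fit = subset, fit
        # Evaporate, then reinforce toward the best subset found so far.
        for i in maj_idx:
            target = 1.0 if i in best_subset else 0.0
            tau[i] = min(0.95, max(0.05, (1 - rho) * tau[i] + rho * target))
    return best_subset

def aco_sampling_sketch(data, n_splits=20, valid_frac=0.3, seed=0):
    """Return (indices of the balanced training set, majority frequency list)."""
    rng = random.Random(seed)
    maj = [i for i, (_, y) in enumerate(data) if y == 0]
    mino = [i for i, (_, y) in enumerate(data) if y == 1]
    freq = Counter()
    for _ in range(n_splits):
        # Stratified random division into training and validation parts.
        valid = set()
        for group in (maj, mino):
            valid |= set(rng.sample(group, max(1, int(len(group) * valid_frac))))
        train_maj = [i for i in maj if i not in valid]
        train_min = [i for i in mino if i not in valid]
        freq.update(aco_select(data, train_maj, train_min, sorted(valid), rng))
    # Keep the most frequently selected majority samples, plus all minority samples.
    ranked = sorted(maj, key=lambda i: (-freq[i], i))
    return sorted(ranked[:len(mino)]) + mino, freq
```

    Called on a list of `(feature_vector, label)` pairs, `aco_sampling_sketch` returns the indices of the balanced subset together with the per-sample selection counts, so the ranking that drives the final choice can be inspected directly.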