A Classification Method Based on Feature Selection for Imbalanced Data

Cited by: 32
Authors
Liu, Yi [1, 2]
Wang, Yanzhen [1, 2, 3]
Ren, Xiaoguang [1, 2]
Zhou, Hao [1, 2]
Diao, Xingchun [1, 2]
Affiliations
[1] Natl Innovat Inst Def Technol, Beijing 100010, Peoples R China
[2] Tianjin Artificial Intelligence Innovat Ctr, Tianjin 300457, Peoples R China
[3] Natl Univ Def Technol, State Key Lab High Performance Comp, Changsha 410073, Hunan, Peoples R China
Source
IEEE ACCESS | 2019 / Vol. 7
Funding
National Science Foundation (USA); National Natural Science Foundation of China
Keywords
Feature selection; imbalanced data; multiobjective ant colony optimization; genetic algorithm; ENSEMBLE; INSTANCE; INSIGHT;
DOI
10.1109/ACCESS.2019.2923846
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Imbalanced data are very common in the real world and may degrade the performance of conventional classification algorithms. To address imbalanced classification problems, we propose an ensemble classification method that combines evolutionary under-sampling and feature selection. We apply the Bootstrap method to the original data to generate many sample subsets. A V-statistic is developed to measure the distribution of imbalanced data, and it also serves as the optimization objective of the genetic algorithm used to under-sample the sample subsets. Moreover, we take the F1 and G-mean indicators as two optimization objectives and employ a multiobjective ant colony optimization algorithm for feature selection on the resampled data to construct an ensemble system. Ten low-dimensional and four high-dimensional typical imbalanced datasets are used in the experiments. Six state-of-the-art algorithms and four measures are used for a fair comparison. The experimental results show that the proposed system achieves better classification performance than the other algorithms, especially on high-dimensional imbalanced data.
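As a rough illustration of two ingredients named in the abstract, the sketch below draws Bootstrap index subsets and computes the F1 and G-mean scores that the abstract lists as optimization objectives. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `bootstrap_subsets` and `f1_and_gmean` are hypothetical, F1 is taken as the harmonic mean of precision and recall, and G-mean as the geometric mean of sensitivity and specificity (the common definitions, which may differ in detail from the paper's). The V-statistic, the genetic algorithm, and the ant colony search are not specified in the abstract and are not sketched here.

```python
import math
import random


def bootstrap_subsets(n_samples, n_subsets, seed=0):
    """Return n_subsets lists of indices, each drawn with replacement.

    Hypothetical helper: every subset has the same size as the original data.
    """
    rng = random.Random(seed)
    return [
        [rng.randrange(n_samples) for _ in range(n_samples)]
        for _ in range(n_subsets)
    ]


def f1_and_gmean(y_true, y_pred, positive=1):
    """Compute (F1, G-mean) for a binary problem; `positive` is the minority label.

    Assumed definitions: F1 = harmonic mean of precision and recall,
    G-mean = sqrt(sensitivity * specificity).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    gmean = math.sqrt(recall * specificity)
    return f1, gmean


if __name__ == "__main__":
    subsets = bootstrap_subsets(n_samples=6, n_subsets=3)
    y_true = [1, 1, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 1]
    print(len(subsets), f1_and_gmean(y_true, y_pred))
```

Both scores depend on the minority-class recall, which is why they are often preferred over plain accuracy when one class dominates the data.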
Pages: 81794-81807
Number of pages: 14