Uncertainty Based Under-Sampling for Learning Naive Bayes Classifiers Under Imbalanced Data Sets

Cited by: 36
Authors
Aridas, Christos K. [1]
Karlos, Stamatis [1]
Kanas, Vasileios G. [2]
Fazakis, Nikos [2]
Kotsiantis, Sotiris B. [1]
Affiliations
[1] Univ Patras, Dept Math, Patras 26504, Greece
[2] Univ Patras, Dept Elect & Comp Engn, Patras 26504, Greece
Source
IEEE Access | 2020 / Vol. 8
Keywords
Active selection; classification; Naive Bayes; imbalanced data; under-sampling; SMOTE; CLASSIFICATION; IDENTIFICATION; CHALLENGES; MACHINE; INSIGHT
DOI
10.1109/ACCESS.2019.2961784
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In many real-world classification tasks, not all data classes are represented equally. This problem, also known as the curse of class imbalance in data sets, can affect the training procedure of a classifier, leading it to learn a model biased in favor of the majority class. In this work, an under-sampling approach is proposed that uses a Naive Bayes classifier to select the most informative instances from the available training set, starting from a random initial selection. The method first learns a Naive Bayes classification model on a small stratified initial training set. It then iteratively adds to the training set of its base model the instances about which the model is most uncertain, and retrains the model until certain stopping criteria are satisfied. The overall performance of the proposed method has been examined through a rigorous experimental procedure on six multimodal data sets and forty-four standard benchmark data sets. The empirical results, assessed with several appropriate metrics and a suitable statistical testing procedure, indicate that the proposed under-sampling method achieves classification performance comparable to that of other resampling techniques.
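The abstract outlines the selection loop but not its details. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: a Gaussian likelihood for the Naive Bayes model, least confidence (one minus the largest posterior) as the uncertainty measure, and a fixed budget of selected instances as the stopping criterion. All names (`SimpleGaussianNB`, `uncertainty_undersample`, `init_per_class`, `batch_size`, `budget`) and the toy data are hypothetical.

```python
import math
import random


class SimpleGaussianNB:
    """Hand-written Gaussian Naive Bayes (assumed likelihood; not a library class)."""

    def __init__(self, eps=1e-6):
        self.eps = eps  # added to every variance to avoid division by zero

    def fit(self, X, y):
        self.stats = {}
        for c in sorted(set(y)):
            rows = [x for x, label in zip(X, y) if label == c]
            n = len(rows)
            cols = list(zip(*rows))
            means = [sum(col) / n for col in cols]
            variances = [sum((v - m) ** 2 for v in col) / n + self.eps
                         for col, m in zip(cols, means)]
            self.stats[c] = (math.log(n / len(X)), means, variances)
        return self

    def predict_proba(self, x):
        # Log prior plus per-feature Gaussian log-likelihoods (independence assumption).
        log_post = {}
        for c, (log_prior, means, variances) in self.stats.items():
            score = log_prior
            for v, m, s2 in zip(x, means, variances):
                score -= 0.5 * (math.log(2 * math.pi * s2) + (v - m) ** 2 / s2)
            log_post[c] = score
        # Normalize with the log-sum-exp trick so the posteriors sum to 1.
        top = max(log_post.values())
        z = sum(math.exp(s - top) for s in log_post.values())
        return {c: math.exp(s - top) / z for c, s in log_post.items()}

    def predict(self, x):
        proba = self.predict_proba(x)
        return max(proba, key=proba.get)


def uncertainty_undersample(X, y, init_per_class=2, batch_size=1, budget=10, seed=0):
    """Return indices of the selected (under-sampled) training subset."""
    rng = random.Random(seed)
    # Step 1: small stratified random initial set.
    selected = []
    for c in sorted(set(y)):
        members = [i for i, label in enumerate(y) if label == c]
        selected += rng.sample(members, min(init_per_class, len(members)))
    chosen = set(selected)
    pool = [i for i in range(len(y)) if i not in chosen]
    # Step 2: retrain, then move the most uncertain pool instances into the subset,
    # until the (assumed) budget is reached or the pool is empty.
    while pool and len(selected) < budget:
        model = SimpleGaussianNB().fit([X[i] for i in selected], [y[i] for i in selected])
        # Least confidence: a lower top posterior means a more uncertain instance.
        pool.sort(key=lambda i: max(model.predict_proba(X[i]).values()))
        k = min(batch_size, budget - len(selected))
        selected += pool[:k]
        pool = pool[k:]
    return selected


# Hypothetical toy data: 40 majority (class 0) vs 6 minority (class 1) points.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
X += [[data_rng.gauss(2.5, 1), data_rng.gauss(2.5, 1)] for _ in range(6)]
y = [0] * 40 + [1] * 6

selected = uncertainty_undersample(X, y, init_per_class=2, batch_size=2, budget=12)
model = SimpleGaussianNB().fit([X[i] for i in selected], [y[i] for i in selected])
print(len(selected), "of", len(X), "training instances kept")
```

Sorting the pool by its largest posterior in ascending order puts the instances nearest the current decision boundary first. Retraining after every batch lets the notion of "most uncertain" shift as the subset grows, which is the iterative behavior the abstract describes.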
Pages: 2122-2133
Number of pages: 12