A Dynamic Sampling Framework for Multi-Class Imbalanced Data

被引:8
|
作者
Debowski, B. [1 ]
Areibi, S. [1 ]
Grewal, G. [2 ]
Tempelman, J. [2 ]
机构
[1] Univ Guelph, Sch Engn, Guelph, ON, Canada
[2] Univ Guelph, Sch Comp Sci, Guelph, ON, Canada
关键词
Multi-class; Imbalanced Data; Dynamic Sampling;
D O I
10.1109/ICMLA.2012.144
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper we present a Dynamic Sampling Framework for use with multi-class imbalanced data containing any number of classes. The framework makes use of existing sampling techniques such as RUS, ROS, and SMOTE and ties the classification algorithm into the sampling process in a wrapper like manner. In doing so the framework is able to search for a desirably sampled training set, thus eliminating the need to specify a target distribution and automatically tuning the training set distribution to the classification algorithm's learning preferences. This is important when re-sampling multi-class data where manually searching for an appropriate target distribution would be a daunting task. We test both our Dynamic Sampling approach and traditional Static Sampling using RUS, ROS, SMOTE, ROS+RUS, and SMOTE+RUS with several classification algorithms on a four class, highly imbalanced data set. We compare the results of Static Sampling and Dynamic Sampling and find that overall both techniques are able to raise Recall for the highest minority classes, but Dynamic Sampling is also able to maintain or raise Recall for the majority classes. Also, Dynamic Sampling is overall more robust and resilient, and is better able to sustain classifier Accuracy and to raise G-Mean and Minimum F-Measures.
引用
收藏
页码:113 / 118
页数:6
相关论文
共 50 条
  • [1] A Partial Labeling Framework for Multi-Class Imbalanced Streaming Data
    Arabmakki, Elaheh
    Kantardzic, Mehmed
    Sethi, Tegjyot Singh
    [J]. 2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 1018 - 1025
  • [2] Multi-class Boosting for Imbalanced Data
    Fernandez-Baldera, Antonio
    Buenaposada, Jose M.
    Baumela, Luis
    [J]. PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2015), 2015, 9117 : 57 - 64
  • [3] Multi-class WHMBoost: An ensemble algorithm for multi-class imbalanced data
    Zhao, Jiakun
    Jin, Ju
    Zhang, Yibo
    Zhang, Ruifeng
    Chen, Si
    [J]. INTELLIGENT DATA ANALYSIS, 2022, 26 (03) : 599 - 614
  • [4] Hybrid Sampling and Dynamic Weighting-Based Classification Method for Multi-Class Imbalanced Data Stream
    Han, Meng
    Li, Ang
    Gao, Zhihui
    Mu, Dongliang
    Liu, Shujuan
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (10):
  • [5] MULTI-CLASS DATA CLASSIFICATION FOR IMBALANCED DATA SET USING COMBINED SAMPLING APPROACHES
    Prachuabsupakij, Wanthanee
    Snonthornphisaj, Nuanwan
    [J]. KDIR 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND INFORMATION RETRIEVAL, 2011, : 166 - 171
  • [6] Evaluating Difficulty of Multi-class Imbalanced Data
    Lango, Mateusz
    Napierala, Krystyna
    Stefanowski, Jerzy
    [J]. FOUNDATIONS OF INTELLIGENT SYSTEMS, ISMIS 2017, 2017, 10352 : 312 - 322
  • [7] Survey on Highly Imbalanced Multi-class Data
    Hamid, Hakim Abdul
    Yusoff, Marina
    Mohamed, Azlinah
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (06) : 211 - 229
  • [8] DynaQ: online learning from imbalanced multi-class streams through dynamic sampling
    Sadeghi, Farnaz
    Viktor, Herna L.
    Vafaie, Parsa
    [J]. APPLIED INTELLIGENCE, 2023, 53 (21) : 24908 - 24930
  • [9] DynaQ: online learning from imbalanced multi-class streams through dynamic sampling
    Farnaz Sadeghi
    Herna L. Viktor
    Parsa Vafaie
    [J]. Applied Intelligence, 2023, 53 : 24908 - 24930
  • [10] An Under-Sampling Method with Support Vectors in Multi-class Imbalanced Data Classification
    Arafat, Md. Yasir
    Hoque, Sabera
    Xu, Shuxiang
    Farid, Dewan Md.
    [J]. 2019 13TH INTERNATIONAL CONFERENCE ON SOFTWARE, KNOWLEDGE, INFORMATION MANAGEMENT AND APPLICATIONS (SKIMA), 2019,