Sampling Approaches for Imbalanced Data Classification Problem in Machine Learning

Cited by: 57
Authors
Tyagi, Shivani [1]
Mittal, Sangeeta [1]
Affiliation
[1] Jaypee Institute of Information Technology Noida, Department of Computer Science & Engineering, Noida, Uttar Pradesh, India
Keywords
Imbalanced dataset; Machine learning; Resampling; Undersampling; Oversampling
DOI
10.1007/978-3-030-29407-6_17
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Real-world datasets in many domains, such as medicine, intrusion detection, fraudulent transactions, and bioinformatics, are highly imbalanced. In classification problems, imbalanced datasets degrade the accuracy of class predictions. This skewness can be handled either by oversampling the minority-class examples or by undersampling the majority class. In this work, popular methods from both categories have been evaluated for their ability to improve the imbalance ratio of five highly imbalanced datasets from different application domains. The effect of balancing on classification results has also been investigated. The adaptive synthetic (ADASYN) oversampling approach was observed to improve both the imbalance ratio and the classification results the most. However, undersampling approaches gave better overall performance across all datasets.
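The two families of resampling named in the abstract, and the imbalance ratio they are meant to improve, can be illustrated with a small sketch. It is not the authors' code; the record does not say which implementations were used. Assumptions: a binary problem, the imbalance ratio (IR) taken as majority-class count divided by minority-class count, and hypothetical helper names (`imbalance_ratio`, `random_undersample`, `random_oversample`, `adasyn`). The `adasyn` function is a simplified version of the ADASYN idea (He et al., 2008): minority points with more majority-class neighbours receive more synthetic samples, each created by interpolating towards a nearby minority point.

```python
import math
import random
from collections import Counter


def imbalance_ratio(labels):
    """IR = majority-class count / minority-class count; IR = 1.0 means balanced."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_undersample(X, y, majority, seed=0):
    """Randomly drop majority-class samples until both classes have the same size (binary case)."""
    rng = random.Random(seed)
    maj = [i for i, lab in enumerate(y) if lab == majority]
    rest = [i for i, lab in enumerate(y) if lab != majority]
    keep = sorted(rng.sample(maj, len(rest)) + rest)
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, minority, seed=0):
    """Duplicate randomly chosen minority-class samples until both classes have the same size."""
    rng = random.Random(seed)
    mino = [i for i, lab in enumerate(y) if lab == minority]
    n_extra = (len(y) - len(mino)) - len(mino)
    extra = [rng.choice(mino) for _ in range(n_extra)]
    return list(X) + [X[i] for i in extra], list(y) + [y[i] for i in extra]


def adasyn(X, y, minority, k=5, beta=1.0, seed=0):
    """Simplified ADASYN-style oversampling for a binary problem (illustrative sketch only)."""
    rng = random.Random(seed)
    mino = [i for i, lab in enumerate(y) if lab == minority]
    # Number of synthetic samples to create; beta = 1.0 aims for full balance.
    n_synthetic = int((len(y) - 2 * len(mino)) * beta)
    if n_synthetic <= 0 or len(mino) < 2:
        return list(X), list(y)
    # r_i: fraction of majority-class points among the k nearest neighbours of each minority point.
    ratios = []
    for i in mino:
        nbrs = sorted((j for j in range(len(X)) if j != i), key=lambda j: math.dist(X[i], X[j]))[:k]
        ratios.append(sum(y[j] != minority for j in nbrs) / k)
    total = sum(ratios)
    # "Harder" points (more majority neighbours) get a larger share of the synthetic samples;
    # fall back to a uniform split if no minority point has a majority neighbour.
    weights = [r / total for r in ratios] if total > 0 else [1 / len(mino)] * len(mino)
    X_new, y_new = list(X), list(y)
    for i, w in zip(mino, weights):
        # Interpolation partners: the k nearest *minority* neighbours of X[i].
        partners = sorted((j for j in mino if j != i), key=lambda j: math.dist(X[i], X[j]))[:k]
        # Rounding means the total can differ slightly from n_synthetic.
        for _ in range(round(w * n_synthetic)):
            z = rng.choice(partners)
            lam = rng.random()  # interpolation factor in [0, 1)
            X_new.append(tuple(a + lam * (b - a) for a, b in zip(X[i], X[z])))
            y_new.append(minority)
    return X_new, y_new


# Toy binary dataset: 20 majority points (label 0) on a grid, 4 minority points (label 1).
X = [(float(i % 5), float(i // 5)) for i in range(20)] + [(1.5, 1.5), (2.5, 2.5), (3.5, 1.5), (6.0, 6.0)]
y = [0] * 20 + [1] * 4

if __name__ == "__main__":
    print("original IR:", imbalance_ratio(y))  # 5.0
    print("undersampled IR:", imbalance_ratio(random_undersample(X, y, majority=0)[1]))  # 1.0
    print("oversampled IR:", imbalance_ratio(random_oversample(X, y, minority=1)[1]))  # 1.0
    print("ADASYN IR:", imbalance_ratio(adasyn(X, y, minority=1)[1]))
```

The sketch sticks to the standard library and a brute-force O(n²) neighbour search, which is fine for toy data but not for real datasets. In practice, the imbalanced-learn package provides `RandomUnderSampler`, `RandomOverSampler`, and `ADASYN`, each with a `fit_resample(X, y)` method. Resampling is normally applied only to the training split, so that evaluation still reflects the original class distribution.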
Pages: 209-221
Page count: 13