Sampling Approaches for Imbalanced Data Classification Problem in Machine Learning

Cited by: 57
Authors
Tyagi, Shivani [1]
Mittal, Sangeeta [1]
Affiliation
[1] Jaypee Inst Informat Technol Noida, Dept Comp Sci & Engn, Noida, UP, India
Keywords
Imbalanced dataset; Machine learning; Resampling; Undersampling; Oversampling
DOI
10.1007/978-3-030-29407-6_17
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Real-world datasets in many domains, such as medicine, intrusion detection, fraud detection and bioinformatics, are highly imbalanced. In classification problems, imbalanced datasets negatively affect the accuracy of class predictions. This skew can be handled either by oversampling minority-class examples or by undersampling the majority class. In this work, popular methods of both categories are evaluated for their ability to improve the imbalance ratio of five highly imbalanced datasets from different application domains. The effect of balancing on classification results is also investigated. The adaptive synthetic oversampling approach was observed to best improve the imbalance ratio as well as the classification results. However, undersampling approaches gave better overall performance on all datasets.
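As a rough illustration of the two resampling families named in the abstract, the sketch below computes an imbalance ratio and applies plain random oversampling and random undersampling. It is not the paper's procedure and not the adaptive synthetic method: the function names, the toy dataset, and the definition of imbalance ratio as majority count divided by minority count are all assumptions made for the example.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Assumed definition: largest class count / smallest class count (1.0 = balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen examples of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_x, out_y = [], []
    for cls in counts:
        pool = [x for x, y in zip(samples, labels) if y == cls]
        out_x.extend(rng.sample(pool, target))
        out_y.extend([cls] * target)
    return out_x, out_y


# Hypothetical toy data: 5 minority examples (label 1) and 95 majority examples (label 0).
X = [[i] for i in range(100)]
y = [1] * 5 + [0] * 95

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

print(imbalance_ratio(y), imbalance_ratio(y_over), imbalance_ratio(y_under))
```

Oversampling keeps every original example and grows the dataset, while undersampling discards majority examples and shrinks it; both drive the assumed ratio to 1.0 on this toy data.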
Pages: 209-221
Page count: 13
Related papers
50 in total
  • [1] Classification of imbalanced medical data: An empirical study of machine learning approaches
    Mundra, Shikha
    Vijay, Shounak
    Mundra, Ankit
    Gupta, Punit
    Goyal, Mayank Kumar
    Kaur, Mandeep
    Khaitan, Supriya
    Rajpoot, Abha Kiran
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 43 (02) : 1933 - 1946
  • [2] Imbalanced Data Problem in Machine Learning: A Review
    Altalhan, Manahel
    Algarni, Abdulmohsen
    Turki-Hadj Alouane, Monia
    IEEE Access, 2025, 13 : 13686 - 13699
  • [3] Combine Sampling Support Vector Machine for Imbalanced Data Classification
    Sain, Hartayuni
    Purnami, Santi Wulan
    THIRD INFORMATION SYSTEMS INTERNATIONAL CONFERENCE 2015, 2015, 72 : 59 - 66
  • [4] Adversarial Approaches to Tackle Imbalanced Data in Machine Learning
    Ayoub, Shahnawaz
    Gulzar, Yonis
    Rustamov, Jaloliddin
    Jabbari, Abdoh
    Reegu, Faheem Ahmad
    Turaev, Sherzod
    SUSTAINABILITY, 2023, 15 (09)
  • [5] An Improved Extreme Learning Machine for Imbalanced Data Classification
    Zhang, Xiaopeng
    Qin, Liangxi
    IEEE ACCESS, 2022, 10 : 8634 - 8642
  • [6] Imbalanced data classification: Using transfer learning and active sampling
    Liu, Yang
    Yang, Guoping
    Qiao, Shaojie
    Liu, Meiqi
    Qu, Lulu
    Han, Nan
    Wu, Tao
    Yuan, Guan
    Peng, Yuzhong
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 117
  • [7] Comparison Of The Different Sampling Techniques For Imbalanced Classification Problems In Machine Learning
    Peng Zhihao
    Yan Fenglong
    Li Xucheng
    2019 11TH INTERNATIONAL CONFERENCE ON MEASURING TECHNOLOGY AND MECHATRONICS AUTOMATION (ICMTMA 2019), 2019, : 431 - 434
  • [8] Deep Discriminative Features Learning and Sampling for Imbalanced Data Problem
    Liu, Yi-Hsun
    Liu, Chien-Liang
    Tseng, Vincent Shin-Mu
    2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2018, : 1146 - 1151
  • [9] Online Sequential Extreme Learning Machine with Under-Sampling and Over-Sampling for Imbalanced Big Data Classification
    Du, Jie
    Vong, Chi-Man
    Chang, Yajie
    Jiao, Yang
    PROCEEDINGS OF ELM-2016, 2018, 9 : 229 - 239
  • [10] IMBALANCED DATA CLASSIFICATION BASED ON EXTREME LEARNING MACHINE AUTOENCODER
    Shen, Chu
    Zhang, Su-Fang
    Zhai, Jun-Hal
    Luo, Ding-Sheng
    Chen, Jun-Fen
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOL 2, 2018, : 399 - 404