Sampling Approaches for Imbalanced Data Classification Problem in Machine Learning

Cited by: 57
Authors
Tyagi, Shivani [1]
Mittal, Sangeeta [1]
Affiliations
[1] Jaypee Inst Informat Technol Noida, Dept Comp Sci & Engn, Noida, UP, India
Keywords
Imbalanced dataset; Machine learning; Resampling; Undersampling; Oversampling;
DOI
10.1007/978-3-030-29407-6_17
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Real-world datasets in many domains, such as medical, intrusion detection, fraud transactions and bioinformatics, are highly imbalanced. In classification problems, imbalanced datasets negatively affect the accuracy of class predictions. This skewness can be handled either by oversampling minority class examples or by undersampling the majority class. In this work, popular methods of both categories have been evaluated for their capability of improving the imbalance ratio of five highly imbalanced datasets from different application domains. The effect of balancing on classification results has also been investigated. It has been observed that the adaptive synthetic oversampling approach can best improve the imbalance ratio as well as classification results. However, undersampling approaches gave better overall performance on all datasets.
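The two resampling directions named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain random oversampling and random undersampling, the simplest member of each category, rather than the adaptive synthetic approach the abstract reports on. The function names and the toy 90/10 label split are assumptions made for illustration, and the imbalance ratio is taken here as majority count divided by minority count.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def _group_by_class(samples, labels):
    """Collect samples into a dict keyed by class label."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen examples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """Randomly discard examples until every class matches the smallest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        out_x.extend(rng.sample(xs, target))
        out_y.extend([y] * target)
    return out_x, out_y


# Hypothetical toy data: 90 majority examples (class 0), 10 minority (class 1).
samples = list(range(100))
labels = [0] * 90 + [1] * 10

over_x, over_y = random_oversample(samples, labels)
under_x, under_y = random_undersample(samples, labels)

print(imbalance_ratio(labels), imbalance_ratio(over_y), imbalance_ratio(under_y))
print(len(over_x), len(under_x))
```

Oversampling keeps every original example and grows the dataset, while undersampling shrinks it and discards majority examples; both bring the ratio to 1 on this toy split.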
Pages: 209-221
Number of pages: 13
Related papers
50 in total
  • [41] Ensemble weighted extreme learning machine for imbalanced data classification based on differential evolution
    Yong Zhang
    Bo Liu
    Jing Cai
    Suhua Zhang
    Neural Computing and Applications, 2017, 28 : 259 - 267
  • [43] Imbalanced Classification in Diabetics Using Ensembled Machine Learning
    Kumar, M. Sandeep
    Khan, Mohammad Zubair
    Rajendran, Sukumar
    Noor, Ayman
    Dass, A. Stephen
    Prabhu, J.
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 72 (03): : 4397 - 4409
  • [44] A transfer weighted extreme learning machine for imbalanced classification
    Guo, Yinan
    Jiao, Botao
    Tan, Ying
    Zhang, Pei
    Tang, Fengzhen
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2022, 37 (10) : 7685 - 7705
  • [45] Cluster-based sampling approaches to imbalanced data distributions
    Yen, Show-Jane
    Lee, Yue-Shi
    DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2006, 4081 : 427 - 436
  • [46] Imbalanced classification by learning hidden data structure
    Zhao, Yang
    Shrivastava, Abhishek K.
    Tsui, Kwok Leung
    IIE TRANSACTIONS, 2016, 48 (07) : 614 - 628
  • [47] Machine Learning on Imbalanced Data in Credit Risk
    Birla, Shiivong
    Kohli, Kashish
    Dutta, Akash
    7TH IEEE ANNUAL INFORMATION TECHNOLOGY, ELECTRONICS & MOBILE COMMUNICATION CONFERENCE IEEE IEMCON-2016, 2016,
  • [48] Dynamic Curriculum Learning for Imbalanced Data Classification
    Wang, Yiru
    Gan, Weihao
    Yang, Jie
    Wu, Wei
    Yan, Junjie
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5016 - 5025
  • [49] Deep Learning for Imbalanced Multimedia Data Classification
    Yan, Yilin
    Chen, Min
    Shyu, Mei-Ling
    Chen, Shu-Ching
    2015 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2015, : 483 - 488
  • [50] An Improved Ensemble Learning for Imbalanced Data Classification
    Yuan, Zhengwu
    Zhao, Pu
    PROCEEDINGS OF 2019 IEEE 8TH JOINT INTERNATIONAL INFORMATION TECHNOLOGY AND ARTIFICIAL INTELLIGENCE CONFERENCE (ITAIC 2019), 2019, : 408 - 411