The Proposal of Undersampling Method for Learning from Imbalanced Datasets

Cited by: 38
Authors
Bach, Malgorzata [1]
Werner, Aleksandra [1]
Palt, Mateusz [1]
Affiliations
[1] Silesian Tech Univ, Gliwice, Poland
Keywords
classification; imbalanced dataset; sampling methods
DOI
10.1016/j.procs.2019.09.167
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Highly imbalanced data, which occurs in many real-world applications, often makes machine-based processing difficult or even impossible. Over- and undersampling methods help to tackle this issue; however, they often have serious shortcomings. In this paper, different methods of class balancing, especially those based on undersampling, are analyzed, and a new solution is presented. The method is oriented toward finding and thinning clusters of majority class examples. Removing observations from high-density areas can lead to a smaller loss of information than removing individual examples or those from lower-density areas. Such an approach makes the distribution of examples more even. The effectiveness of the method is demonstrated through extensive comparisons with other undersampling methods on eighteen public datasets. The experimental results show that in many cases the proposed solution achieves better performance than the other tested techniques. (C) 2019 The Authors. Published by Elsevier B.V.
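The idea of thinning dense regions of the majority class can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in the abstract. It is an assumption-based stand-in that estimates local density by the mean distance to the k nearest majority neighbours and repeatedly removes the majority example in the densest spot. The function name `thin_dense_majority` and the parameters `n_keep` and `k` are hypothetical.

```python
import numpy as np


def thin_dense_majority(X, y, majority_label, n_keep, k=3):
    """Hypothetical helper (not from the paper): undersample the majority
    class by removing examples from its densest regions first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    maj_idx = np.flatnonzero(y == majority_label)
    if not 0 < n_keep <= maj_idx.size:
        raise ValueError("n_keep must be between 1 and the majority class size")

    # Pairwise Euclidean distances among majority examples only.
    pts = X[maj_idx]
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour

    active = np.ones(maj_idx.size, dtype=bool)
    while active.sum() > n_keep:
        act = np.flatnonzero(active)
        sub = dist[np.ix_(act, act)]
        kk = min(k, act.size - 1)
        # Assumed density proxy: a small mean distance to the kk nearest
        # remaining majority neighbours means a crowded neighbourhood.
        mean_knn = np.sort(sub, axis=1)[:, :kk].mean(axis=1)
        active[act[np.argmin(mean_knn)]] = False  # thin the densest spot

    keep = np.sort(np.concatenate([np.flatnonzero(y != majority_label),
                                   maj_idx[active]]))
    return X[keep], y[keep]


if __name__ == "__main__":
    # Toy data: six crowded majority points, three isolated majority
    # points, and three minority points.
    X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
                  [0.05, 0.05], [0.02, 0.08],
                  [5.0, 5.0], [-5.0, 4.0], [6.0, -3.0],
                  [1.0, 1.0], [1.2, 0.9], [0.9, 1.3]])
    y = np.array([0] * 9 + [1] * 3)
    X_res, y_res = thin_dense_majority(X, y, majority_label=0, n_keep=5)
    print(X_res[y_res == 0])
```

Recomputing the density after each removal keeps the procedure from emptying a cluster entirely, which is one way to make the remaining majority examples more evenly spread. The pairwise distance matrix makes this sketch quadratic in the majority class size, so it is suitable only for small datasets.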
Pages: 125 - 134
Number of pages: 10
Related papers
50 in total
  • [31] PSU: Particle Stacking Undersampling Method for Highly Imbalanced Big Data
    Jeon, Yong-Seok
    Lim, Dong-Joon
    IEEE ACCESS, 2020, 8 : 131920 - 131927
  • [32] Drilling Condition Identification Method for Imbalanced Datasets
    Yu, Yibing
    Yang, Huilin
    Peng, Fengjia
    Wang, Xi
    APPLIED SCIENCES-BASEL, 2025, 15 (06):
  • [33] A Novel Intrusion Detection Method for Imbalanced Datasets
    Li, Qiang
    Fu, Yanfang
    Cao, Zijian
    Du, Zhiqiang
    Zhang, Qizhe
    HUMAN-CENTRIC COMPUTING AND INFORMATION SCIENCES, 2025, 15
  • [34] A hybrid evolutionary preprocessing method for imbalanced datasets
    Wong, Ginny Y.
    Leung, Frank H. F.
    Ling, Sai-Ho
    INFORMATION SCIENCES, 2018, 454 : 161 - 177
  • [35] A New Method of Text Categorization on Imbalanced Datasets
    Li Xin-fu
    Yu Yan
    Yin Peng
    2008 INTERNATIONAL WORKSHOP ON EDUCATION TECHNOLOGY AND TRAINING AND 2008 INTERNATIONAL WORKSHOP ON GEOSCIENCE AND REMOTE SENSING, VOL 2, PROCEEDINGS, 2009, : 259 - 262
  • [36] DYCUSBoost: Adaboost-based imbalanced learning using dynamic clustering and undersampling
    Chen, Lingchi
    Deng, Xiaoheng
    Shen, Hailan
    Zhu, Congxu
    Chang, Le
    2018 16TH IEEE INT CONF ON DEPENDABLE, AUTONOM AND SECURE COMP, 16TH IEEE INT CONF ON PERVAS INTELLIGENCE AND COMP, 4TH IEEE INT CONF ON BIG DATA INTELLIGENCE AND COMP, 3RD IEEE CYBER SCI AND TECHNOL CONGRESS (DASC/PICOM/DATACOM/CYBERSCITECH), 2018, : 208 - 215
  • [37] Adaptive Ensemble Undersampling-Boost: A novel learning framework for imbalanced data
    Lu, Wei
    Li, Zhe
    Chu, Jinghui
    JOURNAL OF SYSTEMS AND SOFTWARE, 2017, 132 : 272 - 282
  • [38] Interpretable machine learning for imbalanced credit scoring datasets
    Chen, Yujia
    Calabrese, Raffaella
    Martin-Barragan, Belen
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2024, 312 (01) : 357 - 372
  • [39] Learning imbalanced datasets based on SMOTE and Gaussian distribution
    Pan, Tingting
    Zhao, Junhong
    Wu, Wei
    Yang, Jie
    INFORMATION SCIENCES, 2020, 512 : 1214 - 1233
  • [40] Minority Class Oriented Active Learning for Imbalanced Datasets
    Aggarwal, Umang
    Popescu, Adrian
    Hudelot, Celine
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 9920 - 9927