The Proposal of Undersampling Method for Learning from Imbalanced Datasets

Cited by: 38
Authors
Bach, Malgorzata [1]
Werner, Aleksandra [1]
Palt, Mateusz [1]
Affiliations
[1] Silesian Tech Univ, Gliwice, Poland
Keywords
classification; imbalanced dataset; sampling methods
DOI
10.1016/j.procs.2019.09.167
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Highly imbalanced data, which occur in many real-world applications, often make machine-based processing difficult or even impossible. Over- and undersampling methods help to tackle this issue; however, they often have serious shortcomings. This paper analyzes different class-balancing methods, especially those based on undersampling, and presents a new solution. The method is oriented toward finding and thinning clusters of majority-class examples. Removing observations from high-density areas can lead to less information loss than removing individual examples or those from lower-density areas. Such an approach makes the distribution of examples more even. The effectiveness of the method is demonstrated through extensive comparisons with other undersampling methods on eighteen public datasets. The experimental results show that in many cases the proposed solution achieves better performance than the other tested techniques. (C) 2019 The Authors. Published by Elsevier B.V.
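The abstract does not give the algorithm's details, so the following is only a rough sketch of the general idea of thinning dense regions of the majority class, not the authors' method. The function name `thin_dense_majority`, its parameters, and the density proxy (mean distance to the `k` nearest majority neighbours) are all assumptions made for illustration.

```python
import numpy as np

def thin_dense_majority(X, y, majority_label, n_keep, k=5):
    """Remove majority examples from the densest areas until n_keep remain.

    Assumption of this sketch: density is approximated by the mean distance
    to the k nearest majority neighbours (smaller mean = denser area).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    maj_idx = np.flatnonzero(y == majority_label)

    # Pairwise Euclidean distances among majority examples only.
    diffs = X[maj_idx, None, :] - X[None, maj_idx, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour

    alive = np.ones(len(maj_idx), dtype=bool)
    while alive.sum() > max(n_keep, 1):
        pos = np.flatnonzero(alive)
        sub = dist[np.ix_(pos, pos)]
        kk = min(k, len(pos) - 1)
        knn_mean = np.sort(sub, axis=1)[:, :kk].mean(axis=1)
        # Drop the example sitting in the currently densest neighbourhood.
        alive[pos[np.argmin(knn_mean)]] = False

    # Keep every non-majority example plus the surviving majority examples.
    mask = y != majority_label
    mask[maj_idx[alive]] = True
    return X[mask], y[mask]
```

Recomputing the density after each removal keeps the thinning spread across clusters instead of emptying one of them, at the cost of cubic running time in the number of majority examples; a practical version would likely use a spatial index.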
Pages: 125-134
Number of pages: 10
Related papers
50 in total
  • [1] GUM: A Guided Undersampling Method to Preprocess Imbalanced Datasets for Classification
    Sung, Kisuk
    Brown, W. Eric
    Moreno-Centeno, Erick
    Ding, Yu
    2022 IEEE 18TH INTERNATIONAL CONFERENCE ON AUTOMATION SCIENCE AND ENGINEERING (CASE), 2022, : 1086 - 1091
  • [2] Exploiting Prototypical Explanations for Undersampling Imbalanced Datasets
    Arslan, Yusuf
    Allix, Kevin
    Lefebvre, Clement
    Boytsov, Andrey
    Bissyandé, Tegawendé F.
    Klein, Jacques
    2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022, : 1449 - 1454
  • [3] Combination of Oversampling and Undersampling Techniques on Imbalanced Datasets
    Bansal, Ankita
    Verma, Ayush
    Singh, Sarabjot
    Jain, Yashonam
    INTERNATIONAL CONFERENCE ON INNOVATIVE COMPUTING AND COMMUNICATIONS, ICICC 2022, VOL 3, 2023, 492 : 647 - 656
  • [4] LDAMSS: Fast and efficient undersampling method for imbalanced learning
    Liang, Ting
    Xu, Jie
    Zou, Bin
    Wang, Zhan
    Zeng, Jingjing
    APPLIED INTELLIGENCE, 2022, 52 (06) : 6794 - 6811
  • [5] LDAMSS: Fast and efficient undersampling method for imbalanced learning
    Ting Liang
    Jie Xu
    Bin Zou
    Zhan Wang
    Jingjing Zeng
    Applied Intelligence, 2022, 52 : 6794 - 6811
  • [6] Evolutionary Undersampling for Classification with Imbalanced Datasets: Proposals and Taxonomy
    Garcia, Salvador
    Herrera, Francisco
    EVOLUTIONARY COMPUTATION, 2009, 17 (03) : 275 - 306
  • [7] An Approach for Mining Imbalanced Datasets Combining Specialized Oversampling and Undersampling Methods
    Jedrzejowicz, Joanna
    Jedrzejowicz, Piotr
    IEEE ACCESS, 2023, 11 : 136782 - 136792
  • [8] Evidential Undersampling Approach for Imbalanced Datasets with Class-Overlapping and Noise
    Grina, Fares
    Elouedi, Zied
    Lefevre, Eric
    MODELING DECISIONS FOR ARTIFICIAL INTELLIGENCE (MDAI 2021), 2021, 12898 : 181 - 192
  • [9] Quartiles based UnderSampling(QUS): A Simple and Novel Method to increase the Classification rate of positives in Imbalanced Datasets
    Veni, C. V. Krishna
    Rani, T. Sobha
    2017 NINTH INTERNATIONAL CONFERENCE ON ADVANCES IN PATTERN RECOGNITION (ICAPR), 2017, : 121 - 126
  • [10] Active Learning for Imbalanced Datasets
    Aggarwal, Umang
    Popescu, Adrian
    Hudelot, Celine
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1417 - 1426