The Proposal of Undersampling Method for Learning from Imbalanced Datasets

Cited by: 38
Authors
Bach, Malgorzata [1]
Werner, Aleksandra [1]
Palt, Mateusz [1]
Affiliations
[1] Silesian Tech Univ, Gliwice, Poland
Keywords
classification; imbalanced dataset; sampling methods;
DOI
10.1016/j.procs.2019.09.167
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Highly imbalanced data, which occur in many real-world applications, often make machine-based processing difficult or even impossible. Over- and undersampling methods help to tackle this issue; however, they often have serious shortcomings. In this paper, different methods of class balancing, especially those based on undersampling, are analyzed. In addition, a new solution is presented. The method is oriented toward finding and thinning clusters of majority class examples. Removing observations from high-density areas can lead to less loss of information than removing individual examples or those from lower-density areas. Such an approach makes the distribution of examples more even. The effectiveness of the method is demonstrated through extensive comparisons with other undersampling methods on eighteen public datasets. The results of the experiments show that in many cases the proposed solution achieves better performance than the other tested techniques. © 2019 The Authors. Published by Elsevier B.V.
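The idea of thinning high-density areas of the majority class can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give. It is a hypothetical greedy variant that assumes Euclidean distance and uses the mean distance to the k nearest remaining neighbours as a density proxy. The function name `thin_dense_regions` and the parameters `n_keep` and `k` are illustrative assumptions.

```python
import numpy as np


def thin_dense_regions(X_major, n_keep, k=3):
    """Greedy density-based undersampling sketch (hypothetical, not the paper's method).

    Repeatedly drops the majority example whose k nearest remaining
    neighbours are closest on average, i.e. the one in the densest area,
    until n_keep examples remain. Returns the indices of the kept rows.
    """
    X_major = np.asarray(X_major, dtype=float)
    n = len(X_major)
    if not 0 < n_keep <= n:
        raise ValueError("n_keep must be between 1 and the number of examples")

    # Pairwise Euclidean distances; the diagonal is set to infinity so that
    # a point is never counted as its own neighbour.
    diff = X_major[:, None, :] - X_major[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)

    active = np.ones(n, dtype=bool)
    while active.sum() > n_keep:
        idx = np.flatnonzero(active)
        sub = dist[np.ix_(idx, idx)]
        kk = min(k, len(idx) - 1)  # at least 1 here, since len(idx) >= 2
        # A small mean k-NN distance means a dense neighbourhood.
        knn_mean = np.sort(sub, axis=1)[:, :kk].mean(axis=1)
        active[idx[np.argmin(knn_mean)]] = False
    return np.flatnonzero(active)


# Toy majority class: one tight cluster of 20 points plus 5 isolated points.
rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(20, 2))
sparse = np.array([[10.0, 10.0], [20.0, -10.0], [-15.0, 15.0],
                   [-20.0, -20.0], [30.0, 0.0]])
X_major = np.vstack([dense, sparse])
kept = thin_dense_regions(X_major, n_keep=10)
```

In this toy setup all removals fall inside the tight cluster, so the isolated examples survive and the remaining majority examples are spread more evenly. The pairwise distance matrix makes the sketch practical only for small datasets.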
Pages: 125-134
Page count: 10
Related papers
50 results in total
  • [41] Deep Learning Applied to Imbalanced Malware Datasets Classification
    Salas, Marcelo Palma
    de Geus, Paulo Licio
    JOURNAL OF INTERNET SERVICES AND APPLICATIONS, 2024, 15 (01) : 342 - 359
  • [42] A novel hybrid undersampling method for mining unbalanced datasets in banking and insurance
    Sundarkumar, G. Ganesh
    Ravi, Vadlamani
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2015, 37 : 368 - 377
  • [43] Modifying the learning rate of FLNG dealing with imbalanced datasets
    Machon-Gonzalez, Ivan
    Lopez-Garcia, Hilario
    Luis Calvo-Rolle, Jose
    2010 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS IJCNN 2010, 2010,
  • [44] RSG: A Simple but Effective Module for Learning Imbalanced Datasets
    Wang, Jianfeng
    Lukasiewicz, Thomas
    Hu, Xiaolin
    Cai, Jianfei
    Xu, Zhenghua
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 3783 - 3792
  • [45] Prediction of toxicity: Deep learning with small and imbalanced datasets
    Ecker, Gerhard
    Hemmerich, Jennifer
    Asilar, Ece
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2019, 257
  • [46] A GENETIC RULE LEARNING APPROACH TO DEAL WITH IMBALANCED DATASETS
    Mahani, Aouatef
    Benkhider, Sadjia
    Baba-Ali, Ahmed Riadh
    PROCEEDINGS OF THE EUROPEAN CONFERENCE ON DATA MINING 2015 AND INTERNATIONAL CONFERENCES ON INTELLIGENT SYSTEMS AND AGENTS 2015 AND THEORY AND PRACTICE IN MODERN COMPUTING 2015, 2015, : 151 - 156
  • [47] A novel progressively undersampling method based on the density peaks sequence for imbalanced data
    Xie, Xiaoying
    Liu, Huawen
    Zeng, Shouzhen
    Lin, Lingbin
    Li, Wen
    KNOWLEDGE-BASED SYSTEMS, 2021, 213
  • [48] Applying Resampling Methods for Imbalanced Datasets to Not So Imbalanced Datasets
    Arbelaitz, Olatz
    Gurrutxaga, Ibai
    Muguerza, Javier
    Maria Perez, Jesus
    ADVANCES IN ARTIFICIAL INTELLIGENCE, CAEPIA 2013, 2013, 8109 : 111 - 120
  • [49] A Spammer Identification Method for Class Imbalanced Weibo Datasets
    Tang, Wenbing
    Ding, Zuohua
    Zhou, Mengchu
    IEEE ACCESS, 2019, 7 : 29193 - 29201
  • [50] Gradually Generative Adversarial Networks Method for Imbalanced Datasets
    Misdram, Muhammad
    Muljono
    Purwanto
    Noersasongko, Edi
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (04) : 51 - 58