The Proposal of Undersampling Method for Learning from Imbalanced Datasets

Cited by: 38
Authors
Bach, Malgorzata [1]
Werner, Aleksandra [1]
Palt, Mateusz [1]
Affiliation
[1] Silesian Tech Univ, Gliwice, Poland
Keywords
classification; imbalanced dataset; sampling methods
DOI
10.1016/j.procs.2019.09.167
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Highly imbalanced data, which occurs in many real-world applications, often makes machine-based processing difficult or even impossible. Over- and undersampling methods help to tackle this issue; however, they often have serious shortcomings. In this paper, different methods of class balancing, especially those based on undersampling, are analyzed, and a new solution is presented. The method is oriented toward finding and thinning clusters of majority class examples. Removing observations from high-density areas can lead to less loss of information than removing individual examples or those from lower-density areas. Such an approach makes the distribution of examples more even. The effectiveness of the method is demonstrated through extensive comparisons with other undersampling methods on eighteen public datasets. The experimental results show that in many cases the proposed solution achieves better performance than the other tested techniques. (C) 2019 The Authors. Published by Elsevier B.V.
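The abstract describes the idea only at a high level, so the following is a minimal sketch of density-based thinning of the majority class, not the authors' algorithm. It assumes that "density" can be approximated by the mean distance to the k nearest remaining majority neighbors, and it repeatedly drops the most crowded example until a target size is reached. The function name `thin_dense_majority` and the parameters `n_keep` and `k` are hypothetical and do not come from the paper.

```python
import numpy as np


def thin_dense_majority(X_maj, n_keep, k=5):
    """Return indices of majority-class rows kept after thinning dense areas.

    Illustrative assumption: crowding is measured as the mean Euclidean
    distance to the k nearest remaining majority neighbors (smaller means
    denser). This is a stand-in for the cluster detection in the paper.
    """
    X = np.asarray(X_maj, dtype=float)
    if not 0 < n_keep <= len(X):
        raise ValueError("n_keep must be between 1 and the number of rows")

    # Full pairwise distance matrix; the diagonal is set to infinity so a
    # point is never counted as its own neighbor.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)

    keep = np.arange(len(X))
    while len(keep) > n_keep:
        sub = dist[np.ix_(keep, keep)]
        kk = min(k, len(keep) - 1)
        # Mean distance to the kk nearest neighbors among remaining points.
        crowding = np.sort(sub, axis=1)[:, :kk].mean(axis=1)
        # Remove the single most crowded point, then re-evaluate, so that a
        # dense region is thinned gradually rather than deleted wholesale.
        keep = np.delete(keep, np.argmin(crowding))
    return keep
```

Recomputing the crowding score after every removal is quadratic per step and only suitable for small datasets; it is chosen here because a one-shot ranking would tend to delete an entire dense cluster instead of thinning it.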
Pages: 125-134
Number of pages: 10