A clustering-based adaptive undersampling ensemble method for highly unbalanced data classification

Cited by: 1
Authors
Yuan, Xiaohan [1]
Sun, Chuan [2]
Chen, Shuyu [3]
Affiliations
[1] Chongqing Normal Univ, Sch Comp & Informat Sci, Chongqing 401331, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, 50 Nanyang Ave, Singapore 639798, Singapore
[3] Chongqing Univ, Sch Big Data & Software Engn, Chongqing 400000, Peoples R China
Keywords
Highly unbalanced data; Cluster; Undersampling; Ensemble learning; CLASS-IMBALANCED DATASETS; PREDICTION; MODEL; SVM;
DOI
10.1016/j.asoc.2024.111659
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The class imbalance issue is prevalent in various practical classification tasks. A high unbalanced rate will significantly decrease the classification performance of unbalanced learning. However, existing methods for highly unbalanced data classification still face two key difficulties: (1) fairly learning key information, and (2) maintaining consistency. To address these difficulties, we propose a novel majority clustering-based adaptive undersampling enhanced ensemble classification method, which integrates undersampling and ensemble techniques. In the adaptive undersampling process, we first consider the spatial distribution of majority samples to ensure distribution consistency. We then consider an adaptive sampling rate and introduce a feedback mechanism to obtain more representative majority samples from each cluster. In the classifier ensemble process, multiple ensemble iterations are introduced to achieve fair attention to key information in different classes. Finally, six kinds of experiments are conducted on 17 real highly unbalanced datasets from multiple fields. Experimental results demonstrate that the proposed method outperforms existing methods in terms of effectiveness, robustness, and adaptability.
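The general idea of cluster-aware undersampling can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the adaptive sampling rate or the feedback mechanism, so the sketch substitutes a fixed allocation proportional to cluster size. The function names (`cluster_quotas`, `undersample_majority`, `balanced_subsets`) are hypothetical, and the cluster labels are assumed to come from any clustering step run beforehand on the majority class.

```python
import random
from collections import defaultdict


def cluster_quotas(cluster_labels, n_target):
    """Split n_target draws across clusters in proportion to cluster size.

    Uses largest-remainder rounding so the quotas sum exactly to n_target.
    Assumes n_target <= len(cluster_labels).
    """
    sizes = defaultdict(int)
    for c in cluster_labels:
        sizes[c] += 1
    total = len(cluster_labels)
    exact = {c: n_target * s / total for c, s in sizes.items()}
    quotas = {c: int(e) for c, e in exact.items()}
    leftover = n_target - sum(quotas.values())
    # Give the remaining slots to the clusters with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda c: exact[c] - quotas[c], reverse=True)
    for c in by_fraction[:leftover]:
        quotas[c] += 1
    return quotas


def undersample_majority(majority_idx, cluster_labels, n_target, rng):
    """Draw n_target majority indices, keeping each cluster's share of the data."""
    members = defaultdict(list)
    for i, c in zip(majority_idx, cluster_labels):
        members[c].append(i)
    quotas = cluster_quotas(cluster_labels, n_target)
    picked = []
    for c, idx in members.items():
        picked.extend(rng.sample(idx, quotas[c]))  # sampling without replacement
    return picked


def balanced_subsets(majority_idx, cluster_labels, minority_idx, n_rounds, seed=0):
    """Build one balanced index set per ensemble round (hypothetical helper)."""
    rng = random.Random(seed)
    n_target = len(minority_idx)
    return [
        undersample_majority(majority_idx, cluster_labels, n_target, rng)
        + list(minority_idx)
        for _ in range(n_rounds)
    ]


# Toy data: 100 majority samples in three clusters (60/30/10), 10 minority samples.
majority = list(range(100))
labels = [0] * 60 + [1] * 30 + [2] * 10
minority = list(range(100, 110))
subsets = balanced_subsets(majority, labels, minority, n_rounds=5)
```

Each index set in `subsets` could then be used to train one base classifier, with the base classifiers' predictions combined by voting. Proportional allocation is only one way to preserve the majority distribution; an adaptive scheme such as the one the abstract describes would adjust the per-cluster quotas between rounds.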
Pages: 14