Clustering-based incremental learning for imbalanced data classification

Cited by: 0
Authors
Liu, Yuxin [1, 2]
Du, Guangyu [2 ]
Yin, Chenke [1 ]
Zhang, Haichao [1 ]
Wang, Jia [1 ]
Affiliations
[1] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Suzhou 215123, Peoples R China
[2] Hong Kong Polytech Univ, Dept Appl Phys, Hong Kong 999077, Peoples R China
Keywords
Imbalance data; Classification; Clustering; Incremental learning; DIRL; SMOTE;
DOI
10.1016/j.knosys.2024.111612
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Imbalanced data classification presents a significant challenge when there is a substantial disparity in sample sizes across different classes. This issue severely affects classifier accuracy in predicting minority classes, hampering numerous real-world applications. Traditional methods address data imbalance by using undersampling or oversampling techniques. However, these methods may lead to information loss during sample reduction or introduce noise and model bias through synthetic sample generation. In this paper, we introduce DRIL, an innovative clustering-based incremental learning approach designed to overcome these limitations and improve the classification of minority class samples. Specifically, we employ a "two-step clustering" method to rebalance the dataset, partitioning it into similar and representative sub-datasets. Subsequently, incremental learning is applied to enable the classifier to gradually acquire knowledge about these sub-datasets, establishing a comprehensive understanding of all features present in the imbalanced dataset. Experimental results on twenty datasets demonstrate that our incremental learning-based algorithm outperforms baseline methods in correctly classifying minority classes while exhibiting improved precision and F1 score performance.
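The abstract does not spell out the clustering steps or the classifier, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes plain k-means on the majority class as a stand-in for the "two-step clustering", pairs each majority cluster with all minority samples to form roughly balanced sub-datasets, and uses a small online logistic regression as the incremental learner. Every name here (`kmeans`, `build_sub_datasets`, `OnlineLogReg`) and the toy data are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Tiny Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def build_sub_datasets(majority, minority, seed=0):
    """Assumption: one sub-dataset per majority cluster plus all minority samples."""
    k = max(1, round(len(majority) / len(minority)))
    labels = kmeans(majority, k, seed=seed)
    subsets = []
    for c in range(k):
        chunk = [(p, 0) for p, l in zip(majority, labels) if l == c]
        if chunk:
            subsets.append(chunk + [(p, 1) for p in minority])
    return subsets


class OnlineLogReg:
    """Logistic regression trained by SGD; weights persist across calls."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, batch, epochs=50):
        for _ in range(epochs):
            for x, y in batch:
                err = self.predict_proba(x) - y
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err


# Toy imbalanced data (hypothetical): 60 majority points, 10 minority points.
rng = random.Random(1)
majority = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(60)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(10)]

subsets = build_sub_datasets(majority, minority)
model = OnlineLogReg(dim=2)
for subset in subsets:  # incremental step: learn one sub-dataset at a time
    model.partial_fit(subset)

print(len(subsets), model.predict_proba((3.0, 3.0)), model.predict_proba((0.0, 0.0)))
```

Because the minority samples reappear in every sub-dataset, the learner never sees a heavily skewed batch, and no majority sample is discarded or synthesized. Cluster sizes can differ, so the balance within each sub-dataset is only approximate in this sketch.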
Pages: 11
Related papers
50 in total
  • [1] Clustering-based incremental learning for imbalanced data classification
    Liu, Yuxin
    Du, Guangyu
    Yin, Chenke
    Zhang, Haichao
    Wang, Jia
    [J]. Knowledge-Based Systems, 2024, 292
  • [2] CLUSTERING-BASED SUBSET ENSEMBLE LEARNING METHOD FOR IMBALANCED DATA
    Hu, Xiao-Sheng
    Zhang, Run-Jing
    [J]. PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOLS 1-4, 2013, : 35 - 39
  • [3] Clustering-based Binary-class Classification for Imbalanced Data Sets
    Chen, Chao
    Shyu, Mei-Ling
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IRI), 2011, : 384 - 389
  • [4] An Incremental Clustering-Based Fault Detection Algorithm for Class-Imbalanced Process Data
    Kwak, Jueun
    Lee, Taehyung
    Kim, Chang Ouk
    [J]. IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING, 2015, 28 (03) : 318 - 328
  • [5] Adaptive Clustering-Based Model Aggregation for Federated Learning with Imbalanced Data
    Wang, Dong
    Zhang, Naifu
    Tao, Meixia
    [J]. SPAWC 2021: 2021 IEEE 22ND INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING ADVANCES IN WIRELESS COMMUNICATIONS (IEEE SPAWC 2021), 2020, : 591 - 595
  • [6] Clustering-based Domain-Incremental Learning
    Lamers, Christiaan
    Vidal, Rene
    Belbachir, Nabil
    Van Stein, Niki
    Back, Thomas
    Giampouras, Paris
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 3376 - 3384
  • [7] Consensus Clustering-Based Undersampling Approach to Imbalanced Learning
    Onan, Aytug
    [J]. SCIENTIFIC PROGRAMMING, 2019, 2019
  • [8] Clustering-based undersampling in class-imbalanced data
    Lin, Wei-Chao
    Tsai, Chih-Fong
    Hu, Ya-Han
    Jhang, Jing-Shang
    [J]. INFORMATION SCIENCES, 2017, 409 : 17 - 26
  • [9] Clustering-based improved adaptive synthetic minority oversampling technique for imbalanced data classification
    Jin, Dian
    Xie, Dehong
    Liu, Di
    Gong, Murong
    [J]. INTELLIGENT DATA ANALYSIS, 2023, 27 (03) : 635 - 652
  • [10] Clustering-based Active Learning Classification towards Data Stream
    Yin, Chunyong
    Chen, Shuangshuang
    Yin, Zhichao
    [J]. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2023, 14 (02)