Clustering-based incremental learning for imbalanced data classification

Cited by: 0
Authors
Liu, Yuxin [1, 2]
Du, Guangyu [2]
Yin, Chenke [1]
Zhang, Haichao [1]
Wang, Jia [1]
Affiliations
[1] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Suzhou 215123, Peoples R China
[2] Hong Kong Polytech Univ, Dept Appl Phys, Hong Kong 999077, Peoples R China
Keywords
Imbalance data; Classification; Clustering; Incremental learning; DIRL; SMOTE
DOI
10.1016/j.knosys.2024.111612
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Imbalanced data classification is challenging when sample sizes differ substantially across classes. The imbalance severely reduces classifier accuracy on minority classes, which hampers numerous real-world applications. Traditional methods address data imbalance with undersampling or oversampling techniques. However, undersampling may lose information when samples are discarded, and oversampling may introduce noise and model bias through synthetic sample generation. In this paper, we introduce DRIL, a clustering-based incremental learning approach designed to overcome these limitations and improve the classification of minority class samples. Specifically, we employ a "two-step clustering" method to rebalance the dataset, partitioning it into similar and representative sub-datasets. Incremental learning is then applied so that the classifier gradually acquires knowledge from these sub-datasets, building a comprehensive understanding of all features present in the imbalanced dataset. Experimental results on twenty datasets demonstrate that our incremental learning-based algorithm outperforms baseline methods in correctly classifying minority classes while also improving precision and F1 score.
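The general idea in the abstract (cluster the data to form balanced, representative sub-datasets, then train one classifier on them incrementally) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the "two-step clustering" procedure or the classifier, so the code below assumes a single k-means pass over the majority class and a simple online logistic regression as stand-ins. All names (`kmeans`, `balanced_subsets`, `OnlineLogistic`) and the toy data are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def balanced_subsets(majority, minority, k=3, seed=0):
    """Assumed stand-in for the rebalancing step: deal the members of each
    majority cluster round-robin across sub-datasets, then add every
    minority sample to each sub-dataset."""
    n_subsets = max(1, round(len(majority) / len(minority)))
    labels = kmeans(majority, k, seed=seed)
    subsets = [[] for _ in range(n_subsets)]
    slot = 0
    for c in range(k):
        for p, l in zip(majority, labels):
            if l == c:
                subsets[slot % n_subsets].append((p, 0))
                slot += 1
    return [s + [(p, 1) for p in minority] for s in subsets]


class OnlineLogistic:
    """Logistic regression trained by SGD, one sub-dataset at a time."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def _prob(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, batch, epochs=20):
        for _ in range(epochs):
            for x, y in batch:
                err = self._prob(x) - y
                self.w = [wi - self.lr * err * xi
                          for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err

    def predict(self, x):
        return int(self._prob(x) >= 0.5)


# Toy imbalanced data (assumed): 60 majority vs. 10 minority samples.
rng = random.Random(42)
majority = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(60)]
minority = [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(10)]

subsets = balanced_subsets(majority, minority)
model = OnlineLogistic(dim=2)
for subset in subsets:  # incremental learning over the sub-datasets
    model.partial_fit(subset)

minority_recall = sum(model.predict(p) for p in minority) / len(minority)
print(len(subsets), minority_recall)
```

In this sketch every sub-dataset holds as many majority as minority samples, so no majority sample is discarded and no synthetic sample is generated; whether DRIL rebalances in the same way is not stated in the abstract.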
Pages: 11