Learning evolving prototypes for imbalanced data stream classification with limited labels

Cited by: 1
Authors
Wu, Zhonglin [1]
Wang, Hongliang [1]
Guo, Jingxia [1]
Yang, Qinli [1]
Shao, Junming [1, 2, 3]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Data Min Lab, Chengdu, Peoples R China
[2] Univ Elect Sci & Technol China, Yangtze Delta Reg Inst Quzhou, Huzhou, Peoples R China
[3] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data streams; Concept drift; Imbalanced learning; Active learning
DOI
10.1016/j.ins.2024.120979
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Real-world data streams often exhibit long-tailed distributions with heavy class imbalance, which makes data stream classification challenging, especially when labels are scarce and concept drift occurs. Several active learning methods have been proposed to address this problem by selecting the most valuable instances for labeling. However, existing methods often struggle to dynamically identify the instances that truly represent the current concept, and they still require a large label budget. In this work, we propose a new algorithm, LEPID, which combines dynamic micro-cluster concept modeling with local entropy modeling to select the currently important concepts and prototypes. Specifically, we give greater weight to concept drift prototypes and minority prototypes so that the model focuses on the regions that represent current concepts. We use a local entropy strategy based on micro-clusters to select the most valuable instances for labeling and thereby reduce the label budget. Extensive experiments on real-world and synthetic imbalanced datasets show that, compared to state-of-the-art algorithms, our method can naturally adapt to concept drift and dynamically capture the current and most valuable prototypes, achieving better results even when labels are scarce.
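The abstract does not give the formulas behind the local entropy strategy, so the following is only a rough sketch of the general idea and not the LEPID algorithm itself. It assumes each micro-cluster keeps a centroid and per-class label counts, and that an instance is worth a label request when the class mix of its k nearest micro-clusters has high Shannon entropy. The names `MicroCluster`, `local_entropy`, `should_query`, and the parameters `k` and `threshold` are all hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class MicroCluster:
    """Hypothetical micro-cluster: a centroid plus per-class label counts."""
    centroid: list[float]
    label_counts: dict[str, float] = field(default_factory=dict)


def euclidean(a: list[float], b: list[float]) -> float:
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_entropy(x: list[float], clusters: list[MicroCluster], k: int = 3) -> float:
    """Shannon entropy (in bits) of the class mix in the k nearest micro-clusters."""
    nearest = sorted(clusters, key=lambda mc: euclidean(x, mc.centroid))[:k]
    totals: dict[str, float] = {}
    for mc in nearest:
        for label, count in mc.label_counts.items():
            totals[label] = totals.get(label, 0.0) + count
    n = sum(totals.values())
    if n == 0:
        # Assumption: no labeled evidence nearby counts as maximally uncertain.
        return float("inf")
    return -sum((c / n) * math.log2(c / n) for c in totals.values() if c > 0)


def should_query(x: list[float], clusters: list[MicroCluster],
                 threshold: float = 0.5, k: int = 3) -> bool:
    """Request a label only when the local class mix is uncertain enough."""
    return local_entropy(x, clusters, k) >= threshold


# Toy stream state: two pure majority clusters and two mixed clusters.
clusters = [
    MicroCluster([0.0, 0.0], {"majority": 10}),
    MicroCluster([0.0, 1.0], {"majority": 8}),
    MicroCluster([5.0, 5.0], {"majority": 5, "minority": 5}),
    MicroCluster([5.0, 6.0], {"majority": 4, "minority": 4}),
]
print(should_query([0.0, 0.5], clusters, k=2))  # pure neighborhood
print(should_query([5.0, 5.5], clusters, k=2))  # mixed neighborhood
```

In this toy setup, an instance in a pure region spends no label budget, while one near a class boundary triggers a query. The extra weighting of drift and minority prototypes mentioned in the abstract is deliberately left out, since its exact form is not stated here.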
Pages: 20