Learning evolving prototypes for imbalanced data stream classification with limited labels

Cited by: 1
Authors
Wu, Zhonglin [1]
Wang, Hongliang [1]
Guo, Jingxia [1]
Yang, Qinli [1]
Shao, Junming [1, 2, 3]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Data Min Lab, Chengdu, Peoples R China
[2] Univ Elect Sci & Technol China, Yangtze Delta Reg Inst Quzhou, Huzhou, Peoples R China
[3] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen, Peoples R China
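Funding
National Natural Science Foundation of China
Keywords
Data streams; Concept drift; Imbalanced learning; Active learning
DOI
10.1016/j.ins.2024.120979
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract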
Real-world data streams often exhibit long-tailed distributions with heavy class imbalance, which makes data stream classification challenging, especially under label scarcity and concept drift. Several active learning methods have been proposed to address this problem by selecting the most valuable instances for labeling. However, existing methods often struggle to dynamically identify the instances that truly represent the current concept, and they still require a large label budget. In this work, we propose a new algorithm, LEPID, that combines dynamic micro-cluster concept modeling with local entropy modeling to select the currently important concepts and prototypes. Specifically, we give greater weight to concept drift prototypes and minority prototypes so that the model focuses on the regions that represent current concepts. We use a local entropy strategy based on micro-clusters to select the most valuable instances for labeling and thereby reduce the label budget. Extensive experiments on real-world and synthetic imbalanced datasets show that, compared to state-of-the-art algorithms, our method adapts naturally to concept drift and dynamically captures the current and most valuable prototypes, achieving better results even when labels are scarce.
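The abstract does not spell out how the local entropy strategy is computed, so the following is only an illustrative sketch and not the LEPID algorithm itself. It assumes that each micro-cluster is a dictionary with a `center` and per-class `label_counts`, that "local" means the `k` nearest micro-clusters by Euclidean distance, and that an instance is queried for a label when its local entropy reaches a `threshold`. All of these names and choices are hypothetical.

```python
import math


def local_entropy(x, micro_clusters, k=3):
    """Shannon entropy (bits) of the class mix in the k micro-clusters nearest to x.

    Assumed structure (hypothetical): each micro-cluster is a dict with
    "center" (a tuple of floats) and "label_counts" (class -> count).
    """
    nearest = sorted(micro_clusters, key=lambda mc: math.dist(x, mc["center"]))[:k]
    totals = {}
    for mc in nearest:
        for label, count in mc["label_counts"].items():
            totals[label] = totals.get(label, 0) + count
    n = sum(totals.values())
    if n == 0:
        # No labeled evidence nearby: treat as maximally uncertain (assumption).
        return math.inf
    return -sum((c / n) * math.log2(c / n) for c in totals.values() if c > 0)


def should_query(x, micro_clusters, threshold=0.5, k=3):
    """Request a label only when the local neighborhood is class-ambiguous."""
    return local_entropy(x, micro_clusters, k) >= threshold


# Toy data: a pure region near the origin and a mixed region near (5, 5).
clusters = [
    {"center": (0.0, 0.0), "label_counts": {"a": 4}},
    {"center": (0.0, 1.0), "label_counts": {"a": 2}},
    {"center": (5.0, 5.0), "label_counts": {"a": 1, "b": 1}},
    {"center": (5.0, 6.0), "label_counts": {"b": 2}},
]
pure_point = (0.0, 0.5)
mixed_point = (5.0, 5.5)
```

In this toy setting, `should_query(pure_point, clusters, k=2)` would skip the instance because its neighborhood contains a single class, while `should_query(mixed_point, clusters, k=2)` would spend label budget on it. The extra weighting of drift and minority prototypes mentioned in the abstract is not modeled here.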
Pages: 20