Learning evolving prototypes for imbalanced data stream classification with limited labels

Cited by: 1
Authors
Wu, Zhonglin [1]
Wang, Hongliang [1]
Guo, Jingxia [1]
Yang, Qinli [1]
Shao, Junming [1,2,3]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Data Min Lab, Chengdu, Peoples R China
[2] Univ Elect Sci & Technol China, Yangtze Delta Reg Inst Quzhou, Huzhou, Peoples R China
[3] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data streams; Concept drift; Imbalanced learning; Active learning
DOI
10.1016/j.ins.2024.120979
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Real-world data streams often exhibit long-tailed distributions with heavy class imbalance, posing great challenges for data stream classification, especially in the case of label scarcity and concept drift. Several active learning methods have been proposed to address this problem by selecting the most valuable instances for labeling. However, existing methods often struggle to dynamically identify the most valuable instances that truly represent the current concept while still requiring a large label budget. In this work, we propose a new algorithm, LEPID, to combine dynamic micro-cluster concept modeling and local entropy modeling to select current important concepts and prototypes. Specifically, we give greater weight to concept drift prototypes and minority prototypes to focus more on those regions that represent current concepts. We use a local entropy strategy based on micro-clusters to select the most valuable instances for labeling and reduce the label budget. Extensive experiments on real-world and synthetic imbalanced datasets show that, compared to state-of-the-art algorithms, our method can naturally adapt to concept drift and dynamically capture the current and most valuable prototypes to achieve better results even in the case of label scarcity.
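The abstract does not spell out how the local entropy strategy is computed, so the following is only a minimal illustrative sketch of the general idea, not the LEPID algorithm itself. It assumes each micro-cluster is summarized by a hypothetical (center, label, weight) tuple, measures the Shannon entropy of the weighted label distribution among the k prototypes nearest to an incoming instance, and requests a label when that entropy exceeds a threshold. The names `local_entropy` and `should_query`, the value of `k`, the threshold, and the larger weight on the minority prototype are all assumptions made for illustration.

```python
import math
from collections import Counter


def local_entropy(x, prototypes, k=3):
    """Entropy (in bits) of the weighted label mix among the k prototypes nearest to x.

    prototypes: list of (center, label, weight) tuples with positive weights.
    This layout is a hypothetical stand-in for micro-cluster summaries.
    """
    nearest = sorted(prototypes, key=lambda p: math.dist(x, p[0]))[:k]
    mass = Counter()
    for _, label, weight in nearest:
        mass[label] += weight
    total = sum(mass.values())
    return -sum((m / total) * math.log2(m / total) for m in mass.values())


def should_query(x, prototypes, k=3, threshold=0.5):
    """Request a label only when the local neighborhood is ambiguous (assumed rule)."""
    return local_entropy(x, prototypes, k) > threshold


# Toy prototypes; the minority prototype carries a larger (assumed) weight.
prototypes = [
    ((0.0, 0.0), "majority", 1.0),
    ((0.2, 0.1), "majority", 1.0),
    ((1.0, 1.0), "minority", 2.0),
    ((5.0, 5.0), "majority", 1.0),
]

# Deep inside the majority region: both nearest prototypes agree, so no query.
print(should_query((0.1, 0.0), prototypes, k=2))
# Between majority and minority prototypes: mixed labels, so a query is issued.
print(should_query((0.6, 0.6), prototypes, k=2))
```

Under this assumed rule, the label budget is spent only on instances whose neighboring prototypes disagree, which is one plausible way to read "select the most valuable instances for labeling".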
Pages: 20
Related papers
50 in total
  • [1] Approximating Learning Curves for Imbalanced Big Data with Limited Labels
    Richter, Aaron N.
    Khoshgoftaar, Taghi M.
    2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019), 2019, : 237 - 242
  • [2] Lacking Labels in the Stream: Classifying Evolving Stream Data with Few Labels
    Woolam, Clay
    Masud, Mohammad M.
    Khan, Latifur
    FOUNDATIONS OF INTELLIGENT SYSTEMS, PROCEEDINGS, 2009, 5722 : 552 - 562
  • [3] RELIABLE SEMI-SUPERVISED LEARNING ON IMBALANCED EVOLVING DATA STREAM
    Pan Liangxu
    2022 19TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICCWAMTIP), 2022,
  • [4] Learning High-Dimensional Evolving Data Streams With Limited Labels
    Din, Salah Ud
    Kumar, Jay
    Shao, Junming
    Mawuli, Cobbinah Bernard
    Ndiaye, Waldiodio David
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (11) : 11373 - 11384
  • [5] Dynamic Ensemble Selection for Imbalanced Data Stream Classification with Limited Label Access
    Zyblewski, Pawel
    Wozniak, Michal
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING (ICAISC 2021), PT II, 2021, 12855 : 217 - 226
  • [6] Dual weighted extreme learning machine for imbalanced data stream classification
    Zhang, Yong
    Liu, Wenzhe
    Ren, Xuezhen
    Ren, Yonggong
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2017, 33 (02) : 1143 - 1154
  • [7] A Hybrid Learning Framework for Imbalanced Stream Classification
    Zhang, Wenbin
    Wang, Jianwu
    2017 IEEE 6TH INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS 2017), 2017, : 480 - 487
  • [8] Imbalanced Data Stream Classification: Analysis and Solution
    Anjana, Koringa
    Radhika, Kotecha
    Darshana, Patel
    INFORMATION AND COMMUNICATION TECHNOLOGY FOR INTELLIGENT SYSTEMS (ICTIS 2017) - VOL 2, 2018, 84 : 316 - 324
  • [9] A reliable adaptive prototype-based learning for evolving data streams with limited labels
    Din, Salah Ud
    Ullah, Aman
    Mawuli, Cobbinah B.
    Yang, Qinli
    Shao, Junming
    INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (01)
  • [10] Imbalanced Data Stream Classification Using Hybrid Data Preprocessing
    Bobowska, Barbara
    Klikowski, Jakub
    Wozniak, Michal
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II, 2020, 1168 : 402 - 413