Online semi-supervised active learning ensemble classification for evolving imbalanced data streams

Cited by: 0
Authors
Guo, Yinan [1, 4]
Pu, Jiayang [1, 3]
Jiao, Botao [2 ]
Peng, Yanyan [1 ]
Wang, Dini [1 ]
Yang, Shengxiang [1, 5]
Affiliations
[1] China Univ Min & Technol Beijing, Beijing 100083, Peoples R China
[2] China Univ Min & Technol, Xuzhou 221116, Peoples R China
[3] China Univ Min & Technol Beijing, Inner Mongolia Res Inst, Ordos 017010, Peoples R China
[4] Minist Educ China, Key Lab Syst Control & Informat Proc, Shanghai 200240, Peoples R China
[5] De Montfort Univ, Leicester LE1 9BH, England
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Semi-supervised; Active learning; Concept drift; Imbalance; Data stream; DYNAMIC WEIGHTED MAJORITY; FAULT-DIAGNOSIS; MACHINERY;
DOI
10.1016/j.asoc.2024.111452
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Concept drift is a core challenge in data stream classification. Although many drift adaptation methods have been proposed, most assume that the labels of all data are available, which is impractical in many real-world applications. Additionally, the absence of labels makes it difficult to obtain the imbalance ratio of an imbalanced data stream in time, which gives inaccurate guidance for resampling and leads to poor generalization. To tackle these joint challenges, an online semi-supervised active learning method is proposed to classify imbalanced data streams with concept drift. A newly arrived instance is first added to the sliding window and then assigned a pseudo label according to its nearest cluster. Meanwhile, a semi-supervised clustering algorithm offers its predicted label. Based on these two predicted labels, a cluster-based query strategy provides the criteria for evaluating and selecting representative instances. More specifically, the uncertainty and importance of instances are defined to jointly evaluate their representativeness. After the true labels of the selected typical instances are obtained, the ensemble classifier is updated with all instances in the current sliding window. Experimental results on 13 synthetic and real data streams indicate that the proposed method outperforms six comparative methods on both G-mean and Recall under various labeling budgets.
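The abstract reports results in terms of G-mean and Recall. As a rough illustration only, the sketch below computes per-class recall and G-mean under the common definition of G-mean as the geometric mean of the per-class recalls; the record does not state the exact formula the paper uses, so this definition is an assumption. The function names `per_class_recall` and `g_mean` and the toy labels are hypothetical.

```python
from collections import defaultdict
from math import prod


def per_class_recall(y_true, y_pred):
    """Return {class: recall} for every class that appears in y_true."""
    total = defaultdict(int)  # number of instances per true class
    hit = defaultdict(int)    # correctly predicted instances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            hit[t] += 1
    return {c: hit[c] / total[c] for c in total}


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed definition of G-mean)."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return prod(recalls) ** (1.0 / len(recalls))


# Toy imbalanced example: 8 majority (class 0) and 2 minority (class 1) instances.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]  # one minority instance is misclassified

recalls = per_class_recall(y_true, y_pred)  # class 0 -> 1.0, class 1 -> 0.5
score = g_mean(y_true, y_pred)              # sqrt(1.0 * 0.5)
```

In this toy case plain accuracy is 0.9 while G-mean is about 0.71, and a classifier that always predicts the majority class would score a G-mean of 0. That sensitivity to the minority class is the usual reason G-mean is preferred over accuracy on imbalanced streams.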
Pages: 12
Related papers
50 in total
  • [21] Imbalanced fault diagnosis based on semi-supervised ensemble learning
    Jian, Chuanxia
    Ao, Yinhui
    [J]. JOURNAL OF INTELLIGENT MANUFACTURING, 2023, 34 (07) : 3143 - 3158
  • [22] Robust semi-supervised classification for imbalanced and incomplete data
    Chen, Mengxing
    Dou, Jun
    Fan, Yali
    Song, Yan
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 45 (02) : 2781 - 2797
  • [23] Multi-class imbalanced semi-supervised learning from streams through online ensembles
    Vafaie, Parsa
    Viktor, Herna
    Michalowski, Wojtek
    [J]. 20TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2020), 2020, : 867 - 874
  • [24] Semi-supervised learning for medical image classification using imbalanced training data
    Huynh, Tri
    Nibali, Aiden
    He, Zhen
    [J]. Computer Methods and Programs in Biomedicine, 2022, 216
  • [25] Semi-supervised learning for medical image classification using imbalanced training data
    Huynh, Tri
    Nibali, Aiden
    He, Zhen
    [J]. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2022, 216
  • [26] Semi-Supervised Online Elastic Extreme Learning Machine for Data Classification
    da Silva, Carlos A. S.
    Krohling, Renato A.
    [J]. 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,
  • [27] Semi-supervised Learning for Imbalanced Classification of Credit Card Transaction
    Salazar, Addisson
    Safont, Gonzalo
    Vergara, Luis
    [J]. 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,
  • [28] News Classification with Semi-Supervised and Active Learning
    Guo, Chen
    Chao, Ye
    [J]. Data Analysis and Knowledge Discovery, 2022, 6 (04) : 28 - 38
  • [29] GAN-Based Semi-supervised For Imbalanced Data Classification
    Zhou, Tingting
    Liu, Wei
    Zhou, Congyu
    Chen, Leiting
    [J]. 2018 4TH INTERNATIONAL CONFERENCE ON INFORMATION MANAGEMENT (ICIM2018), 2018, : 17 - 21
  • [30] Semi-supervised Classification Based Mixed Sampling for Imbalanced Data
    Zhao, Jianhua
    Liu, Ning
    [J]. OPEN PHYSICS, 2019, 17 (01): : 975 - 983