A reliable adaptive prototype-based learning for evolving data streams with limited labels

Cited by: 3
Authors
Din, Salah Ud [1, 2, 3]
Ullah, Aman [1, 2]
Mawuli, Cobbinah B. [1, 2]
Yang, Qinli [1, 2]
Shao, Junming [1, 2]
Affiliations
[1] Univ Elect Sci & Technol China, Yangtze Delta Reg Inst Huzhou, Huzhou 313001, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[3] COMSATS Univ Islamabad, Dept Comp Sci, Abbottabad Campus, Abbottabad 22020, Pakistan
Funding
National Natural Science Foundation of China
Keywords
Data streams; Data-driven prototypes; Concept drift; Concept evolution; Semi-supervised classification; NONSTATIONARY DATA; CONCEPT DRIFT; CLASSIFICATION; ENSEMBLE; MODEL;
DOI
10.1016/j.ipm.2023.103532
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data stream mining presents notable challenges in the form of concept drift and evolution. Existing learning algorithms, typically designed within a supervised learning framework, require class labels for all data points. However, this is an impractical requirement given the rapid pace of data streams, which often results in label scarcity. Recognizing the realistic necessity of learning from data streams with limited labels, we propose an adaptive, data-driven, prototype-based semi-supervised learning framework specifically tailored to handle evolving data streams. Our method employs a prototype-based data representation, summarizing the continuous flow of streaming data using dynamic prototypes at varying levels of granularity. This technique enables improved data abstraction, capturing the underlying local data distributions more accurately. The model also incorporates reliability modeling and efficient emerging class discovery, dynamically updating the significance of prototypes over time and swiftly adapting to local concept drift. We further leverage these adaptive prototypes to intuitively detect concept evolution, i.e., identifying novel classes from a local density perspective. To minimize the need for manual labeling while optimizing performance, we incorporate active learning into our method. The active learning step employs a dual-criteria approach for data point selection, considering both uncertainty and local density. These manually labeled data points, together with unlabeled data, serve to update the model efficiently and robustly. Empirical validation using several benchmark datasets demonstrates promising performance in comparison to existing state-of-the-art techniques.
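The dual-criteria selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no formulas, so the Gaussian similarity, the margin-based uncertainty, the radius-based density, the product rule, and every name and default below (`class_scores`, `uncertainty`, `local_density`, `should_query`, `radius`, `threshold`) are assumptions made purely for illustration.

```python
import math

# Hypothetical sketch: a prototype is a (center, label, reliability) tuple.

def class_scores(x, prototypes):
    """Normalized, reliability-weighted similarity of x to each class."""
    scores = {}
    for center, label, reliability in prototypes:
        # Assumed Gaussian similarity; the paper may use something else.
        sim = math.exp(-math.dist(x, center) ** 2)
        scores[label] = scores.get(label, 0.0) + reliability * sim
    total = sum(scores.values())
    if total > 0:
        return {label: s / total for label, s in scores.items()}
    return scores

def uncertainty(x, prototypes):
    """One minus the margin between the two highest class scores."""
    probs = sorted(class_scores(x, prototypes).values(), reverse=True)
    if not probs:
        return 1.0
    second = probs[1] if len(probs) > 1 else 0.0
    return 1.0 - (probs[0] - second)

def local_density(x, window, radius):
    """Fraction of recently seen points lying within `radius` of x."""
    if not window:
        return 0.0
    return sum(math.dist(x, p) <= radius for p in window) / len(window)

def should_query(x, prototypes, window, radius=1.0, threshold=0.3):
    """Request a label only when x is both uncertain and in a dense region."""
    score = uncertainty(x, prototypes) * local_density(x, window, radius)
    return score >= threshold
```

Under these assumptions, a point lying midway between two classes in a populated region is sent for labeling, while a point sitting on a reliable prototype is not. Multiplying the two criteria is one simple way to combine them; a weighted sum would be an equally plausible choice.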
Pages: 22