STDS: self-training data streams for mining limited labeled data in non-stationary environment

Authors
Shirin Khezri
Jafar Tanha
Ali Ahmadi
Arash Sharifi
Affiliations
[1] Islamic Azad University, Department of Computer Engineering, Science and Research Branch
[2] University of Tabriz, Electrical and Computer Engineering Department
[3] School of Computer Science, Faculty of Computer Engineering
[4] Institute for Research in Fundamental Sciences (IPM)
[5] K.N.Toosi University of Technology
Source
Applied Intelligence | 2020 / Volume 50, Issue 5
Keywords
Semi-supervised learning; Self-training; Data streams; Concept drift; Clustering algorithm
DOI
Not available
Abstract
In this article, we focus on the classification problem in semi-supervised learning in non-stationary environments. Semi-supervised learning is the task of learning from both labeled and unlabeled data points. Several approaches to semi-supervised learning exist for stationary environments, but they are not directly applicable to data streams. We propose a novel semi-supervised learning algorithm, named STDS. The proposed approach uses labeled and unlabeled data and employs a mechanism to handle concept drift in data streams. The main challenge in semi-supervised self-training for data streams is to find a proper selection metric for identifying a set of high-confidence predictions, together with a proper underlying base learner. We therefore propose an ensemble approach that finds a set of high-confidence predictions based on clustering algorithms and classifier predictions. We then employ the Kullback-Leibler (KL) divergence to measure the distribution differences between sequential chunks in order to detect concept drift. When drift is detected, a new classifier is built from the new set of labeled data in the current chunk; otherwise, a percentage of the high-confidence newly labeled data in the current chunk, chosen by the proposed selection metric, is added to the labeled data in the next chunk for updating the incremental classifier. The results of our experiments on a number of classification benchmark datasets show that STDS outperforms the supervised methods and most of the other semi-supervised learning methods.
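The chunk-to-chunk drift check described in the abstract can be illustrated with a minimal sketch. The abstract does not say how STDS estimates the chunk distributions or which threshold it uses, so everything below is an assumption made for illustration: a single numeric feature, fixed-width histograms with add-one smoothing, and the hypothetical names `histogram`, `drift_detected`, `n_bins`, and `threshold`.

```python
import math
import random


def histogram(values, edges):
    """Return smoothed bin probabilities of `values` over fixed bin `edges`."""
    counts = [1.0] * (len(edges) - 1)  # add-one smoothing avoids log(0)
    for v in values:
        # Walk to the bin containing v; out-of-range values fall in an outer bin.
        idx = 0
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def drift_detected(prev_chunk, curr_chunk, n_bins=10, threshold=0.1):
    """Flag drift when KL(current || previous) exceeds `threshold`.

    `n_bins` and `threshold` are illustrative values, not taken from the paper.
    """
    lo = min(min(prev_chunk), min(curr_chunk))
    hi = max(max(prev_chunk), max(curr_chunk))
    width = (hi - lo) / n_bins or 1.0  # guard against constant chunks
    edges = [lo + i * width for i in range(n_bins + 1)]
    p = histogram(curr_chunk, edges)
    q = histogram(prev_chunk, edges)
    return kl_divergence(p, q) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    stable = [rng.gauss(0.0, 1.0) for _ in range(500)]
    shifted = [rng.gauss(3.0, 1.0) for _ in range(500)]
    print("same chunk:", drift_detected(stable, stable))
    print("shifted chunk:", drift_detected(stable, shifted))
```

In a self-training loop of the kind the abstract outlines, a positive result would trigger building a new classifier from the current chunk's labeled data, while a negative result would let high-confidence pseudo-labeled points carry over to the next chunk. Multivariate streams would need a different density estimate than the per-feature histogram assumed here.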
Pages: 1448-1467
Page count: 19