Active Learning From Stream Data Using Optimal Weight Classifier Ensemble

Cited by: 85
Authors
Zhu, Xingquan [1, 2]
Zhang, Peng [3]
Lin, Xiaodong [4]
Shi, Yong [5]
Affiliations
[1] Florida Atlantic Univ, Dept Comp Sci & Engn, Boca Raton, FL 33431 USA
[2] Univ Technol Sydney, Fac Engn & Informat Technol, QCIS Ctr, Sydney, NSW 2007, Australia
[3] Chinese Acad Sci, Inst Comp Technol, Beijing 100090, Peoples R China
[4] Rutgers State Univ, Rutgers Business Sch, Dept Management Sci & Informat Syst, Newark, NJ 07102 USA
[5] Univ Nebraska, Coll Informat Sci & Technol, Omaha, NE 68118 USA
Funding
Australian Research Council; US National Science Foundation
Keywords
Active learning; classifier ensemble; stream data; ALGORITHM; SYSTEMS; MODELS; NOISE;
DOI
10.1109/TSMCB.2010.2042445
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we propose a new research problem on active learning from data streams, where data volumes grow continuously, and labeling all data is considered expensive and impractical. The objective is to label a small portion of stream data from which a model is derived to predict future instances as accurately as possible. To tackle the technical challenges raised by the dynamic nature of the stream data, i.e., increasing data volumes and evolving decision concepts, we propose a classifier-ensemble-based active learning framework that selectively labels instances from data streams to build a classifier ensemble. We argue that a classifier ensemble's variance directly corresponds to its error rate, and reducing a classifier ensemble's variance is equivalent to improving its prediction accuracy. Because of this, one should label instances toward the minimization of the variance of the underlying classifier ensemble. Accordingly, we introduce a minimum-variance (MV) principle to guide the instance labeling process for data streams. In addition, we derive an optimal-weight calculation method to determine the weight values for the classifier ensemble. The MV principle and the optimal weighting module are combined to build an active learning framework for data streams. Experimental results on synthetic and real-world data demonstrate the performance of the proposed work in comparison with other approaches.
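The abstract's idea of using ensemble variance to decide which stream instances to label can be illustrated with a small sketch. This is not the authors' algorithm: the function names are hypothetical, the weights use a simple inverse-error heuristic as a stand-in for the paper's optimal-weight derivation, and choosing the instances with the highest weighted disagreement is only a rough proxy for the minimum-variance (MV) principle.

```python
from typing import List, Sequence


def inverse_error_weights(errors: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Assumed heuristic: weight each ensemble member by 1 / (error + eps), normalized to sum to 1.

    This is a stand-in, not the optimal-weight method described in the abstract.
    """
    raw = [1.0 / (e + eps) for e in errors]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_variance(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted variance of the members' positive-class probabilities for one instance."""
    total = sum(weights)
    mean = sum(w * p for w, p in zip(weights, probs)) / total
    return sum(w * (p - mean) ** 2 for w, p in zip(weights, probs)) / total


def select_for_labeling(
    member_probs: Sequence[Sequence[float]],
    weights: Sequence[float],
    budget: int,
) -> List[int]:
    """Return indices of the `budget` instances on which the ensemble disagrees most."""
    scores = [weighted_variance(p, weights) for p in member_probs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Toy chunk of three unlabeled instances scored by three ensemble members.
chunk_probs = [
    [0.9, 0.9, 0.9],  # members agree: little to gain from a label
    [0.1, 0.9, 0.5],  # members disagree strongly
    [0.4, 0.5, 0.6],  # mild disagreement
]
member_weights = inverse_error_weights([0.1, 0.2, 0.4])
to_label = select_for_labeling(chunk_probs, member_weights, budget=1)
```

In a streaming setting one would repeat this per data chunk, retrain or add an ensemble member on the newly labeled instances, and recompute the weights; those steps are left out of this sketch.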
Pages: 1607-1621
Number of pages: 15
Related papers
50 in total
  • [1] A Classifier Using Online Bagging Ensemble Method for Big Data Stream Learning
    Lv, Yanxia
    Peng, Sancheng
    Yuan, Ying
    Wang, Cong
    Yin, Pengfei
    Liu, Jiemin
    Wang, Cuirong
    [J]. TSINGHUA SCIENCE AND TECHNOLOGY, 2019, 24 (04) : 379 - 388
  • [4] Autonomic active learning strategy using cluster-based ensemble classifier for concept drifts in imbalanced data stream
    Halder, Bohnishikha
    Hasan, K. M. Azharul
    Amagasa, Toshiyuki
    Ahmed, Md Manjur
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2023, 231
  • [5] Hybrid Ensemble Classifier for Stream Data
    Gogte, Purva S.
    Theng, Deepti P.
    [J]. 2014 FOURTH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORK TECHNOLOGIES (CSNT), 2014, : 463 - 467
  • [6] Learning of classifier ensemble using virtual data
    Jang, M
    Cho, S
    [J]. IC-AI'2000: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 1-III, 2000, : 955 - 959
  • [7] Classifier Ensemble for Uncertain Data Stream Classification
    Pan, Shirui
    Wu, Kuan
    Zhang, Yang
    Li, Xue
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT I, PROCEEDINGS, 2010, 6118 : 488 - +
  • [8] Adaptive Ensemble Active Learning for Drifting Data Stream Mining
    Krawczyk, Bartosz
    Cano, Alberto
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2763 - 2771
  • [9] A New Semi-supervised Learning Based Ensemble Classifier for Recurring Data Stream
    Zhang, Bo
    Chen, Dingfang
    Zu, Qiaohong
    Mao, Yichao
    Pan, Yi
    Zhang, Xiaomin
    [J]. PERVASIVE COMPUTING AND THE NETWORKED WORLD, 2014, 8351 : 759 - +
  • [10] A clustering and ensemble based classifier for data stream classification
    Wankhade, Kapil K.
    Jondhale, Kalpana C.
    Dongre, Snehlata S.
    [J]. APPLIED SOFT COMPUTING, 2021, 102