Active Learning From Stream Data Using Optimal Weight Classifier Ensemble

Cited by: 85
Authors
Zhu, Xingquan [1, 2]
Zhang, Peng [3]
Lin, Xiaodong [4]
Shi, Yong [5]
Affiliations
[1] Florida Atlantic Univ, Dept Comp Sci & Engn, Boca Raton, FL 33431 USA
[2] Univ Technol Sydney, Fac Engn & Informat Technol, QCIS Ctr, Sydney, NSW 2007, Australia
[3] Chinese Acad Sci, Inst Comp Technol, Beijing 100090, Peoples R China
[4] Rutgers State Univ, Rutgers Business Sch, Dept Management Sci & Informat Syst, Newark, NJ 07102 USA
[5] Univ Nebraska, Coll Informat Sci & Technol, Omaha, NE 68118 USA
Funding
U.S. National Science Foundation; Australian Research Council
Keywords
Active learning; classifier ensemble; stream data; ALGORITHM; SYSTEMS; MODELS; NOISE;
DOI
10.1109/TSMCB.2010.2042445
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this paper, we propose a new research problem on active learning from data streams, where data volumes grow continuously and labeling all data is considered expensive and impractical. The objective is to label a small portion of the stream data from which a model is derived to predict future instances as accurately as possible. To tackle the technical challenges raised by the dynamic nature of stream data, i.e., increasing data volumes and evolving decision concepts, we propose a classifier-ensemble-based active learning framework that selectively labels instances from data streams to build a classifier ensemble. We argue that a classifier ensemble's variance directly corresponds to its error rate, so reducing the ensemble's variance is equivalent to improving its prediction accuracy. Consequently, instances should be selected for labeling so as to minimize the variance of the underlying classifier ensemble. Accordingly, we introduce a minimum-variance (MV) principle to guide the instance labeling process for data streams. In addition, we derive an optimal-weight calculation method to determine the weight values for the classifier ensemble. The MV principle and the optimal weighting module are combined to build an active learning framework for data streams. Experimental results on synthetic and real-world data demonstrate the performance of the proposed framework in comparison with other approaches.
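The abstract names an optimal-weight calculation and an MV labeling principle but does not state either formula. As a rough illustration only, the Python sketch below uses two textbook stand-ins, not the authors' derivations: (1) classic minimum-variance combination weights w = inv(Sigma) 1 / (1' inv(Sigma) 1), which minimize w' Sigma w subject to the weights summing to 1, where Sigma is an assumed covariance matrix of the member classifiers' errors (for example, estimated on the most recently labeled data chunk); and (2) a simple uncertainty-sampling heuristic that queries the unlabeled instance on which the weighted members' posteriors P(y=1|x) disagree most. All function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def min_variance_weights(cov: Matrix) -> List[float]:
    """Weights summing to 1 that minimize w' cov w (textbook stand-in, not the paper's formula).

    Assumes cov is positive definite; with strongly correlated errors
    some weights may come out negative.
    """
    z = solve(cov, [1.0] * len(cov))  # z = inv(cov) @ 1
    total = sum(z)                    # 1' inv(cov) 1
    return [zi / total for zi in z]


def ensemble_variance(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted variance of member posteriors P(y=1|x) around the weighted ensemble mean."""
    mean = sum(w * p for w, p in zip(weights, probs))
    return sum(w * (p - mean) ** 2 for w, p in zip(weights, probs))


def select_for_labeling(pool_probs: Sequence[Sequence[float]], weights: Sequence[float]) -> int:
    """Index of the unlabeled instance with the largest ensemble variance (hypothetical heuristic)."""
    return max(range(len(pool_probs)),
               key=lambda i: ensemble_variance(pool_probs[i], weights))


if __name__ == "__main__":
    # Two members with uncorrelated errors; the lower-variance member gets more weight.
    w = min_variance_weights([[0.04, 0.0], [0.0, 0.01]])
    print([round(x, 3) for x in w])  # [0.2, 0.8]
    pool = [[0.9, 0.92], [0.2, 0.8], [0.5, 0.52]]  # per-member P(y=1|x) for 3 instances
    print(select_for_labeling(pool, w))  # 1
```

With uncorrelated errors the weights reduce to inverse-variance weighting, so the member with error variance 0.01 receives four times the weight of the member with 0.04. Querying the instance where members disagree most is one common way to target variance reduction, but it is only a proxy for whatever MV criterion the paper actually derives.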
Pages: 1607-1621
Number of pages: 15
Related papers
50 in total
  • [31] A Collaborative Intrusion Detection Model using a novel optimal weight strategy based on Genetic Algorithm for Ensemble Classifier
    Teng, Shaohua
    Zhang, Zhenhua
    Teng, Luyao
    Zhang, Wei
    Zhu, Haibin
    Fang, Xiaozhao
    Fei, Lunke
    [J]. PROCEEDINGS OF THE 2018 IEEE 22ND INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN (CSCWD), 2018, : 761 - 766
  • [32] Active learning using rough fuzzy classifier for cancer prediction from microarray gene expression data
    Halder, Anindya
    Kumar, Ansuman
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2019, 92
  • [33] Deterministic Concept Drift Detection in Ensemble Classifier Based Data Stream Classification Process
    Abdualrhman, Mohammed Ahmed Ali
    Padma, M. C.
    [J]. INTERNATIONAL JOURNAL OF GRID AND HIGH PERFORMANCE COMPUTING, 2019, 11 (01) : 29 - 48
  • [34] Employing One-Class SVM Classifier Ensemble for Imbalanced Data Stream Classification
    Klikowski, Jakub
    Wozniak, Michal
    [J]. COMPUTATIONAL SCIENCE - ICCS 2020, PT IV, 2020, 12140 : 117 - 127
  • [35] A Novel Simulated Annealing Based Training Algorithm for Data Stream Processing Ensemble Classifier
    Jackowski, Konrad
    [J]. PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON COMPUTER RECOGNITION SYSTEMS CORES 2017, 2018, 578 : 443 - 452
  • [36] Human activity learning for assistive robotics using a classifier ensemble
    Adama, David Ada
    Lotfi, Ahmad
    Langensiepen, Caroline
    Lee, Kevin
    Trindade, Pedro
    [J]. SOFT COMPUTING, 2018, 22 (21) : 7027 - 7039
  • [37] Feature Selection and Ensemble Meta Classifier for Multiclass Imbalance Data Learning
    Sainin, Mohd Shamrie
    Alfred, Rayner
    Alias, Suraya
    Lammasha, Mohamed A. M.
    [J]. PROCEEDINGS OF KNOWLEDGE MANAGEMENT INTERNATIONAL CONFERENCE (KMICE) 2018, 2018, : 134 - 139
  • [39] Ensemble classifier based big data classification with hybrid optimal feature selection
    Pamila, J. C. Miraclin Joyce
    Selvi, R. Senthamil
    Santhi, P.
    Nithya, T. M.
    [J]. ADVANCES IN ENGINEERING SOFTWARE, 2022, 173
  • [40] Online Active Learning with Drifted Data Streams Using Paired Ensemble Framework
    Shan, Ji-Cheng
    Liu, Wei-Ke
    Chu, Chen-Xi
    Dai, Chao-Fan
    Liu, Qing-Bao
    [J]. 4TH ANNUAL INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND APPLICATIONS (ITA 2017), 2017, 12