Learning Regularized Hoeffding Trees from Data Streams

Cited by: 9
Authors
Barddal, Jean Paul [1]
Enembreck, Fabricio [1]
Affiliations
[1] Pontificia Univ Catolica Parana PUC PR, PPGIA, Curitiba, Parana, Brazil
Keywords
Data Stream Mining; Decision Tree; Concept Drift; Regularization
DOI
10.1145/3297280.3297334
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Learning from data streams is a hot topic in machine learning that targets the learning and update of predictive models as data becomes available for both training and query. Due to their simplicity and convincing results in a multitude of applications, Hoeffding Trees are, by far, the most widely used family of methods for learning decision trees from streaming data. Despite the aforementioned positive characteristics, Hoeffding Trees tend to continuously grow in terms of nodes as new data becomes available, i.e., they eventually split on all features available, and multiple times on the same feature; thus leading to unnecessary complexity. With this behavior, Hoeffding Trees lose the ability to be human-understandable and computationally efficient. To tackle these issues, we propose a regularization scheme for Hoeffding Trees that (i) uses a penalty factor to control the gain obtained by creating a new split node using a feature that has not been used thus far; and (ii) uses information from previous splits in the current branch to determine whether the gain observed indeed justifies a new split. The proposed scheme is combined with both standard and adaptive variants of Hoeffding Trees. Experiments using real-world, stationary and drifting synthetic data show that the proposed method prevents both original and adaptive Hoeffding Trees from unnecessarily growing while maintaining impressive accuracy rates. As a byproduct of the regularization process, significant improvements in processing time, model complexity, and memory consumption have also been observed, thus showing the effectiveness of the proposed regularization scheme.
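The abstract does not give the exact formulas, so the following is only a minimal sketch of idea (i), the penalty on splitting on a feature not yet used. It assumes a multiplicative penalty applied to the gain of unused features before the standard Hoeffding-bound split test; the function names, the `penalty` parameter, and its default value are hypothetical and are not taken from the paper. Idea (ii), which uses information from previous splits in the current branch, is left out because the abstract does not specify how that information is used.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def penalized_gain(gain: float, feature: str, used_features: set,
                   penalty: float = 0.5) -> float:
    """Scale down the gain of a feature that has not been split on yet.

    Assumption: a multiplicative penalty in (0, 1]; penalty=1.0 disables
    regularization. The paper may define the penalty differently.
    """
    return gain if feature in used_features else penalty * gain


def choose_split(gains: dict, used_features: set, n: int,
                 delta: float = 1e-7, value_range: float = 1.0,
                 penalty: float = 0.5):
    """Return the feature to split on, or None if no split is justified.

    gains maps feature name -> raw gain observed at the leaf after n
    instances. A split happens only when the best penalized gain beats the
    second best by more than the Hoeffding bound.
    """
    scored = sorted(
        ((penalized_gain(g, f, used_features, penalty), f)
         for f, g in gains.items()),
        reverse=True,
    )
    if len(scored) < 2:
        return None
    (best, best_feature), (second, _) = scored[0], scored[1]
    if best - second > hoeffding_bound(value_range, delta, n):
        return best_feature
    return None


if __name__ == "__main__":
    gains = {"a": 0.30, "b": 0.28, "c": 0.05}
    # "a" was already used in the tree, so "b" and "c" are penalized.
    print(choose_split(gains, used_features={"a"}, n=1000))
    # Without the penalty, "a" and "b" are too close to separate.
    print(choose_split(gains, used_features={"a"}, n=1000, penalty=1.0))
```

In this sketch the penalty makes already-used features easier to pick again, while a feature that would enlarge the set of used features must show a clearly larger gain before it wins the split test.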
Pages: 574-581
Page count: 8