Learning Regularized Hoeffding Trees from Data Streams

Cited by: 9
Authors
Barddal, Jean Paul [1]
Enembreck, Fabricio [1]
Affiliation
[1] Pontificia Univ Catolica Parana PUC PR, PPGIA, Curitiba, Parana, Brazil
Keywords
Data Stream Mining; Decision Tree; Concept Drift; Regularization
DOI
10.1145/3297280.3297334
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Learning from data streams is a hot topic in machine learning that targets learning and updating predictive models as data becomes available for both training and querying. Due to their simplicity and convincing results in a multitude of applications, Hoeffding Trees are by far the most widely used family of methods for learning decision trees from streaming data. Despite these positive characteristics, Hoeffding Trees tend to grow continuously in number of nodes as new data becomes available, i.e., they eventually split on all available features, and multiple times on the same feature, which leads to unnecessary complexity. As a result, Hoeffding Trees lose their human interpretability and computational efficiency. To tackle these issues, we propose a regularization scheme for Hoeffding Trees that (i) uses a penalty factor to control the gain obtained by creating a new split node on a feature that has not been used thus far; and (ii) uses information from previous splits in the current branch to determine whether the observed gain indeed justifies a new split. The proposed scheme is combined with both standard and adaptive variants of Hoeffding Trees. Experiments on real-world data and on stationary and drifting synthetic data show that the proposed method prevents both original and adaptive Hoeffding Trees from growing unnecessarily while maintaining impressive accuracy rates. As a byproduct of the regularization process, significant improvements in processing time, model complexity, and memory consumption were also observed, showing the effectiveness of the proposed regularization scheme.
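The split decision that the scheme regularizes can be illustrated in Python. This is a minimal, hypothetical sketch, not the authors' implementation. It assumes the standard Hoeffding Tree test (split when the gain difference between the two best features exceeds the Hoeffding bound ε = sqrt(R² ln(1/δ) / 2n), or when ε falls below a tie threshold τ) and models part (i) by multiplying the gain of every feature not used so far by a penalty factor in (0, 1], in the style of regularized random forests. The names `penalty` and `used_features` and the default values of `delta` and `tau` are illustrative assumptions. Part (ii), which relies on previous splits in the current branch, is not modeled because the abstract does not specify how it works.

```python
import math
from typing import Dict, Optional, Set


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the true mean of a variable
    with range `value_range` is within epsilon of its mean over n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def regularized_gains(
    gains: Dict[str, float], used_features: Set[str], penalty: float
) -> Dict[str, float]:
    """Assumed penalty: scale the gain of every feature not used so far by
    `penalty`; features already used keep their raw gain."""
    if not 0.0 < penalty <= 1.0:
        raise ValueError("penalty must lie in (0, 1]")
    return {f: g if f in used_features else penalty * g for f, g in gains.items()}


def choose_split(
    gains: Dict[str, float],
    used_features: Set[str],
    penalty: float,
    n: int,
    delta: float = 1e-7,
    tau: float = 0.05,
    value_range: float = 1.0,  # log2(#classes) for information gain; 1.0 for two classes
) -> Optional[str]:
    """Return the feature to split on, or None to keep waiting for more data."""
    reg = regularized_gains(gains, used_features, penalty)
    ranked = sorted(reg.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    best_feature, best_gain = ranked[0]
    # With a single candidate, compare against the "no split" option (gain 0).
    second_gain = ranked[1][1] if len(ranked) > 1 else 0.0
    eps = hoeffding_bound(value_range, delta, n)
    # Standard Hoeffding Tree test: a clear winner, or a tie broken by tau.
    if best_gain - second_gain > eps or eps < tau:
        return best_feature
    return None


print(choose_split({"a": 0.30, "b": 0.20}, {"b"}, penalty=0.5, n=5000))  # b
print(choose_split({"a": 0.30, "b": 0.20}, {"b"}, penalty=1.0, n=5000))  # a
```

In the usage lines, only feature "b" has been used so far. With a penalty of 0.5 the new feature "a" drops to a regularized gain of 0.15, so the tree reuses "b" instead of introducing another feature; with a penalty of 1.0 (no regularization) the raw gains win and "a" is chosen.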
Pages: 574-581
Page count: 8