Learning Regularized Hoeffding Trees from Data Streams

Cited by: 9
Authors
Barddal, Jean Paul [1]
Enembreck, Fabricio [1]
Affiliations
[1] Pontificia Univ Catolica Parana PUC PR, PPGIA, Curitiba, Parana, Brazil
Keywords
Data Stream Mining; Decision Tree; Concept Drift; Regularization
DOI
10.1145/3297280.3297334
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Learning from data streams is a hot topic in machine learning that targets the learning and update of predictive models as data becomes available for both training and query. Due to their simplicity and convincing results in a multitude of applications, Hoeffding Trees are, by far, the most widely used family of methods for learning decision trees from streaming data. Despite the aforementioned positive characteristics, Hoeffding Trees tend to continuously grow in terms of nodes as new data becomes available, i.e., they eventually split on all features available, and multiple times on the same feature; thus leading to unnecessary complexity. With this behavior, Hoeffding Trees lose the ability to be human-understandable and computationally efficient. To tackle these issues, we propose a regularization scheme for Hoeffding Trees that (i) uses a penalty factor to control the gain obtained by creating a new split node using a feature that has not been used thus far; and (ii) uses information from previous splits in the current branch to determine whether the gain observed indeed justifies a new split. The proposed scheme is combined with both standard and adaptive variants of Hoeffding Trees. Experiments using real-world, stationary and drifting synthetic data show that the proposed method prevents both original and adaptive Hoeffding Trees from unnecessarily growing while maintaining impressive accuracy rates. As a byproduct of the regularization process, significant improvements in processing time, model complexity, and memory consumption have also been observed, thus showing the effectiveness of the proposed regularization scheme.
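The abstract describes point (i) only in words, so the following is a minimal sketch of how a penalty factor could interact with the standard Hoeffding split test, not the authors' implementation. The Hoeffding bound is the usual one, epsilon = sqrt(R^2 ln(1/delta) / (2n)). Everything else is an assumption made for illustration: the multiplicative form of the penalty, the names `penalized_gain`, `should_split`, `used_features`, and `penalty`, and the choice to leave already-used features unpenalized. Point (ii), the use of gains from previous splits in the branch, is not sketched because the abstract does not say how that information is combined.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def penalized_gain(gain: float, feature: str, used_features: set, penalty: float) -> float:
    """Hypothetical penalty: scale the gain of a feature not used so far.

    `penalty` is assumed to lie in (0, 1]; 1.0 disables the regularization.
    """
    return gain if feature in used_features else penalty * gain


def should_split(gains: dict, used_features: set, penalty: float,
                 value_range: float, delta: float, n: int):
    """Return (split?, best_feature) after applying the assumed penalty.

    `gains` maps feature name -> observed gain and needs at least two entries.
    """
    adjusted = {f: penalized_gain(g, f, used_features, penalty)
                for f, g in gains.items()}
    ranked = sorted(adjusted.items(), key=lambda kv: kv[1], reverse=True)
    (best_feature, best_gain), (_, second_gain) = ranked[0], ranked[1]
    epsilon = hoeffding_bound(value_range, delta, n)
    # Split only if the best candidate beats the runner-up by more than epsilon.
    return best_gain - second_gain > epsilon, best_feature


if __name__ == "__main__":
    gains = {"a": 0.30, "b": 0.40}  # illustrative numbers, not from the paper
    print(should_split(gains, {"a"}, 0.5, 1.0, 1e-7, 1000))
    print(should_split(gains, {"a"}, 1.0, 1.0, 1e-7, 1000))
```

In this toy setting, penalizing the unused feature `b` shifts the preferred split to the already-used feature `a`, which is one way a penalty could discourage a tree from spreading its splits across every available feature.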
Pages: 574 - 581
Number of pages: 8
Related papers
50 in total
  • [1] Restructuring of Hoeffding Trees for Trapezoidal Data Streams
    Schreckenberger, Christian
    Glockner, Tim
    Stuckenschmidt, Heiner
    Bartelt, Christian
    20TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2020), 2020, : 416 - 423
  • [2] Regularized and incremental decision trees for data streams
    Jean Paul Barddal
    Fabrício Enembreck
    Annals of Telecommunications, 2020, 75 : 493 - 503
  • [3] Regularized and incremental decision trees for data streams
    Barddal, Jean Paul
    Enembreck, Fabricio
    ANNALS OF TELECOMMUNICATIONS, 2020, 75 (9-10) : 493 - 503
  • [4] Hoeffding adaptive trees for multi-label classification on data streams
    Esteban, Aurora
    Cano, Alberto
    Zafra, Amelia
    Ventura, Sebastian
    KNOWLEDGE-BASED SYSTEMS, 2024, 304
  • [5] Learning Model Trees from Data Streams
    Ikonomovska, Elena
    Gama, Joao
    DISCOVERY SCIENCE, PROCEEDINGS, 2008, 5255 : 52 - +
  • [6] A Novel Application of Hoeffding's Inequality to Decision Trees Construction for Data Streams
    Duda, Piotr
    Jaworski, Maciej
    Pietruczuk, Lena
    Rutkowski, Leszek
    PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2014, : 3324 - 3330
  • [7] Accurate Ensembles for Data Streams: Combining Restricted Hoeffding Trees using Stacking
    Bifet, Albert
    Frank, Eibe
    Holmes, Geoffrey
    Pfahringer, Bernhard
    PROCEEDINGS OF 2ND ASIAN CONFERENCE ON MACHINE LEARNING (ACML2010), 2010, 13 : 225 - 240
  • [8] Probabilistic Hoeffding Trees: Sped-Up Convergence and Adaption of Online Trees on Changing Data Streams
    Boidol, Jonathan
    Hapfelmeier, Andreas
    Tresp, Volker
    ADVANCES IN DATA MINING: APPLICATIONS AND THEORETICAL ASPECTS, ICDM 2015, 2015, 9165 : 94 - 108
  • [9] Performance analysis of Hoeffding trees in data streams by using massive online analysis framework
    Srimani, P. K.
    Patil, Malini M.
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2015, 7 (04) : 293 - 313
  • [10] Learning model trees from evolving data streams
    Ikonomovska, Elena
    Gama, Joao
    Dzeroski, Saso
    DATA MINING AND KNOWLEDGE DISCOVERY, 2011, 23 (01) : 128 - 168