Learning Regularized Hoeffding Trees from Data Streams

Cited by: 9
Authors
Barddal, Jean Paul [1 ]
Enembreck, Fabricio [1 ]
Affiliation
[1] Pontificia Univ Catolica Parana PUC PR, PPGIA, Curitiba, Parana, Brazil
Keywords
Data Stream Mining; Decision Tree; Concept Drift; Regularization
DOI
10.1145/3297280.3297334
Chinese Library Classification (CLC)
TP39 [Applications of Computers]
Subject Classification Codes
081203; 0835
Abstract
Learning from data streams is a hot topic in machine learning that targets learning and updating predictive models as data becomes available for both training and querying. Due to their simplicity and convincing results in a multitude of applications, Hoeffding Trees are, by far, the most widely used family of methods for learning decision trees from streaming data. Despite these positive characteristics, Hoeffding Trees tend to grow continuously in terms of nodes as new data becomes available, i.e., they eventually split on all available features, and multiple times on the same feature, leading to unnecessary complexity. With this behavior, Hoeffding Trees lose the ability to be human-understandable and computationally efficient. To tackle these issues, we propose a regularization scheme for Hoeffding Trees that (i) uses a penalty factor to control the gain obtained by creating a new split node on a feature that has not been used thus far; and (ii) uses information from previous splits in the current branch to determine whether the observed gain indeed justifies a new split. The proposed scheme is combined with both standard and adaptive variants of Hoeffding Trees. Experiments on real-world data and on stationary and drifting synthetic data show that the proposed method prevents both original and adaptive Hoeffding Trees from growing unnecessarily while maintaining impressive accuracy rates. As a byproduct of the regularization process, significant improvements in processing time, model complexity, and memory consumption have also been observed, showing the effectiveness of the proposed regularization scheme.
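The scheme is only summarized above; the exact penalty formulation is given in the full paper. As a rough, hypothetical Python sketch of how such a regularized split test could look, the following assumes an exponential penalty on features not yet used and a comparison against the average gain of earlier splits along the branch (the function names, the penalty form, and the 0.5 factors are illustrative assumptions, not the authors' formulas):

# Minimal, hypothetical sketch of a regularized split-gain check for a
# Hoeffding-style tree. The penalty form and thresholds below are assumptions
# chosen for illustration; the actual scheme is defined in the paper.
import math

def regularized_gain(raw_gain, feature, used_features, branch_gains, penalty=0.5):
    """Down-weight the gain of splitting on a feature not used so far,
    and compare it against the gains of earlier splits on this branch."""
    gain = raw_gain
    if feature not in used_features:
        # (i) penalize splits that introduce a brand-new feature
        gain *= math.exp(-penalty)
    if branch_gains:
        # (ii) require the (penalized) gain to be competitive with the
        # average gain of previous splits along the current branch
        if gain < 0.5 * (sum(branch_gains) / len(branch_gains)):
            return 0.0  # treat the split as not worthwhile
    return gain

def should_split(best_gain, second_gain, hoeffding_bound):
    # Standard Hoeffding test: split when the best candidate clearly beats
    # the runner-up given the number of examples observed so far.
    return (best_gain - second_gain) > hoeffding_bound

In this reading, a learner would pass every candidate split's gain through regularized_gain before the usual Hoeffding-bound comparison, so that splits introducing unseen features must clear a higher bar.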
Pages: 574-581 (8 pages)