Learning Regularized Hoeffding Trees from Data Streams

Cited by: 9
Authors
Barddal, Jean Paul [1 ]
Enembreck, Fabricio [1 ]
Affiliation
[1] Pontificia Univ Catolica Parana PUC PR, PPGIA, Curitiba, Parana, Brazil
Keywords
Data Stream Mining; Decision Tree; Concept Drift; Regularization
DOI
10.1145/3297280.3297334
Chinese Library Classification (CLC)
TP39 [Applications of Computers]
Subject Classification Codes
081203; 0835
Abstract
Learning from data streams is a hot topic in machine learning that targets learning and updating predictive models as data becomes available for both training and querying. Due to their simplicity and convincing results in a multitude of applications, Hoeffding Trees are, by far, the most widely used family of methods for learning decision trees from streaming data. Despite these positive characteristics, Hoeffding Trees tend to grow continuously in terms of nodes as new data becomes available, i.e., they eventually split on all available features, and multiple times on the same feature, leading to unnecessary complexity. With this behavior, Hoeffding Trees lose the ability to be human-understandable and computationally efficient. To tackle these issues, we propose a regularization scheme for Hoeffding Trees that (i) uses a penalty factor to control the gain obtained by creating a new split node on a feature that has not been used thus far; and (ii) uses information from previous splits in the current branch to determine whether the observed gain indeed justifies a new split. The proposed scheme is combined with both standard and adaptive variants of Hoeffding Trees. Experiments on real-world data and on stationary and drifting synthetic data show that the proposed method prevents both original and adaptive Hoeffding Trees from growing unnecessarily while maintaining impressive accuracy rates. As a byproduct of the regularization process, significant improvements in processing time, model complexity, and memory consumption have also been observed, showing the effectiveness of the proposed regularization scheme.
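The scheme is only summarized above; the exact penalty formulation is given in the full paper. As a rough, hypothetical Python sketch of how such a regularized split test could look, the following assumes an exponential penalty on features not yet used and a comparison against the average gain of earlier splits along the branch (the function names, the penalty form, and the 0.5 factors are illustrative assumptions, not the authors' formulas):

# Minimal, hypothetical sketch of a regularized split-gain check for a
# Hoeffding-style tree. The penalty form and thresholds below are assumptions
# chosen for illustration; the actual scheme is defined in the paper.
import math

def regularized_gain(raw_gain, feature, used_features, branch_gains, penalty=0.5):
    """Down-weight the gain of splitting on a feature not used so far,
    and compare it against the gains of earlier splits on this branch."""
    gain = raw_gain
    if feature not in used_features:
        # (i) penalize splits that introduce a brand-new feature
        gain *= math.exp(-penalty)
    if branch_gains:
        # (ii) require the (penalized) gain to be competitive with the
        # average gain of previous splits along the current branch
        if gain < 0.5 * (sum(branch_gains) / len(branch_gains)):
            return 0.0  # treat the split as not worthwhile
    return gain

def should_split(best_gain, second_gain, hoeffding_bound):
    # Standard Hoeffding test: split when the best candidate clearly beats
    # the runner-up given the number of examples observed so far.
    return (best_gain - second_gain) > hoeffding_bound

In this reading, a learner would pass every candidate split's gain through regularized_gain before the usual Hoeffding-bound comparison, so that splits introducing unseen features must clear a higher bar.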
Pages: 574-581 (8 pages)