Decision Trees for Mining Data Streams Based on the Gaussian Approximation

Cited by: 117
Authors
Rutkowski, Leszek [1 ,2 ]
Jaworski, Maciej [1 ]
Pietruczuk, Lena [1 ]
Duda, Piotr [1 ]
Affiliations
[1] Czestochowa Tech Univ, Inst Computat Intelligence, PL-42200 Czestochowa, Poland
[2] Acad Management, Inst Informat Technol, PL-90113 Lodz, Poland
Keywords
Data stream; decision trees; information gain; Gaussian approximation; PATTERN-CLASSIFICATION
DOI
10.1109/TKDE.2013.34
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Since the Hoeffding tree algorithm was proposed in the literature, decision trees have become one of the most popular tools for mining data streams. The key point in constructing a decision tree is to determine the best attribute on which to split the considered node. Several methods for solving this problem have been presented so far. However, they are either incorrectly justified mathematically (e.g., in the Hoeffding tree algorithm) or time-consuming (e.g., in the McDiarmid tree algorithm). In this paper, we propose a new method which significantly outperforms the McDiarmid tree algorithm and has a solid mathematical basis. Our method ensures, with a high probability set by the user, that the best attribute chosen in the considered node using a finite data sample is the same as it would be in the case of the whole data stream.
Pages: 108-119
Page count: 12
Related papers (10 of 50 shown)
  • [1] Decision trees for mining data streams
    Gama, Joao
    Fernandes, Ricardo
    Rocha, Ricardo
    [J]. INTELLIGENT DATA ANALYSIS, 2006, 10 (01) : 23 - 45
  • [2] Decision Trees for Mining Data Streams Based on the McDiarmid's Bound
    Rutkowski, Leszek
    Pietruczuk, Lena
    Duda, Piotr
    Jaworski, Maciej
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (06) : 1272 - 1279
  • [3] Mining decision trees from data streams in a mobile environment
    Kargupta, H
    Park, BH
    [J]. 2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2001, : 281 - 288
  • [4] Constructing Decision Trees for Mining High-speed Data Streams
    Xu Wenhua
    Qin Zheng
    [J]. CHINESE JOURNAL OF ELECTRONICS, 2012, 21 (02) : 215 - 220
  • [5] Ambiguous decision trees for mining concept-drifting data streams
    Liu, Jing
    Li, Xue
    Zhong, Weicai
    [J]. PATTERN RECOGNITION LETTERS, 2009, 30 (15) : 1347 - 1355
  • [6] Mining Uncertain Data Streams Using Clustering Feature Decision Trees
    Xu, Wenhua
    Qin, Zheng
    Hu, Hao
    Zhao, Nan
    [J]. ADVANCED DATA MINING AND APPLICATIONS, PT II, 2011, 7121 : 195 - +
  • [7] Class Specific Fuzzy Decision Trees for Mining High Speed Data Streams
    Hashemi, Sattar
    Kangavari, Mohammadreza
    Yang, Ying
    [J]. FUNDAMENTA INFORMATICAE, 2008, 88 (1-2) : 135 - 160
  • [8] A Fourier spectrum-based approach to represent decision trees for mining data streams in mobile environments
    Kargupta, H
    Park, BH
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, 16 (02) : 216 - 229
  • [9] Data mining with decision trees and decision rules
    Apte, C
    Weiss, S
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 1997, 13 (2-3) : 197 - 210
  • [10] Regularized and incremental decision trees for data streams
    Barddal, Jean Paul
    Enembreck, Fabricio
    [J]. ANNALS OF TELECOMMUNICATIONS, 2020, 75 (9-10) : 493 - 503