Handling numeric attributes in Hoeffding trees

Authors
Pfahringer, Bernhard [1]
Holmes, Geoffrey [1]
Kirkby, Richard [1]
Affiliation
[1] Univ Waikato, Hamilton, New Zealand
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
For conventional machine learning classification algorithms, handling numeric attributes is relatively straightforward. Unsupervised and supervised solutions exist that either segment the data into pre-defined bins or sort the data and search for the best split points. Unfortunately, none of these solutions carries over particularly well to a data stream environment. Solutions for data streams have been proposed by several authors, but as yet none have been compared empirically. In this paper we investigate a range of methods for multi-class tree-based classification where the handling of numeric attributes takes place as the tree is constructed. To this end, we extend an existing approach based on simple Gaussian approximation. We then compare this method with four approaches from the literature, arriving at eight final algorithm configurations for testing. The solutions cover a range of options from perfectly accurate and memory intensive to highly approximate. All methods are tested using the Hoeffding tree classification algorithm. Surprisingly, the experimental comparison shows that the most approximate methods produce the most accurate trees by allowing for faster tree growth.
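To illustrate the general idea of Gaussian approximation named in the abstract, the following Python sketch keeps one running normal estimate per class for a single numeric attribute and scores a few evenly spaced candidate thresholds by information gain. The names `GaussianEstimator`, `best_split`, and `num_candidates` are hypothetical, and the evenly spaced candidate grid between the observed minimum and maximum is an assumption made for this example. It is a sketch of the technique in general, not the paper's implementation.

```python
import math


class GaussianEstimator:
    """Running mean/variance (Welford's algorithm) plus observed min/max."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean
        self.lo = math.inf
        self.hi = -math.inf

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)

    def count_below(self, t):
        """Estimated number of seen values <= t under a normal fit."""
        if self.n == 0 or t < self.lo:
            return 0.0
        if t >= self.hi:
            return float(self.n)
        std = math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0
        if std == 0.0:
            return float(self.n) if t >= self.mean else 0.0
        z = (t - self.mean) / (std * math.sqrt(2.0))
        return self.n * 0.5 * (1.0 + math.erf(z))


def entropy(counts):
    """Shannon entropy (bits) of a list of non-negative class counts."""
    total = sum(counts)
    if total <= 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def best_split(estimators, num_candidates=10):
    """estimators maps class label -> GaussianEstimator.

    Returns (threshold, information gain) for the best candidate,
    or (None, 0.0) if no candidate improves on the unsplit node.
    """
    labels = list(estimators)
    lo = min(estimators[c].lo for c in labels)
    hi = max(estimators[c].hi for c in labels)
    totals = [estimators[c].n for c in labels]
    n = sum(totals)
    base = entropy(totals)
    best = (None, 0.0)
    for i in range(1, num_candidates + 1):
        t = lo + (hi - lo) * i / (num_candidates + 1)
        left = [estimators[c].count_below(t) for c in labels]
        right = [tot - l for tot, l in zip(totals, left)]
        gain = (base
                - sum(left) / n * entropy(left)
                - sum(right) / n * entropy(right))
        if gain > best[1]:
            best = (t, gain)
    return best


# Usage: two classes whose attribute values do not overlap.
observers = {"a": GaussianEstimator(), "b": GaussianEstimator()}
for v in (1.0, 2.0, 3.0):
    observers["a"].add(v)
for v in (7.0, 8.0, 9.0):
    observers["b"].add(v)
threshold, gain = best_split(observers)
```

Memory use here is constant per class and attribute, which is the kind of trade-off between accuracy and approximation the abstract refers to. The exact counts that a sorted or binned structure would give are replaced by estimates from the fitted normal distribution.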
Pages: 296-307
Number of pages: 12