A hybrid decision tree training method using data streams

Cited by: 36
Authors
Wozniak, Michal [1]
Affiliations
[1] Wroclaw Univ Technol, Fac Elect, Dept Syst & Comp Networks, PL-50370 Wroclaw, Poland
Keywords
Nested generalized exemplar; Nearest hyperrectangle; Concept drift; Decision tree; Incremental learning; Pattern recognition; Classification
DOI
10.1007/s10115-010-0345-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classical classification methods usually assume that pattern recognition models do not depend on the timing of the data. However, this assumption is not valid in cases where new data frequently become available. Such situations are common in practice, for example in spam filtering or fraud detection, where dependencies between feature values and class labels are continually changing. Unfortunately, most classical machine learning methods (such as decision trees) do not take into consideration the possibility of the model changing as a result of so-called concept drift, and they cannot adapt to a new classification model. This paper focuses on the problem of concept drift, which is a very important issue, especially in data mining methods that use complex structures (such as decision trees) for making decisions. We propose an algorithm that is able to co-train decision trees using a modified NGE (Nested Generalized Exemplar) algorithm. The adaptation potential of the proposed algorithm and its quality are evaluated through computer experiments carried out on benchmark datasets from the UCI Machine Learning Repository.
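To make the nearest-hyperrectangle idea named in the keywords more concrete, the following is a minimal sketch of a generic, simplified incremental learner in that style. It is not the modified NGE algorithm proposed in the paper, and it does not include the decision tree co-training step. The class names `Hyperrectangle` and `SimpleNGE`, the method names, and the toy stream are all hypothetical. Feature weighting and rectangle weights, which full NGE variants typically use, are omitted here as a simplifying assumption.

```python
import math


class Hyperrectangle:
    """Axis-aligned box with a class label (hypothetical helper)."""

    def __init__(self, point, label):
        # A new exemplar starts as a degenerate box around one point.
        self.lo = list(point)
        self.hi = list(point)
        self.label = label

    def distance(self, x):
        # Euclidean distance from x to the box; zero if x lies inside it.
        return math.sqrt(sum(
            max(lo - v, 0.0, v - hi) ** 2
            for lo, hi, v in zip(self.lo, self.hi, x)
        ))

    def extend(self, x):
        # Grow the box just enough to cover x.
        self.lo = [min(lo, v) for lo, v in zip(self.lo, x)]
        self.hi = [max(hi, v) for hi, v in zip(self.hi, x)]


class SimpleNGE:
    """Simplified nearest-hyperrectangle learner (illustrative only)."""

    def __init__(self):
        self.rects = []

    def _nearest(self, x):
        return min(self.rects, key=lambda r: r.distance(x))

    def predict(self, x):
        # Returns None until at least one example has been seen.
        return self._nearest(x).label if self.rects else None

    def learn_one(self, x, label):
        # Generalize the nearest box if it already has the right label;
        # otherwise store the example as a new exemplar.
        if self.rects:
            nearest = self._nearest(x)
            if nearest.label == label:
                nearest.extend(x)
                return
        self.rects.append(Hyperrectangle(x, label))


# Toy stream of (features, label) pairs, processed one example at a time.
stream = [((1.0, 1.0), "a"), ((2.0, 2.0), "a"),
          ((8.0, 8.0), "b"), ((9.0, 9.0), "b")]
model = SimpleNGE()
for features, label in stream:
    model.learn_one(features, label)

print(model.predict((1.5, 1.5)))
print(model.predict((8.5, 9.5)))
```

Because each example updates the model as it arrives, a learner of this kind can follow a data stream without retraining from scratch. Handling concept drift would additionally require a way to shrink, reweight, or discard outdated rectangles, which this sketch does not attempt.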
Pages: 335-347
Number of pages: 13
Related papers
50 in total
  • [1] A hybrid decision tree training method using data streams
    Michal Wozniak
    [J]. Knowledge and Information Systems, 2011, 29 : 335 - 347
  • [2] Optimized hybrid imbalanced data sampling for decision tree training
    Wegier, Weronika
    Koziarski, Michal
    Wozniak, Michal
    [J]. PROCEEDINGS OF THE 2023 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION, GECCO 2023 COMPANION, 2023, : 339 - 342
  • [3] An incremental fuzzy decision tree classification method for mining data streams
    Wang, Tao
    Li, Zhoujun
    Yan, Yuejin
    Chen, Huowang
    [J]. MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, PROCEEDINGS, 2007, 4571 : 91 - +
  • [4] The CART decision tree for mining data streams
    Rutkowski, Leszek
    Jaworski, Maciej
    Pietruczuk, Lena
    Duda, Piotr
    [J]. INFORMATION SCIENCES, 2014, 266 : 1 - 15
  • [5] Decision Tree for Dynamic and Uncertain Data Streams
    Liang, Chunquan
    Zhang, Yang
    Song, Qun
    [J]. PROCEEDINGS OF 2ND ASIAN CONFERENCE ON MACHINE LEARNING (ACML2010), 2010, 13 : 209 - 224
  • [6] A hybrid decision tree/genetic algorithm method for data mining
    Carvalho, DR
    Freitas, AA
    [J]. INFORMATION SCIENCES, 2004, 163 (1-3) : 13 - 35
  • [7] Novel Class Detection in Concept Drifting Data Streams Using Decision Tree Leaves
    Saha, Deepita
    Haque, Md Mozzammel
    Sarkar, Akash
    Alam, Famina
    Farid, Dewan Md
    Rahman, Chowdhury Mofizur
    Shatabda, Swakkhar
    [J]. 2018 4TH IEEE INTERNATIONAL WIE CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (IEEE WIECON-ECE 2018), 2018, : 87 - 90
  • [8] Decision tree evolution using limited number of labeled data items from drifting data streams
    Fan, W
    Huang, YA
    Yu, PS
    [J]. FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, : 379 - 382
  • [9] An Efficient Decision Tree Classification Method Based on Extended Hash Table for Data Streams Mining
    Ouyang, Zhenzheng
    Wu, Quanyuan
    Wang, Tao
    [J]. FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 5, PROCEEDINGS, 2008, : 313 - +
  • [10] Extremely Fast Decision Tree Mining for Evolving Data Streams
    Bifet, Albert
    Zhang, Jiajin
    Fan, Wei
    He, Cheng
    Zhang, Jianfeng
    Qian, Jianfeng
    Holmes, Geoff
    Pfahringer, Bernhard
    [J]. KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2017, : 1733 - 1742