Extremely Fast Decision Tree Mining for Evolving Data Streams

Cited by: 45
Authors
Bifet, Albert [1 ]
Zhang, Jiajin [2 ]
Fan, Wei [3 ]
He, Cheng [2 ]
Zhang, Jianfeng [2 ]
Qian, Jianfeng [4 ]
Holmes, Geoff [5 ]
Pfahringer, Bernhard [5 ]
Affiliations
[1] Univ Paris Saclay, Telecom ParisTech, LTCI, F-75013 Paris, France
[2] HUAWEI Noahs Ark Lab, Hong Kong, Peoples R China
[3] Baidu Res Big Data Lab, Sunnyvale, CA USA
[4] Columbia Univ, New York, NY USA
[5] Univ Waikato, Hamilton, New Zealand
Keywords
Data Streams; Online Learning; Decision Trees; Classification;
DOI
10.1145/3097983.3098139
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Real-time industrial applications now generate huge amounts of data continuously. Processing these large data streams requires fast and efficient methodologies and systems. Data scientists and analysts also want machine learning models that are easy to visualize and understand. Decision trees are preferred in many real-time applications for this reason and because, when combined in an ensemble, they are among the most powerful methods in machine learning. In this paper, we present a new system called STREAMDM-C++ that implements decision trees for data streams in C++ and that has been used extensively at Huawei. Streaming decision trees adapt to changes in the stream, a major advantage, since standard decision trees are built from a snapshot of the data and cannot evolve over time. STREAMDM-C++ is easy to extend and contains more powerful ensemble methods and more efficient, easier-to-use adaptive decision trees. We compare our new implementation with VFML, the current state-of-the-art implementation in C, and show that our new system outperforms VFML in speed while using fewer resources.
Pages: 1733-1742
Page count: 10