Low-latency Multi-threaded Ensemble Learning for Dynamic Big Data Streams

Cited by: 0
Authors
Marron, Diego [1, 2]
Ayguade, Eduard [1, 2]
Herrero, Jose R. [2]
Read, Jesse [3]
Bifet, Albert [4]
Affiliations
[1] Barcelona Supercomp Ctr, Comp Sci Dept, Barcelona, Spain
[2] Univ Politecn Cataluna, Comp Architecture Dept, Barcelona, Spain
[3] Ecole Polytech, LIX, Palaiseau, France
[4] Univ Paris Saclay, Telecom ParisTech, LTCI, F-75013 Paris, France
Keywords
Data Streams; Random Forests; Hoeffding Tree; Low-latency; High performance
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Real-time mining of evolving data streams involves new challenges when targeting today's application domains such as the Internet of Things: increasing volume, velocity and volatility require data to be processed on-the-fly with fast reaction and adaptation to changes. This paper presents a high-performance, scalable design for decision trees and ensemble combinations that makes use of the vector SIMD and multicore capabilities available in modern processors to provide the required throughput and accuracy. The proposed design offers very low latency and good scalability with the number of cores on commodity hardware when compared to other state-of-the-art implementations. On an Intel i7-based system, processing a single decision tree is 6x faster than MOA (Java) and 7x faster than StreamDM (C++), two well-known reference implementations. On the same system, using the 6 cores (and 12 hardware threads) available allows an ensemble of 100 learners to be processed 85x faster than MOA while providing the same accuracy. Furthermore, our solution is highly scalable: on an Intel Xeon socket with large core counts, the proposed ensemble design achieves up to 16x speed-up when employing 24 cores with respect to a single-threaded execution.
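As a reading aid for the scalability figure in the abstract, the reported speed-up can be converted to parallel efficiency. This is a small worked calculation, not a result stated by the authors; the symbols S (speed-up over single-threaded execution), p (number of cores) and E (parallel efficiency) are introduced here for illustration, and the value is an upper bound because the abstract reports "up to" 16x.

```latex
% Parallel efficiency from the figures quoted in the abstract.
% S = 16 (speed-up vs. single-threaded execution), p = 24 (cores).
% S, p and E are illustrative symbols, not notation from the paper.
\[
  E \;=\; \frac{S}{p} \;=\; \frac{16}{24} \;\approx\; 0.67
\]
```

Under this reading, each of the 24 cores contributes at most about two thirds of ideal linear scaling. The 85x figure is measured against MOA rather than against a single-threaded run of the same code, so it cannot be converted to an efficiency in the same way.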
Pages: 223-232
Page count: 10