Low-latency Multi-threaded Ensemble Learning for Dynamic Big Data Streams

Cited by: 0

Authors:

Marron, Diego ^{[1,2]}

Ayguade, Eduard ^{[1,2]}

Herrero, Jose R. ^{[2]}

Read, Jesse ^{[3]}

Bifet, Albert ^{[4]}

Affiliations:

[1] Barcelona Supercomp Ctr, Comp Sci Dept, Barcelona, Spain

[2] Univ Politecn Cataluna, Comp Architecture Dept, Barcelona, Spain

[3] Ecole Polytech, LIX, Palaiseau, France

[4] Univ Paris Saclay, Telecom ParisTech, LTCI, F-75013 Paris, France

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2017

Keywords:

Data Streams; Random Forests; Hoeffding Tree; Low-latency; High performance;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Real-time mining of evolving data streams involves new challenges when targeting today's application domains such as the Internet of Things: increasing volume, velocity and volatility require data to be processed on-the-fly with fast reaction and adaptation to changes. This paper presents a high-performance, scalable design for decision trees and ensemble combinations that makes use of the vector SIMD and multicore capabilities available in modern processors to provide the required throughput and accuracy. The proposed design offers very low latency and good scalability with the number of cores on commodity hardware when compared to other state-of-the-art implementations. On an Intel i7-based system, processing a single decision tree is 6x faster than MOA (Java), and 7x faster than StreamDM (C++), two well-known reference implementations. On the same system, using the 6 available cores (and 12 hardware threads) allows an ensemble of 100 learners to be processed 85x faster than MOA while providing the same accuracy. Furthermore, our solution is highly scalable: on an Intel Xeon socket with large core counts, the proposed ensemble design achieves up to 16x speed-up when employing 24 cores with respect to a single-threaded execution.

Citation

Pages: 223-232

Page count: 10

