Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters

Cited by: 5

Authors:

Awan, Ahsan Javed [1]

Brorsson, Mats [1]

Vlassov, Vladimir [1]

Ayguade, Eduard [2]

Affiliations:

[1] KTH Royal Inst Technol, Dept Software & Comp Syst, Stockholm, Sweden

[2] Tech Univ Catalunya UPC, Barcelona Super Comp Ctr BSC, Barcelona, Spain

Source:

2016 3RD IEEE/ACM INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING, APPLICATIONS AND TECHNOLOGIES (BDCAT) | 2016

Keywords:

NUMA; SMT; Spark

DOI:

10.1145/3006299.3006319

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics. Recent studies propose scale-in clusters with in-storage processing devices to process big data analytics with Spark. However, the proposal is based solely on the memory bandwidth characterization of in-memory data analytics and does not shed light on the specification of the host CPU and memory. Through empirical evaluation of in-memory data analytics with Apache Spark on an Ivy Bridge dual-socket server, we have found that (i) simultaneous multi-threading is effective up to 6 cores, (ii) data locality on NUMA nodes can improve performance by 10% on average, (iii) disabling next-line L1-D prefetchers can reduce the execution time by up to 14%, (iv) DDR3 operating at 1333 MT/s is sufficient, and (v) multiple small executors can provide up to 36% speedup over a single large executor.

Pages: 237-246

Number of pages: 10
