Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters

Cited by: 5
Authors
Awan, Ahsan Javed [1]
Brorsson, Mats [1]
Vlassov, Vladimir [1]
Ayguade, Eduard [2]
Affiliations
[1] KTH Royal Institute of Technology, Department of Software and Computer Systems, Stockholm, Sweden
[2] Technical University of Catalonia (UPC), Barcelona Supercomputing Center (BSC), Barcelona, Spain
Keywords
NUMA; SMT; Spark
DOI
10.1145/3006299.3006319
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics. Recent studies propose scale-in clusters with in-storage processing devices to process big data analytics with Spark. However, the proposal is based solely on the memory bandwidth characterization of in-memory data analytics and does not shed light on the specification of the host CPU and memory. Through empirical evaluation of in-memory data analytics with Apache Spark on an Ivy Bridge dual-socket server, we have found that (i) simultaneous multi-threading is effective up to 6 cores, (ii) data locality on NUMA nodes can improve performance by 10% on average, (iii) disabling next-line L1-D prefetchers can reduce execution time by up to 14%, (iv) DDR3 operating at 1333 MT/s is sufficient, and (v) multiple small executors can provide up to 36% speedup over a single large executor.
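Findings (ii) and (v) together suggest sizing executors so that several smaller ones share a node, each confined to one socket. The helper below is a minimal sketch of that idea under stated assumptions, not the authors' configuration: the `NodeSpec` and `executor_layout` names are hypothetical, as are the example figures of 12 cores and 96 GB per socket and the 2 GB per-socket reserve for the operating system. The returned keys `spark.executor.instances`, `spark.executor.cores` and `spark.executor.memory` are standard Spark configuration properties, although how the executor count is honored depends on the cluster manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical description of one worker node (all values are assumptions)."""
    sockets: int            # NUMA nodes, e.g. 2 for a dual-socket server
    cores_per_socket: int   # physical cores per socket
    mem_gb_per_socket: int  # DRAM attached to each socket, in GB


def executor_layout(node: NodeSpec, executors_per_socket: int, reserve_gb: int = 2) -> dict:
    """Split a node into equally sized executors that never straddle a socket.

    reserve_gb is a hypothetical amount of memory per socket left for the OS.
    Sizing executors this way only makes NUMA-local placement possible;
    it does not pin threads or memory to a socket by itself.
    """
    if executors_per_socket < 1:
        raise ValueError("need at least one executor per socket")
    if node.cores_per_socket % executors_per_socket != 0:
        raise ValueError("cores per socket must divide evenly among executors")

    cores = node.cores_per_socket // executors_per_socket
    mem_gb = (node.mem_gb_per_socket - reserve_gb) // executors_per_socket
    if mem_gb < 1:
        raise ValueError("not enough memory per executor")

    # Standard Spark property names; values are strings as in spark-defaults.conf.
    return {
        "spark.executor.instances": str(node.sockets * executors_per_socket),
        "spark.executor.cores": str(cores),
        "spark.executor.memory": f"{mem_gb}g",
    }


if __name__ == "__main__":
    node = NodeSpec(sockets=2, cores_per_socket=12, mem_gb_per_socket=96)
    for per_socket in (1, 2, 3):
        print(per_socket, executor_layout(node, per_socket))
```

With the assumed node, two executors per socket yield four executors of 6 cores and 47 GB each. Spark's standard configuration does not bind an executor to a socket, so actual NUMA locality would still have to come from the operating system, for example by launching each worker under `numactl`.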
Pages: 237-246
Page count: 10
Related papers
  • [1] Towards Automatic Memory Tuning for In-Memory Big Data Analytics in Clusters
    Koliopoulos, Aris-Kyriakos
    Yiapanis, Paraskevas
    Tekiner, Firat
    Nenadic, Goran
    Keane, John
    2016 IEEE INTERNATIONAL CONGRESS ON BIG DATA - BIGDATA CONGRESS 2016, 2016: 353-356
  • [2] Eager Memory Management for In-Memory Data Analytics
    Jang, Hakbeom
    Bae, Jonghyun
    Ham, Tae Jun
    Lee, Jae W.
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (03): 632-636
  • [3] On the Implications of Heterogeneous Memory Tiering on Spark In-Memory Analytics
    Katsaragakis, Manolis
    Masouros, Dimosthenis
    Papadopoulos, Lazaros
    Catthoor, Francky
    Soudris, Dimitrios
    2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, IPDPSW, 2023: 945-952
  • [4] In-Memory Computing for Scalable Data Analytics
    Li, Jun
    2015 IEEE INTERNATIONAL CONFERENCE ON CLOUD ENGINEERING (IC2E 2015), 2015: 93-94
  • [5] An In-Memory based Framework for Scientific Data Analytics
    Elia, Donatello
    Fiore, Sandro
    D'Anca, Alessandro
    Palazzo, Cosimo
    Foster, Ian
    Williams, Dean N.
    PROCEEDINGS OF THE ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS (CF'16), 2016: 424-429
  • [6] Distributed In-Memory Analytics for Big Temporal Data
    Yao, Bin
    Zhang, Wei
    Wang, Zhi-Jie
    Chen, Zhongpu
    Shang, Shuo
    Zheng, Kai
    Guo, Minyi
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2018, PT I, 2018, 10827: 549-565
  • [7] YinMem: a distributed parallel indexed in-memory computation system for large scale data analytics
    Huang, Yin
    Yesha, Yelena
    Halem, Milton
    Yesha, Yaacov
    Zhou, Shujia
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016: 214-222
  • [8] SparkBench: a Spark benchmarking suite characterizing large-scale in-memory data analytics
    Li, Min
    Tan, Jian
    Wang, Yandong
    Zhang, Li
    Salapura, Valentina
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2017, 20 (03): 2575-2589
  • [9] CHOPPER: Optimizing Data Partitioning for In-Memory Data Analytics Frameworks
    Paul, Arnab Kumar
    Zhuang, Wenjie
    Xu, Luna
    Li, Min
    Rafique, M. Mustafa
    Butt, Ali R.
    2016 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2016: 110-119