Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters

Cited by: 5
Authors
Awan, Ahsan Javed [1]
Brorsson, Mats [1]
Vlassov, Vladimir [1]
Ayguade, Eduard [2]
Affiliations
[1] KTH Royal Inst Technol, Dept Software & Comp Syst, Stockholm, Sweden
[2] Tech Univ Catalunya UPC, Barcelona Super Comp Ctr BSC, Barcelona, Spain
Keywords
NUMA; SMT; Spark;
DOI
10.1145/3006299.3006319
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to stay at the forefront of big data analytics. Recent studies propose scale-in clusters with in-storage processing devices to process big data analytics with Spark. However, the proposal is based solely on the memory bandwidth characterization of in-memory data analytics and does not shed light on the specification of the host CPU and memory. Through empirical evaluation of in-memory data analytics with Apache Spark on an Ivy Bridge dual-socket server, we have found that (i) simultaneous multi-threading is effective up to 6 cores, (ii) data locality on NUMA nodes can improve performance by 10% on average, (iii) disabling next-line L1-D prefetchers can reduce the execution time by up to 14%, (iv) DDR3 operating at 1333 MT/s is sufficient, and (v) multiple small executors can provide up to 36% speedup over a single large executor.
Pages: 237-246
Number of pages: 10