Optimization Techniques for a Distributed In-Memory Computing Platform by Leveraging SSD

Cited by: 1
Authors
Choi, June [1]
Lee, Jaehyun [1]
Kim, Jik-Soo [2]
Lee, Jaehwan [1]
Affiliations
[1] Korea Aerosp Univ, Sch Elect & Informat Engn, Goyang Si 10540, South Korea
[2] Myongji Univ, Dept Comp Engn, Yongin 03674, South Korea
Source
APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 18
Funding
National Research Foundation of Singapore;
Keywords
Apache Spark; memory management; solid-state drive; in-memory processing framework; performance; PageRank; transitive closure; TeraSort; k-means clustering; Java Virtual Machine heap configuration; resilient distributed dataset; MAPREDUCE;
DOI
10.3390/app11188476
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
In this paper, we present several optimization strategies that can improve the overall performance of the distributed in-memory computing system, "Apache Spark". Despite its distributed memory management capability for iterative jobs and intermediate data, Spark has a significant performance degradation problem when the available amount of main memory (DRAM, typically used for data caching) is limited. To address this problem, we leverage an SSD (solid-state drive) to supplement the lack of main memory bandwidth. Specifically, we present an effective optimization methodology for Apache Spark by collectively investigating the effects of changing the capacity fraction ratios of the shuffle and storage spaces in the "Spark JVM Heap Configuration" and applying different "RDD Caching Policies" (e.g., SSD-backed memory caching). Our extensive experimental results show that by utilizing the proposed optimization techniques, we can improve the overall performance by up to 42%.
Pages: 18