Cost-effective Resource Provisioning for Spark Workloads

Cited by: 13
Authors
Chen, Yuxing [1]
Lu, Jiaheng [1]
Chen, Chen [2]
Hoque, Mohammad [1]
Tarkoma, Sasu [1]
Affiliations
[1] Univ Helsinki, Helsinki, Finland
[2] Huawei Canada Res Ctr, Toronto, ON, Canada
Funding
Academy of Finland
Keywords
Resource provisioning; Spark executor parameter; Simulation; Cost model; Performance metrics
DOI
10.1145/3357384.3358090
Chinese Library Classification (CLC) number
TP301 [Theory and Methods]
Subject classification code
081202
Abstract
Spark is one of the most prevalent big data analytics platforms. Configuring proper resource provisioning for Spark jobs is challenging but essential for organizations that want to save time, achieve high resource utilization, and remain cost-effective. In this paper, we study the challenge of determining parameter values that meet the performance requirements of workloads while minimizing both resource cost and resource utilization time. We propose a simulation-based cost model that accurately predicts job performance. We keep training cost low by using a simulation framework, namely Monte Carlo (MC) simulation, which uses a small amount of data and resources to make reliable predictions for larger datasets and clusters. The salient feature of our method is that it requires only a low training cost while still producing accurate predictions. Through experiments with six benchmark workloads, we demonstrate that the cost model yields an average prediction error of less than 7% and that its recommendations achieve up to 5x savings in resource cost.
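The abstract does not spell out the model itself, so the following is only a minimal sketch of the general idea it describes (Monte Carlo prediction from a small sample run, then a cost-driven recommendation). It is not the authors' implementation. Assumptions, all hypothetical: task durations measured on a small sample of the input are representative of the full job; the full dataset simply yields more tasks of similar size; tasks are placed greedily on the first free executor core; and every allocated core is billed for the entire run. The names `ExecutorConfig`, `simulate_makespan`, `predict_runtime`, `recommend` and `price_per_core_sec` are illustrative only. The two config fields merely mirror the real Spark properties `spark.executor.instances` and `spark.executor.cores`.

```python
import heapq
import random
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorConfig:
    """Hypothetical candidate config; the fields mirror the Spark
    properties spark.executor.instances and spark.executor.cores."""
    instances: int
    cores: int

    @property
    def slots(self) -> int:
        # Number of tasks that can run concurrently.
        return self.instances * self.cores


def simulate_makespan(task_samples, num_tasks, slots, rng):
    """One Monte Carlo trial: draw each task's duration from durations
    observed on a small sample run, then assign every task to the slot
    that becomes free first (greedy list scheduling)."""
    free_at = [0.0] * slots  # min-heap of times at which each slot is free
    for _ in range(num_tasks):
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + rng.choice(task_samples))
    return max(free_at)


def predict_runtime(task_samples, num_tasks, cfg, trials=200, seed=0):
    """Estimate job runtime as the mean makespan over many random trials."""
    rng = random.Random(seed)
    runs = [simulate_makespan(task_samples, num_tasks, cfg.slots, rng)
            for _ in range(trials)]
    return statistics.mean(runs)


def recommend(task_samples, num_tasks, candidates, deadline, price_per_core_sec):
    """Return (config, runtime, cost) for the cheapest candidate whose
    predicted runtime meets the deadline, or None if none does.
    Assumed cost model: every allocated core is billed for the whole run."""
    best = None
    for cfg in candidates:
        runtime = predict_runtime(task_samples, num_tasks, cfg)
        cost = runtime * cfg.slots * price_per_core_sec
        if runtime <= deadline and (best is None or cost < best[2]):
            best = (cfg, runtime, cost)
    return best


if __name__ == "__main__":
    # Task durations (seconds) measured on a small sample of the input;
    # the full dataset is assumed to produce 400 similar-sized tasks.
    samples = [4.2, 5.0, 5.1, 6.3, 9.8]
    grid = [ExecutorConfig(n, c) for n in (2, 4, 8, 16) for c in (1, 2, 4)]
    print(recommend(samples, num_tasks=400, candidates=grid,
                    deadline=120.0, price_per_core_sec=0.0001))
```

Under this billing assumption, adding cores only pays off while it shortens the run proportionally. With 6 equal tasks, 6 slots finish in one wave, but 4 slots need two waves and cost more core-seconds. The simulation is what exposes such wave and straggler effects before the full job is ever run.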
Pages: 2477-2480
Number of pages: 4
Related papers
  • [1] SimCost: cost-effective resource provision prediction and recommendation for spark workloads
    Chen, Yuxing
    Hoque, Mohammad A.
    Xu, Pengfei
    Lu, Jiaheng
    Tarkoma, Sasu
    [J]. DISTRIBUTED AND PARALLEL DATABASES, 2024, 42 (01) : 73 - 102
  • [3] Cost-Effective Resource Provisioning for MapReduce in a Cloud
    Palanisamy, Balaji
    Singh, Aameek
    Liu, Ling
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2015, 26 (05) : 1265 - 1279
  • [4] Dynamic Resource Provisioning for Iterative Workloads on Apache Spark
    Cheng, Dazhao
    Wang, Yu
    Dai, Dong
    [J]. IEEE TRANSACTIONS ON CLOUD COMPUTING, 2023, 11 (01) : 639 - 652
  • [5] Cost-Effective Traffic Scheduling and Resource Allocation for Edge Service Provisioning
    Xiang, Zhengzhe
    Zheng, Yuhang
    Zheng, Zengwei
    Deng, Shuiguang
    Guo, Minyi
    Dustdar, Schahram
    [J]. IEEE-ACM TRANSACTIONS ON NETWORKING, 2023, 31 (06) : 2934 - 2949
  • [6] Cost-Effective Resource Provisioning for Real-Time Workflow in Cloud
    Wu, Lei
    Ding, Ran
    Jia, Zhaohong
    Li, Xuejun
    [J]. COMPLEXITY, 2020, 2020
  • [7] Toward cost-effective storage provisioning for DBMSs
    Zhang, Ning
    Tatemura, Junichi
    Patel, Jignesh M.
    Hacigumus, Hakan
    [J]. VLDB JOURNAL, 2014, 23 (02) : 329 - 354
  • [9] Provisioning Cost-Effective Mobile Video Caching
    Ghoreishi, Seyed Ehsan
    Friderikos, Vasilis
    Karamshuk, Dmytro
    Sastry, Nishanth
    Aghvami, A. Hamid
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2016
  • [10] Cost-effective Provisioning of Spot Instances in Clouds
    Miao, He
    Li, Liu
    [J]. PROCEEDINGS OF 2016 9TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 2, 2016 : 194 - 197