Predicting the performance of big data applications on the cloud

Cited by: 10
Authors
Ardagna, D. [1]
Barbierato, E. [1]
Gianniti, E. [1]
Gribaudo, M. [1]
Pinto, T. B. M. [2]
da Silva, A. P. C. [2]
Almeida, J. M. [2]
Affiliations
[1] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Milan, Italy
[2] Univ Fed Minas Gerais, Dept Ciencia Comp, Belo Horizonte, MG, Brazil
Source
JOURNAL OF SUPERCOMPUTING | 2021 / Vol. 77 / No. 02
Funding
EU Horizon 2020;
Keywords
Performance prediction; Apache spark; Parallel computing; Data science; Big data; Analytical and simulation models; SPARK;
DOI
10.1007/s11227-020-03307-w
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline code
0812;
Abstract
Data science applications have become widespread as a means of extracting knowledge from large datasets. Such applications are often characterized by highly heterogeneous and irregular data access patterns and are therefore often referred to as big data applications. These characteristics make it challenging for existing software and hardware infrastructures to meet the applications' resource demands. The cloud computing paradigm, in turn, offers a natural hosting solution for such applications, since its on-demand pricing model allows computing resources to be allocated effectively according to each application's needs. However, these properties pose an additional challenge to accurately predicting the performance of cloud-based applications, which is a key step toward adequate capacity planning and management of the hosting infrastructure. In this article, we tackle this challenge by exploring three modeling approaches for predicting the performance of big data applications running on the cloud. We evaluate two queuing-based analytical models and dagSim, a fast ad hoc simulator, in various scenarios based on different applications and infrastructure setups. The approaches are compared in terms of prediction accuracy and execution time. Our results indicate that our two best approaches, one analytical model and dagSim, can predict average application execution times with an average relative error of at most 7%. Moreover, a comparison with the widely used event-based simulator of the Java Modeling Tool (JMT) suite shows that both the analytical model and dagSim run very fast, requiring at least two orders of magnitude less execution time than JMT while providing slightly better accuracy, which makes them practical for online prediction.
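The abstract does not describe how dagSim works internally, so the following is only a hypothetical sketch of the general idea behind stage-level simulation of a DAG-based job, together with the relative-error metric used to report accuracy. It assumes a simplified job made of a linear chain of stages (real Spark jobs form a DAG of stages), each stage a list of task durations, with a barrier between stages and tasks dispatched to the first free core. The function names `simulate_stages`, `relative_error`, and `mean_relative_error` and the task durations are illustrative assumptions, not the paper's tool, API, or data.

```python
import heapq
from typing import Sequence, Tuple


def simulate_stages(stages: Sequence[Sequence[float]], cores: int) -> float:
    """Hypothetical sketch: estimate the execution time of a chain of stages.

    Each stage is a list of task durations (seconds). Tasks are dispatched in
    list order to the earliest-free core; a stage starts only after every task
    of the previous stage has finished (a barrier, as between Spark stages).
    """
    if cores < 1:
        raise ValueError("cores must be >= 1")
    clock = 0.0
    for tasks in stages:
        # Min-heap holding the time at which each core becomes free.
        free_at = [clock] * cores
        heapq.heapify(free_at)
        stage_end = clock
        for duration in tasks:
            start = heapq.heappop(free_at)  # earliest available core
            end = start + duration
            stage_end = max(stage_end, end)
            heapq.heappush(free_at, end)
        clock = stage_end  # barrier: next stage waits for the slowest task
    return clock


def relative_error(predicted: float, measured: float) -> float:
    """Relative error |predicted - measured| / measured."""
    if measured <= 0:
        raise ValueError("measured time must be positive")
    return abs(predicted - measured) / measured


def mean_relative_error(pairs: Sequence[Tuple[float, float]]) -> float:
    """Average relative error over (predicted, measured) pairs."""
    errors = [relative_error(p, m) for p, m in pairs]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Illustrative durations only, not measurements from the paper.
    predicted = simulate_stages([[4, 4, 4, 4], [2, 2, 2]], cores=2)
    print(predicted)
    print(relative_error(predicted, 12.5))
```

With 2 cores, the four 4-second tasks of the first stage run in two waves (8 s) and the three 2-second tasks of the second stage need two more waves (4 s), giving 12 s; against a hypothetical measured time of 12.5 s, the relative error is 0.04 (4%). Averaging such per-run errors over many scenarios is one way to obtain a figure like the average relative error quoted in the abstract.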
Pages: 1321-1353
Number of pages: 33