Efficient Performance Prediction for Apache Spark

Cited by: 25
Authors
Cheng, Guoli [1]
Ying, Shi [1]
Wang, Bingming [1]
Li, Yuhang [1]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Bayi Rd 299, Wuhan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Performance prediction; Spark; System configuration; Adaboost; Projective sampling;
DOI
10.1016/j.jpdc.2020.10.010
Chinese Library Classification
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Spark is a distributed big data processing framework that followed Hadoop and is more efficient. It exposes more than 180 adjustable configuration parameters, and automatically choosing the optimal configuration so that a Spark application runs efficiently is challenging. The key to addressing this challenge is the ability to predict the performance of Spark applications under different configurations. This paper proposes a new approach based on Adaboost that can efficiently and accurately predict the performance of a given application with a given Spark configuration. In our approach, Adaboost is used to build a set of stage-level performance models for Spark. To minimize the modeling overhead, we use classic projective sampling, a data mining technique that allows us to collect as few training samples as possible while still meeting the accuracy requirements. We evaluate the proposed approach on six typical Spark benchmarks with five input datasets. The experimental results show that our approach achieves lower prediction error and lower cost than the previously proposed approach. (C) 2020 Elsevier Inc. All rights reserved.
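The projective sampling idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that prediction error follows a power-law learning curve `error = a * n**b` in the number of training samples `n`; the abstract does not state which curve family the authors fit. The function names and the pilot measurements below are hypothetical and are not taken from the paper.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of error = a * n**b, done in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope in log-log space
    a = math.exp(mean_y - b * mean_x)   # intercept mapped back from log space
    return a, b


def projected_sample_size(a, b, target_error):
    """Smallest n for which the fitted curve predicts error <= target_error."""
    if b >= 0:
        raise ValueError("fitted error does not decrease with sample size")
    return math.ceil((target_error / a) ** (1.0 / b))


# Hypothetical pilot runs: (number of training samples, observed relative error).
pilot_sizes = [10, 20, 40, 80]
pilot_errors = [0.40, 0.28, 0.20, 0.14]

a, b = fit_power_law(pilot_sizes, pilot_errors)
n_needed = projected_sample_size(a, b, target_error=0.10)
print(f"fitted curve: error ~ {a:.3f} * n^{b:.3f}")
print(f"projected samples needed for 10% error: {n_needed}")
```

The point of the sketch is that a few cheap pilot runs are used to extrapolate the learning curve, so that sample collection can stop at the projected size instead of growing the training set blindly.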
Pages: 40-51
Page count: 12