A Machine Learning Approach for Predicting Execution Time of Spark Jobs

Cited by: 22
Authors
Mustafa, Sara [1]
Elghandour, Iman [1]
Ismail, Mohamed A. [1]
Affiliations
[1] Alexandria Univ, Comp & Syst Engn, Alexandria, Egypt
Keywords
Spark; Execution Time Prediction; Machine Learning; Queries
DOI
10.1016/j.aej.2018.03.006
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Spark has gained growing attention in the past couple of years as an in-memory cloud computing platform. It supports the execution of various types of workloads, such as SQL queries and machine learning applications. Currently, many enterprises use Spark to exploit its fast in-memory processing of large-scale data. Additionally, speeding up execution in Spark is an important problem for many real-time applications. This can be achieved by improving the scheduling approaches employed by Spark, optimizing the execution plans that Spark generates for various applications, and selecting the best cluster configuration to run an input workload. A first step for all of these optimization approaches is to predict the execution time of an input Spark application. In this paper, we present a new platform that predicts, with high accuracy, the execution time of SQL queries and machine learning applications executed by Spark. We evaluate the proposed platform by measuring how accurately it predicts the execution time of various types of Spark jobs, including TPC-H queries and machine learning classification/clustering applications. The evaluation experiments show that the platform predicts the execution time of Spark jobs with accuracy greater than 90% for SQL queries and greater than 75% for machine learning jobs. © 2018 Faculty of Engineering, Alexandria University. Production and hosting by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Pages: 3767-3778
Page count: 12
Related papers
50 in total
  • [1] Big data execution time based on Spark Machine Learning Libraries
    Garate-Escamilla, Anna Karen
    Hajjam El Hassani, Amir
    Andres, Emmanuel
    PROCEEDINGS OF 2019 3RD INTERNATIONAL CONFERENCE ON CLOUD AND BIG DATA COMPUTING (ICCBDC 2019), 2019, : 78 - 83
  • [2] Predicting Workflow Task Execution Time in the Cloud Using A Two-Stage Machine Learning Approach
    Pham, Thanh-Phuong
    Durillo, Juan J.
    Fahringer, Thomas
    IEEE TRANSACTIONS ON CLOUD COMPUTING, 2020, 8 (01) : 256 - 268
  • [3] A Machine Learning-Based Approach for Predicting the Execution Time of CFD Applications on Cloud Computing Environment
    Duong Ngoc Hieu
    Thai Tieu Minh
    Trinh Van Quang
    Bui Xuan Giang
    Tran Van Hoai
    FUTURE DATA AND SECURITY ENGINEERING, FDSE 2016, 2016, 10018 : 40 - 52
  • [4] A Machine Learning Approach for an HPC Use Case: the Jobs Queuing Time Prediction
    Vercellino, Chiara
    Scionti, Alberto
    Varavallo, Giuseppe
    Viviani, Paolo
    Vitali, Giacomo
    Terzo, Olivier
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2023, 143 : 215 - 230
  • [5] Predicting SQL Query Execution Time with a Cost Model for Spark Platform
    Burdakov, Aleksey
    Proletarskaya, Viktoria
    Ploutenko, Andrey
    Ermakov, Oleg
    Grigorev, Uriy
    PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON INTERNET OF THINGS, BIG DATA AND SECURITY (IOTBDS), 2020, : 279 - 287
  • [6] Combining Machine Learning & Metaheuristic Algorithms for Predicting Waiting Time of High Performance Computing jobs
    Ramachandran, Suja
    Jayalal, M. L.
    Vasudevan, M.
    Jehadeesan, R.
    2024 5TH INTERNATIONAL CONFERENCE ON INNOVATIVE TRENDS IN INFORMATION TECHNOLOGY, ICITIIT 2024, 2024,
  • [7] Predicting Terrorism with Machine Learning: Lessons from "Predicting Terrorism: A Machine Learning Approach"
    Basuchoudhary, Atin
    Bang, James T.
    PEACE ECONOMICS PEACE SCIENCE AND PUBLIC POLICY, 2018, 24 (04)
  • [8] SOME CLARIFICATIONS ON THE BICRITERIA SCHEDULING OF UNIT EXECUTION TIME JOBS ON A SINGLE-MACHINE
    DE, P
    GHOSH, JB
    WELLS, CE
    COMPUTERS & OPERATIONS RESEARCH, 1991, 18 (08) : 717 - 720
  • [9] Execution Time Prediction for Apache Spark
    Gao, Zhipeng
    Wang, Ting
    Wang, Qian
    Yang, Yang
    2018 INTERNATIONAL CONFERENCE ON COMPUTING AND BIG DATA (ICCBD 2018), 2018, : 47 - 51
  • [10] Predicting Diabetes using Distributed Machine Learning based on Apache Spark
    Ahmed, Hager
    Younis, Eman M. G.
    Ali, Abdelmgeid A.
    PROCEEDINGS OF 2020 INTERNATIONAL CONFERENCE ON INNOVATIVE TRENDS IN COMMUNICATION AND COMPUTER ENGINEERING (ITCE), 2020, : 44 - 49