ATCS: Auto-Tuning Configurations of Big Data Frameworks Based on Generative Adversarial Nets

Cited by: 18
Authors
Li, Mingyu [1 ,2 ]
Liu, Zhiqiang [1 ]
Shi, Xuanhua [1 ]
Jin, Hai [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Serv Comp Technol & Syst Lab,Cluster & Grid Comp, Wuhan 430074, Peoples R China
[2] Liupanshui Normal Univ, Sch Math & Comp Sci, Liupanshui 553004, Peoples R China
Keywords
Big data; generative adversarial nets; Spark; genetic algorithm; automatic parameter tuning; optimization; algorithm
DOI
10.1109/ACCESS.2020.2979812
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Subject Classification Code
0812;
Abstract
Big data processing frameworks (e.g., Spark, Storm) are extensively used for massive data processing in industry. To improve the performance and robustness of these frameworks, developers expose a large number of configurable parameters to users. Because of the high-dimensional parameter space and the complicated interactions among parameters, manual tuning is time-consuming and ineffective. Building performance prediction models for big data frameworks is challenging for two reasons: (1) collecting training data takes significant time, and (2) prediction accuracy is poor when training data are limited. To meet this challenge, we propose the Auto-Tuning Configuration parameters System (ATCS), a new auto-tuning approach based on Generative Adversarial Nets (GAN). ATCS builds a performance prediction model from less training data without sacrificing model accuracy, and an optimized Genetic Algorithm (GA) is then used to explore the parameter space for optimal solutions. To evaluate the effectiveness of ATCS, we select five frequently-used Spark workloads, each running on five data sets of different sizes. The results demonstrate that, compared to the default configurations, ATCS improves the performance of these workloads by 3.5x on average, with a maximum of 6.9x. Experiments also show that, to reach similar model accuracy, ATCS requires only 6% of the training data needed by a Deep Neural Network (DNN), 13% of that needed by a Support Vector Machine (SVM), and 18% of that needed by a Decision Tree (DT). Moreover, compared with these machine learning models, the average performance improvement achieved by ATCS is 1.7x that of DNN, 1.6x that of SVM, and 1.7x that of DT on the five Spark programs.
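
The abstract describes the ATCS pipeline only at a high level. The sketch below is a minimal, hypothetical illustration of that kind of pipeline, not the authors' implementation: a small GAN (PyTorch) is trained on a handful of measured (configuration, runtime) rows so that it can synthesize additional training samples, a runtime predictor is then fitted on the real plus synthetic data, and a simple genetic algorithm searches normalized configurations for the lowest predicted runtime. Every name and setting here (N_PARAMS, NOISE_DIM, network sizes, population size, mutation scale) is an illustrative assumption.

    import numpy as np
    import torch
    import torch.nn as nn

    N_PARAMS = 8      # number of Spark configuration knobs (illustrative assumption)
    NOISE_DIM = 16    # dimension of the GAN noise vector (assumption)

    class Generator(nn.Module):
        # Maps random noise to a synthetic (config, runtime) row.
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(NOISE_DIM, 64), nn.ReLU(),
                nn.Linear(64, N_PARAMS + 1))
        def forward(self, z):
            return self.net(z)

    class Discriminator(nn.Module):
        # Scores a (config, runtime) row as real (close to 1) or synthetic (close to 0).
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(N_PARAMS + 1, 64), nn.ReLU(),
                nn.Linear(64, 1), nn.Sigmoid())
        def forward(self, x):
            return self.net(x)

    def train_gan(real_samples, epochs=500):
        # real_samples: (n, N_PARAMS + 1) array of measured, normalized config+runtime rows.
        real = torch.tensor(real_samples, dtype=torch.float32)
        G, D = Generator(), Discriminator()
        opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
        opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
        bce = nn.BCELoss()
        ones, zeros = torch.ones(len(real), 1), torch.zeros(len(real), 1)
        for _ in range(epochs):
            fake = G(torch.randn(len(real), NOISE_DIM))
            opt_d.zero_grad()                      # discriminator step
            loss_d = bce(D(real), ones) + bce(D(fake.detach()), zeros)
            loss_d.backward()
            opt_d.step()
            opt_g.zero_grad()                      # generator step: try to fool D
            loss_g = bce(D(fake), ones)
            loss_g.backward()
            opt_g.step()
        return G                                   # sample G to augment the training set

    def genetic_search(predict_runtime, pop_size=50, generations=100):
        # Minimizes predicted runtime over configurations normalized to [0, 1]^N_PARAMS.
        pop = np.random.rand(pop_size, N_PARAMS)
        for _ in range(generations):
            fitness = np.array([predict_runtime(c) for c in pop])
            parents = pop[np.argsort(fitness)[: pop_size // 2]]          # selection
            children = []
            for _ in range(pop_size - len(parents)):
                a, b = parents[np.random.randint(len(parents), size=2)]
                child = np.where(np.random.rand(N_PARAMS) < 0.5, a, b)   # uniform crossover
                child = np.clip(child + np.random.normal(0, 0.05, N_PARAMS), 0, 1)  # mutation
                children.append(child)
            pop = np.vstack([parents, np.array(children)])
        return pop[np.argmin([predict_runtime(c) for c in pop])]

In such a sketch, the trained generator would be sampled to augment the measured data, the combined set would be used to fit the predict_runtime regressor passed to genetic_search (for example a gradient-boosted tree model), and the returned normalized vector would be mapped back to concrete Spark settings such as spark.executor.memory or spark.sql.shuffle.partitions; which knobs are actually tuned is again an assumption, since the record above does not list them.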
Pages: 50485 - 50496
Number of pages: 12
Related Papers
50 records in total
  • [1] ATConf: auto-tuning high dimensional configuration parameters for big data processing frameworks
    Dou, Hui
    Wang, Kang
    Zhang, Yiwen
    Chen, Pengfei
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2023, 26 (05): 2737 - 2755
  • [2] Auto-tuning Spark Configurations Based on Neural Network
    Gu, Jing
    Li, Ying
    Tang, Hongyan
    Wu, Zhonghai
    2018 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2018,
  • [3] DeepCAT: A Cost-Efficient Online Configuration Auto-Tuning Approach for Big Data Frameworks
    Dou, Hui
    Wang, Yilun
    Zhang, Yiwen
    Chen, Pengfei
    51ST INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2022, 2022,
  • [4] TurBO: A cost-efficient configuration-based auto-tuning approach for cluster-based big data frameworks
    Dou, Hui
    Zhang, Lei
    Zhang, Yiwen
    Chen, Pengfei
    Zheng, Zibin
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2023, 177 : 89 - 105
  • [5] DeepCAT+: A Low-Cost and Transferrable Online Configuration Auto-Tuning Approach for Big Data Frameworks
    Dou, Hui
    Wang, Yilun
    Zhang, Yiwen
    Chen, Pengfei
    Zheng, Zibin
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2024, 35 (11) : 2114 - 2131
  • [6] A+ Tuning: Architecture + Application Auto-tuning for In-Memory Data-Processing Frameworks
    Wang, Han
    Rafatirad, Setareh
    Homayoun, Houman
    2019 IEEE 25TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2019, : 163 - 166
  • [7] A data-based approach to auto-tuning PID controller
    Cheng, Cheng
    Chiu, Min-Sen
    PROCEEDINGS OF THE SECOND IASTED INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE, 2006: 24+
  • [8] An Auto-tuning LQR based on Correlation Analysis
    Huang, Xujiang
    Li, Pu
    IFAC PAPERSONLINE, 2020, 53 (02): 7148 - 7153
  • [9] Auto-tuning Spark Big Data Workloads on POWER8: Prediction-Based Dynamic SMT Threading
    Jia, Zhen
    Xue, Chao
    Chen, Guancheng
    Zhan, Jianfeng
    Zhang, Lixin
    Lin, Yonghua
    Hofstee, Peter
    2016 INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURE AND COMPILATION TECHNIQUES (PACT), 2016, : 387 - 400