Configuring Parallelism for Hybrid Layouts Using Multi-Objective Optimization

Authors
Munir, Rana Faisal [1, 2]
Abello, Alberto [1]
Romero, Oscar [1]
Thiele, Maik [2]
Lehner, Wolfgang [2]
Affiliations
[1] Univ Politecn Cataluna, Dept Serv & Informat Syst Engn, C Jordi Girona Salgado 1-3, Barcelona 08034, Spain
[2] Tech Univ Dresden, Fac Comp Sci, Dresden, Germany
Keywords
big data; hybrid storage layouts; parallelism; Parquet; Spark;
DOI
10.1089/big.2019.0068
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Modern organizations typically store their data in raw format in data lakes. These data are then processed and usually stored under hybrid layouts, which support projection and selection operations and therefore allow less data to be read from disk when a query needs only part of a file. However, distributed processing frameworks (e.g., Hadoop, Spark) do not exploit this well when analytical queries are posed. These frameworks divide the data into multiple partitions and process each partition in a separate task, so the number of tasks is determined by the total file size rather than by the size of the data actually read. This typically launches more tasks than needed, which in turn increases query execution time and wastes a significant amount of computing resources. To use resources more efficiently and reduce query execution time, we propose a method that decides the number of tasks based on the data being read. To this end, we first propose a cost-based model for estimating the size of the data read in hybrid layouts. Next, we use the estimated read size in a multi-objective optimization method to decide the number of tasks and the computational resources to be used. We prototyped our solution for Apache Parquet and Spark and found that our estimations are highly correlated (0.96) with the real executions. Further, using TPC-H, we show that our recommended configurations are only 5.6% away from the Pareto front and provide a 2.1x speedup compared with default solutions.
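The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' cost model or optimizer: the 128 MiB split size, the helper names `tasks_for` and `pareto_front`, and all time and cost figures are hypothetical values chosen only for illustration. The sketch contrasts a task count derived from the total file size with one derived from the bytes a query actually reads, then keeps the configurations that no other configuration beats on both execution time and resource cost.

```python
def tasks_for(num_bytes, split_bytes):
    """Number of tasks needed to cover num_bytes with splits of split_bytes."""
    # Integer ceiling division; always launch at least one task.
    return max(1, -(-num_bytes // split_bytes))


def pareto_front(configs):
    """Return the configurations not dominated by any other.

    configs: list of (tasks, time, cost) tuples; lower time and cost are better.
    A configuration is dominated if another one is no worse on both
    objectives and strictly better on at least one.
    """
    front = []
    for c in configs:
        dominated = any(
            o[1] <= c[1] and o[2] <= c[2] and (o[1] < c[1] or o[2] < c[2])
            for o in configs
        )
        if not dominated:
            front.append(c)
    return front


MIB = 1024 ** 2
GIB = 1024 ** 3
SPLIT = 128 * MIB          # assumed split size, not taken from the paper

file_size = 10 * GIB       # hypothetical total file size
read_size = 1 * GIB        # hypothetical size of the data the query reads

tasks_by_file = tasks_for(file_size, SPLIT)   # sizing by total file size
tasks_by_read = tasks_for(read_size, SPLIT)   # sizing by data actually read

# Hypothetical (tasks, time in seconds, cost in core-seconds) measurements.
candidates = [
    (8, 40.0, 8.0),
    (16, 30.0, 12.0),
    (80, 35.0, 40.0),
    (4, 60.0, 6.0),
]
front = pareto_front(candidates)
```

With these made-up numbers, sizing by file size yields 80 tasks while sizing by read size yields 8, and the 80-task configuration drops out of the Pareto front because the 16-task configuration is both faster and cheaper.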
Pages: 235-247
Number of pages: 13