Configuring Parallelism for Hybrid Layouts Using Multi-Objective Optimization

Authors
Munir, Rana Faisal [1, 2]
Abello, Alberto [1]
Romero, Oscar [1]
Thiele, Maik [2]
Lehner, Wolfgang [2]
Affiliations
[1] Univ Politecn Cataluna, Dept Serv & Informat Syst Engn, C Jordi Girona Salgado 1-3, Barcelona 08034, Spain
[2] Tech Univ Dresden, Fac Comp Sci, Dresden, Germany
Keywords
big data; hybrid storage layouts; parallelism; Parquet; Spark
DOI
10.1089/big.2019.0068
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Modern organizations typically store their data in a raw format in data lakes. These data are then processed and usually stored under hybrid layouts, because such layouts support projection and selection operations and thus allow less data to be read from disk when a query requires it. However, distributed processing frameworks (e.g., Hadoop, Spark) do not exploit this well when analytical queries are posed. These frameworks divide the data into multiple partitions and process each partition in a separate task, so they create tasks based on the total file size rather than the actual size of the data to be read. This typically leads to launching more tasks than needed, which in turn increases query execution time and wastes significant computing resources. To use resources more efficiently and reduce query execution time, we propose a method that decides the number of tasks based on the data being read. To this end, we first propose a cost-based model for estimating the size of the data read in hybrid layouts. Next, we use the estimated reading size in a multi-objective optimization method to decide the number of tasks and the computational resources to be used. We prototyped our solution for Apache Parquet and Spark and found that our estimations are highly correlated (0.96) with the real executions. Further, using TPC-H, we show that our recommended configurations are only 5.6% away from the Pareto front and provide a 2.1x speedup compared with the default configurations.
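The abstract's central observation, that frameworks size tasks by total file size rather than by the bytes a query actually reads, can be illustrated with a small sketch. This is not the authors' cost model or optimizer. It assumes a simplified split rule (one task per 128 MiB, mirroring the default of Spark's `spark.sql.files.maxPartitionBytes` but ignoring other factors Spark uses), a hypothetical read-size estimate (sum of projected column sizes scaled by an assumed selectivity), and a generic Pareto filter over hypothetical (execution time, resources) pairs. The names `tasks_from_size`, `estimated_read_bytes`, and `pareto_front` are illustrative only.

```python
import math
from typing import List, Tuple

MiB = 1024 * 1024


def tasks_from_size(size_bytes: int, partition_bytes: int = 128 * MiB) -> int:
    """Simplified split rule: one task per partition_bytes of input (at least one)."""
    return max(1, math.ceil(size_bytes / partition_bytes))


def estimated_read_bytes(column_sizes: List[int], projected: List[int], selectivity: float) -> int:
    """Hypothetical estimate: bytes of the projected columns, scaled by the
    assumed fraction of data that survives selection (e.g., row-group skipping)."""
    projected_size = sum(column_sizes[i] for i in projected)
    return int(projected_size * selectivity)


def pareto_front(configs: List[Tuple[str, float, float]]) -> List[Tuple[str, float, float]]:
    """Keep configurations not dominated in (time, resources); both are minimized."""
    front = []
    for name, t, r in configs:
        dominated = any(
            t2 <= t and r2 <= r and (t2 < t or r2 < r)
            for _, t2, r2 in configs
        )
        if not dominated:
            front.append((name, t, r))
    return front


if __name__ == "__main__":
    sizes = [256 * MiB] * 4  # hypothetical file: four 256 MiB columns (1 GiB total)
    print(tasks_from_size(sum(sizes)))  # tasks sized by file: 8
    read = estimated_read_bytes(sizes, projected=[0], selectivity=0.5)
    print(tasks_from_size(read))  # tasks sized by estimated read: 1
    # hypothetical (name, time, resources) candidates; "c" is dominated by "a"
    print(pareto_front([("a", 10.0, 4.0), ("b", 8.0, 8.0), ("c", 12.0, 6.0)]))
```

In this toy case, sizing by file launches eight tasks while sizing by the estimated read launches one. The Pareto filter then discards any configuration that is no better in either objective than some alternative, which is the general idea behind choosing among task-count and resource trade-offs.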
Pages: 235-247
Number of pages: 13