Configuring Parallelism for Hybrid Layouts Using Multi-Objective Optimization

Cited by: 0
Authors
Munir, Rana Faisal [1, 2]
Abello, Alberto [1 ]
Romero, Oscar [1 ]
Thiele, Maik [2 ]
Lehner, Wolfgang [2 ]
Affiliations
[1] Univ Politecn Cataluna, Dept Serv & Informat Syst Engn, C Jordi Girona Salgado 1-3, Barcelona 08034, Spain
[2] Tech Univ Dresden, Fac Comp Sci, Dresden, Germany
Keywords
big data; hybrid storage layouts; parallelism; Parquet; Spark
DOI
10.1089/big.2019.0068
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Modern organizations typically store their data in a raw format in data lakes. These data are then processed and usually stored under hybrid layouts, which support projection and selection operations and thus allow reading less data from disk when required. However, distributed processing frameworks (e.g., Hadoop, Spark) do not exploit this well when analytical queries are posed. These frameworks divide the data into multiple partitions and process each partition in a separate task, so the number of tasks is determined by the total file size rather than by the size of the data actually read. This typically launches more tasks than needed, which in turn increases the query execution time and wastes significant computing resources. To use resources more efficiently and reduce the query execution time, we propose a method that decides the number of tasks based on the data being read. To this end, we first propose a cost-based model for estimating the size of data read in hybrid layouts. Next, we use the estimated reading size in a multi-objective optimization method to decide the number of tasks and the computational resources to be used. We prototyped our solution for Apache Parquet and Spark and found that our estimations are highly correlated (0.96) with the real executions. Further, using TPC-H, we show that our recommended configurations are only 5.6% away from the Pareto front and provide a 2.1x speedup compared with default solutions.
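The idea in the abstract, sizing the number of tasks by the data actually read rather than by the file size and then trading off time against resources, can be illustrated with a small sketch. This is not the paper's cost model or optimizer: the `Column` type, the `estimate_read_bytes`, `tasks_for`, and `pareto_front` functions, the 128 MiB partition size, and the way selectivity scales the read size are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass
from math import ceil

MIB = 1024 * 1024
GIB = 1024 * MIB


@dataclass(frozen=True)
class Column:
    """Hypothetical description of one column stored in a hybrid layout."""
    name: str
    size_bytes: int  # on-disk size of this column across the whole file


def estimate_read_bytes(columns, projected, selectivity=1.0):
    """Toy estimate (an assumption, not the paper's model): only projected
    columns are read, scaled by the fraction of data surviving selection."""
    projected_bytes = sum(c.size_bytes for c in columns if c.name in projected)
    return int(projected_bytes * selectivity)


def tasks_for(size_bytes, partition_bytes=128 * MIB):
    """One task per partition of `partition_bytes` (illustrative value)."""
    return max(1, ceil(size_bytes / partition_bytes))


def pareto_front(candidates):
    """Keep configurations not dominated by any other.
    Each candidate is (label, time, resources); both objectives are minimized."""
    front = []
    for a in candidates:
        dominated = any(
            b[1] <= a[1] and b[2] <= a[2] and (b[1] < a[1] or b[2] < a[2])
            for b in candidates
        )
        if not dominated:
            front.append(a)
    return front


columns = [Column("a", 1 * GIB), Column("b", 3 * GIB), Column("c", 4 * GIB)]
file_bytes = sum(c.size_bytes for c in columns)

# Task count derived from the total file size versus from the estimated read size.
tasks_by_file_size = tasks_for(file_bytes)
tasks_by_read_size = tasks_for(estimate_read_bytes(columns, {"a"}, selectivity=0.5))

# Made-up (label, time, resources) triples, used only to show the dominance check.
configs = [("cfg1", 10, 4), ("cfg2", 8, 8), ("cfg3", 12, 8)]
front = pareto_front(configs)
```

With these made-up inputs, sizing by file size yields many more tasks than sizing by the estimated read size, and `cfg3` drops off the front because `cfg1` is both faster and cheaper.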
Pages: 235-247
Number of pages: 13