Optimization of data flow execution in a parallel environment

Cited by: 4
Authors
Kougka, Georgia [1]
Gounaris, Anastasios [1]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki, Greece
Keywords
Data flow optimization; Cost modeling; Task ordering; Queries
DOI
10.1007/s10619-018-7243-3
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Although modern data flows are executed in parallel and distributed environments, e.g., on a multi-core machine or on the cloud, current cost models, e.g., those considered by state-of-the-art data flow optimization techniques, do not accurately reflect the response time of real data flow execution in these environments. This is mainly because the impact of parallelism, and more specifically the impact of concurrent task execution on running time, is not adequately modeled in current cost models. The contribution of this work is twofold. First, we propose an advanced cost model that aims to reflect more accurately the response time of a data flow executed in parallel. Second, we show that existing optimization solutions are inadequate and develop new optimization techniques targeting the proposed cost model. We focus on the single multi-core machine environment provided by modern business intelligence tools, such as Pentaho Kettle, but our approach can be extended to massively parallel and distributed settings. The distinctive feature of our proposal is that we model both time overlaps and the impact of concurrency on task running times in a combined manner; the latter is appropriately quantified and its significance is exemplified. Furthermore, we propose extensions to current optimizers that decide on the exact ordering of flow tasks, taking into account the new optimization metric. Finally, we evaluate the new optimization algorithms and show up to 59% response time improvement over state-of-the-art task ordering techniques.
Pages: 385-410
Number of pages: 26
Source
Distributed and Parallel Databases, 2019, 37: 385-410