Modeling Data Flow Execution in a Parallel Environment

Cited by: 1
Authors
Kougka, Georgia [1]
Gounaris, Anastasios [1]
Leser, Ulf [2]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki, Greece
[2] Humboldt Univ, Inst Comp Sci, Berlin, Germany
Keywords
MAPREDUCE JOBS; ETL WORKFLOWS; OPTIMIZATION; SYSTEMS;
DOI
10.1007/978-3-319-64283-3_14
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although modern data flows are executed in parallel and distributed environments, e.g., on a multi-core machine or in the cloud, current cost models, such as those considered by state-of-the-art data flow optimization techniques, do not accurately reflect the response time of real data flow execution in these environments. This is mainly because the impact of parallelism, and more specifically the impact of concurrent task execution on running time, is not adequately modeled. In this work, we propose a cost modeling solution that aims to accurately reflect the response time of a data flow that is executed in parallel. We focus on the single multi-core machine environment provided by modern business intelligence tools, such as Pentaho Kettle, but our approach can be extended to massively parallel and distributed settings. The distinctive feature of our proposal is that we model both time overlaps and the impact of concurrency on task running times in a combined manner; the latter is appropriately quantified and its significance is exemplified.
Pages: 183-196
Number of pages: 14