Modeling Data Flow Execution in a Parallel Environment

Cited by: 1
Authors
Kougka, Georgia [1]
Gounaris, Anastasios [1]
Leser, Ulf [2]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki, Greece
[2] Humboldt Univ, Inst Comp Sci, Berlin, Germany
Keywords
MAPREDUCE JOBS; ETL WORKFLOWS; OPTIMIZATION; SYSTEMS;
DOI
10.1007/978-3-319-64283-3_14
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although modern data flows are executed in parallel and distributed environments, e.g., on a multi-core machine or in the cloud, current cost models, such as those considered by state-of-the-art data flow optimization techniques, do not accurately reflect the response time of real data flow execution in these environments. This is mainly because the impact of parallelism, and more specifically the impact of concurrent task execution on running time, is not adequately modeled. In this work, we propose a cost modeling solution that aims to accurately reflect the response time of a data flow that is executed in parallel. We focus on the single multi-core machine environment provided by modern business intelligence tools, such as Pentaho Kettle, but our approach can be extended to massively parallel and distributed settings. The distinctive feature of our proposal is that we model both time overlaps and the impact of concurrency on task running times in a combined manner; the latter is appropriately quantified and its significance is exemplified.
Pages: 183-196
Page count: 14
Related papers
50 in total
  • [41] Modeling the parallel execution profile of a CFD simulation on a cluster of workstations
    Walker, E
    Song, JJ
    1996 IEEE SECOND INTERNATIONAL CONFERENCE ON ALGORITHMS & ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP'96, PROCEEDINGS OF, 1996, : 397 - 404
  • [42] On Predictive Modeling for Optimizing Transaction Execution in Parallel OLTP Systems
    Pavlo, Andrew
    Jones, Evan P. C.
    Zdonik, Stanley
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2011, 5 (02): : 85 - 96
  • [43] Supporting design patterns in a visual parallel data-flow programming environment
    Toyoda, M
    Shizuki, B
    Takahashi, S
    Matsuoka, S
    Shibayama, E
    1997 IEEE SYMPOSIUM ON VISUAL LANGUAGES, PROCEEDINGS, 1997, : 76 - 83
  • [44] Data Flow Execution Models - a Third Opinion
    Sarkar, Vivek
    2019 IEEE 26TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC), 2019, : 1 - 1
  • [45] An execution model for the seamless integration of control flow and data flow
    Ibrahim, B
    Randriamparany, F
    ACS/IEEE INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, PROCEEDINGS, 2001, : 402 - 404
  • [46] Polarized Data Parallel Data Flow
    Lippmeier, Ben
    Mackay, Fil
    Robinson, Amos
    FHPC'16: PROCEEDINGS OF THE 5TH INTERNATIONAL WORKSHOP ON FUNCTIONAL HIGH-PERFORMANCE COMPUTING, 2016, : 52 - 57
  • [47] Data Randomization for Multi-Variant Execution Environment
    Hwang, Dongil
    Shin, Jangseop
    Kim, Jeehwan
    Paek, Yunheung
    2019 INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC), 2019, : 291 - 292
  • [48] Efficient execution of parallel aggregate data cube queries in data warehouse environments
    Tan, RBN
    Taniar, D
    Lu, G
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING, 2003, 2690 : 709 - 716
  • [49] Raw data queries during data-intensive parallel workflow execution
    Silva, Vitor
    Leite, Jose
    Camata, Jose J.
    de Oliveira, Daniel
    Coutinho, Alvaro L. G. A.
    Valduriez, Patrick
    Mattoso, Marta
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2017, 75 : 402 - 422
  • [50] Simultaneous CPU–GPU Execution of Data Parallel Algorithmic Skeletons
    Fabian Wrede
    Steffen Ernsting
    International Journal of Parallel Programming, 2018, 46 : 42 - 61