Optimization of data flow execution in a parallel environment

Cited by: 0
Authors
Georgia Kougka
Anastasios Gounaris
Affiliation
[1] Aristotle University of Thessaloniki, Department of Informatics
Source
Keywords
Data flow optimization; Cost modeling; Task ordering;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Although modern data flows are executed in parallel and distributed environments, e.g., on a multi-core machine or on the cloud, current cost models, e.g., those considered by state-of-the-art data flow optimization techniques, do not accurately reflect the response time of real data flow execution in these environments. This is mainly because the impact of parallelism, and more specifically the impact of concurrent task execution on the running time, is not adequately modeled in current cost models. The contribution of this work is twofold. Firstly, we propose an advanced cost model that aims to reflect more accurately the response time of a data flow that is executed in parallel. Secondly, we show that existing optimization solutions are inadequate and develop new optimization techniques targeting the proposed cost model. We focus on the single multi-core machine environment provided by modern business intelligence tools, such as Pentaho Kettle, but our approach can be extended to massively parallel and distributed settings. The distinctive feature of our proposal is that we model both time overlaps and the impact of concurrency on task running times in a combined manner; the latter is appropriately quantified and its significance is exemplified. Furthermore, we propose extensions to current optimizers that decide on the exact ordering of flow tasks, taking into account the new optimization metric. Finally, we evaluate the new optimization algorithms and show up to 59% response time improvement over state-of-the-art task ordering techniques.
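To make the task ordering problem concrete, the following is a minimal sketch that assumes a conventional sum-of-costs metric for a linear flow, in which each task's cost is weighted by the fraction of records that survive the tasks before it. The task names, costs, and selectivities are hypothetical, precedence constraints are ignored, and this is not the cost model proposed in the paper, which additionally accounts for time overlaps and the impact of concurrency.

```python
from itertools import permutations

# Hypothetical tasks: (name, cost per input record, selectivity).
# Selectivity is the fraction of input records the task passes on.
TASKS = [
    ("filter_a", 4.0, 0.5),
    ("lookup_b", 10.0, 1.0),
    ("filter_c", 2.0, 0.9),
]


def sum_cost(order):
    """Sum-of-costs metric for a linear flow (an assumed, simplified model).

    Each task's cost is scaled by the product of the selectivities
    of the tasks placed before it.
    """
    total, fraction = 0.0, 1.0
    for _name, cost, selectivity in order:
        total += fraction * cost
        fraction *= selectivity
    return total


def best_order(tasks):
    """Exhaustively try every ordering; only practical for a few tasks."""
    return min(permutations(tasks), key=sum_cost)


if __name__ == "__main__":
    print("given order:", [t[0] for t in TASKS], sum_cost(TASKS))
    best = best_order(TASKS)
    print("best order: ", [t[0] for t in best], sum_cost(best))
```

Under this simplified metric, moving selective and cheap tasks earlier lowers the total, but the metric sums task costs as if they ran one after another, which is the kind of assumption the abstract argues does not reflect response time under parallel execution.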
Pages: 385-410
Number of pages: 25