Performance Models of Data Parallel DAG Workflows for Large Scale Data Analytics

Cited by: 1
Authors
Shi, Juwei [1 ]
Lu, Jiaheng [2 ]
Affiliations
[1] Microsoft Cooperat, STCA, Redmond, WA 98052 USA
[2] Univ Helsinki, Dept Comp Sci, Helsinki, Finland
Keywords
MAPREDUCE; OPTIMIZATION
DOI
10.1109/ICDEW53142.2021.00026
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Directed Acyclic Graph (DAG) workflows are widely used for large-scale data analytics in cluster-based distributed computing systems. Building an accurate performance model for a DAG on data-parallel frameworks (e.g., MapReduce) is critical to implementing autonomic, self-managing big data systems. Building such a model is challenging because the allocation of preemptable system resources among parallel jobs may vary dynamically during execution, which makes it difficult to estimate execution time accurately. In this paper, we tackle this challenge by proposing a new cost model, called Bottleneck Oriented Estimation (BOE), which estimates the allocation of preemptable resources by identifying the bottleneck and uses that estimate to predict task execution time. For a DAG workflow, we propose a state-based approach that iteratively uses the resource allocation property among stages to estimate the overall execution plan. Extensive experiments were performed to validate these cost models with HiBench and TPC-H workloads. The BOE model outperforms the state-of-the-art models by a factor of five for task execution time estimation.
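The bottleneck idea in the abstract can be illustrated with a small sketch. This is not the paper's BOE model, whose details are not given in this record. It is a generic bottleneck estimate under two stated assumptions: a task's use of CPU, disk, and network overlaps fully, so the slowest resource determines the task time, and each preemptable resource on a node is shared equally among the tasks running concurrently. The `Task` fields, the function `estimate_task_time`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical per-task resource demands (not taken from the paper)."""
    cpu_seconds: float  # CPU work, in seconds on one dedicated core
    disk_mb: float      # data read from or written to local disk
    net_mb: float       # data transferred over the network


def estimate_task_time(task, concurrent_tasks, disk_mbps, net_mbps):
    """Return (estimated seconds, name of the bottleneck resource).

    Assumption: node-level disk and network bandwidth are split equally
    among the concurrent tasks, and each task has one dedicated core.
    """
    disk_share = disk_mbps / concurrent_tasks
    net_share = net_mbps / concurrent_tasks
    times = {
        "cpu": task.cpu_seconds,
        "disk": task.disk_mb / disk_share,
        "network": task.net_mb / net_share,
    }
    # The resource that takes longest is the bottleneck; with fully
    # overlapped resource usage it also sets the task execution time.
    bottleneck = max(times, key=times.get)
    return times[bottleneck], bottleneck


task = Task(cpu_seconds=10.0, disk_mb=1000.0, net_mb=100.0)
result = estimate_task_time(task, concurrent_tasks=4,
                            disk_mbps=200.0, net_mbps=100.0)
print(result)
```

With four concurrent tasks, each task receives 50 MB/s of disk bandwidth and 25 MB/s of network bandwidth in this sketch, so the disk term (1000 / 50 = 20 s) exceeds both the CPU term (10 s) and the network term (4 s). Changing `concurrent_tasks` changes the shares, which is the kind of allocation variation the abstract identifies as the source of estimation difficulty.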
Pages: 104 - 111
Page count: 8
Related papers
50 in total
  • [1] Performance models of data parallel DAG workflows for large scale data analytics
    Shi, Juwei
    Lu, Jiaheng
    DISTRIBUTED AND PARALLEL DATABASES, 2023, 41 (03) : 299 - 329
  • [2] Performance models of data parallel DAG workflows for large scale data analytics
    Juwei Shi
    Jiaheng Lu
    Distributed and Parallel Databases, 2023, 41 : 299 - 329
  • [3] Common Data Elements, Scalable Data Management Infrastructure, and Analytics Workflows for Large-Scale Neuroimaging Studies
    Kuplicki, Rayus
    Touthang, James
    Al Zoubi, Obada
    Mayeli, Ahmad
    Misaki, Masaya
    Aupperle, Robin L.
    Teague, T. Kent
    McKinney, Brett A.
    Paulus, Martin P.
    Bodurka, Jerzy
    FRONTIERS IN PSYCHIATRY, 2021, 12
  • [4] Performance Evaluation of Big Data Frameworks for Large-Scale Data Analytics
    Veiga, Jorge
    Exposito, Roberto R.
    Pardo, Xoan C.
    Taboada, Guillermo L.
    Tourino, Juan
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 424 - 431
  • [5] A Parallel Graph Environment for Real-World Data Analytics Workflows
    Castellana, Vito Giovanni
    Drocco, Maurizio
    Feo, John
    Firoz, Jesun
    Kanewala, Thejaka
    Lumsdaine, Andrew
    Manzano, Joseph
    Marquez, Andres
    Minutoli, Marco
    Suetterlein, Joshua
    Tumeo, Antonino
    Zalewski, Marcin
    2019 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2019, : 1313 - 1318
  • [6] Large Scale Infrastructure for Health Data Analytics
    Crossfield, Samantha
    Johnson, Owen
    Fleming, Thomas
    2016 IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI), 2016, : 306 - 306
  • [7] Enabling Data Analytics in Large Scale Manufacturing
    Kampker, Achim
    Heimes, Heiner
    Buehrer, Ulrich
    Lienemann, Christoph
    Krotil, Stefan
    4TH INTERNATIONAL CONFERENCE ON SYSTEM-INTEGRATED INTELLIGENCE: INTELLIGENT, FLEXIBLE AND CONNECTED SYSTEMS IN PRODUCTS AND PRODUCTION, 2018, 24 : 120 - 127
  • [8] Software abstractions for large-scale deep learning models in big data analytics
    Khan A.H.
    Qamar A.M.
    Yusuf A.
    Khan R.
    International Journal of Advanced Computer Science and Applications, 2019, 10 (04) : 557 - 566
  • [9] YinMem: a distributed parallel indexed in-memory computation system for large scale data analytics
    Huang, Yin
    Yesha, Yelena
    Halem, Milton
    Yesha, Yaacov
    Zhou, Shujia
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 214 - 222
  • [10] Software Abstractions for Large-Scale Deep Learning Models in Big Data Analytics
    Khan, Ayaz H.
    Qamar, Ali Mustafa
    Yusuf, Aneeq
    Khan, Rehanullah
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2019, 10 (04) : 557 - 566