Analytical Performance Models for MapReduce Workloads

Cited by: 40
Authors
Vianna, Emanuel [1]
Comarela, Giovanni [1]
Pontes, Tatiana [1]
Almeida, Jussara [1]
Almeida, Virgilio [1]
Wilkinson, Kevin [2]
Kuno, Harumi [2]
Dayal, Umeshwar [2]
Affiliations
[1] Univ Fed Minas Gerais, Belo Horizonte, MG, Brazil
[2] HP Labs, Informat Analyt Lab, Palo Alto, CA USA
Keywords
Performance; Hadoop; Pipeline; Queuing Network; Task graph;
DOI
10.1007/s10766-012-0227-4
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
MapReduce is a currently popular programming model to support parallel computations on large datasets. Among the several existing MapReduce implementations, Hadoop has attracted a lot of attention from both industry and research. In a Hadoop job, map and reduce tasks coordinate to produce a solution to the input problem, exhibiting precedence constraints and synchronization delays that are characteristic of a pipeline communication between maps (producers) and reduces (consumers). We here address the challenge of designing analytical models to estimate the performance of MapReduce workloads, notably Hadoop workloads, focusing particularly on the intra-job pipeline parallelism between map and reduce tasks belonging to the same job. We propose a hierarchical model that combines a precedence graph model and a queuing network model to capture the intra-job synchronization constraints. We first show how to build a precedence graph that represents the dependencies among multiple tasks of the same job. We then apply it jointly with an approximate Mean Value Analysis (aMVA) solution to predict mean job response time, throughput and resource utilization. We validate our solution against a queuing network simulator and a real setup in various scenarios, finding very close agreement in both cases. In particular, our model produces estimates of average job response time that deviate from measurements of a real setup by less than 15%.
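As a rough illustration of the aMVA step mentioned in the abstract, the sketch below implements the textbook Schweitzer (Bard-Schweitzer) approximation for a closed, single-class queuing network. It is not the authors' hierarchical model: it has no precedence graph and no map/reduce synchronization. The function name `schweitzer_amva` and its parameters (`demands`, `n_jobs`, `think_time`) are hypothetical names chosen for this example.

```python
def schweitzer_amva(demands, n_jobs, think_time=0.0, tol=1e-8, max_iter=10000):
    """Schweitzer approximate MVA for a closed single-class network.

    demands    -- service demand (seconds per job) at each queueing center
    n_jobs     -- number of jobs circulating in the network (>= 1)
    think_time -- delay outside the queueing centers (seconds)
    Returns (mean response time, throughput, per-center utilizations).
    """
    k = len(demands)
    # Start from an even spread of the jobs over the centers.
    queue = [n_jobs / k] * k
    for _ in range(max_iter):
        # Schweitzer's approximation: an arriving job sees (N-1)/N of the
        # mean queue length that N jobs would produce at each center.
        resid = [d * (1.0 + (n_jobs - 1) / n_jobs * q)
                 for d, q in zip(demands, queue)]
        response = sum(resid)
        # Little's law applied to the whole network.
        throughput = n_jobs / (think_time + response)
        # Little's law applied to each center.
        new_queue = [throughput * r for r in resid]
        converged = max(abs(a - b) for a, b in zip(new_queue, queue)) < tol
        queue = new_queue
        if converged:
            break
    utilization = [throughput * d for d in demands]
    return response, throughput, utilization


if __name__ == "__main__":
    # Made-up demands for illustration only (e.g. CPU and disk).
    r, x, u = schweitzer_amva([0.2, 0.3], n_jobs=4)
    print(f"response={r:.4f}s throughput={x:.4f} jobs/s utilization={u}")
```

The fixed-point iteration stops once successive queue-length estimates differ by less than `tol`. With a single job the approximation reduces to the exact result, since the job never queues behind another.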
Pages: 495-525
Number of pages: 31