Analytical Performance Models for MapReduce Workloads

Cited by: 40
Authors
Vianna, Emanuel [1]
Comarela, Giovanni [1]
Pontes, Tatiana [1]
Almeida, Jussara [1]
Almeida, Virgilio [1]
Wilkinson, Kevin [2]
Kuno, Harumi [2]
Dayal, Umeshwar [2]
Affiliations
[1] Univ Fed Minas Gerais, Belo Horizonte, MG, Brazil
[2] HP Labs, Informat Analyt Lab, Palo Alto, CA USA
Keywords
Performance; Hadoop; Pipeline; Queuing Network; Task graph;
DOI
10.1007/s10766-012-0227-4
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
MapReduce is a currently popular programming model to support parallel computations on large datasets. Among the several existing MapReduce implementations, Hadoop has attracted a lot of attention from both industry and research. In a Hadoop job, map and reduce tasks coordinate to produce a solution to the input problem, exhibiting precedence constraints and synchronization delays that are characteristic of a pipeline communication between maps (producers) and reduces (consumers). We here address the challenge of designing analytical models to estimate the performance of MapReduce workloads, notably Hadoop workloads, focusing particularly on the intra-job pipeline parallelism between map and reduce tasks belonging to the same job. We propose a hierarchical model that combines a precedence graph model and a queuing network model to capture the intra-job synchronization constraints. We first show how to build a precedence graph that represents the dependencies among multiple tasks of the same job. We then apply it jointly with an approximate Mean Value Analysis (aMVA) solution to predict mean job response time, throughput and resource utilization. We validate our solution against a queuing network simulator and a real setup in various scenarios, finding very close agreement in both cases. In particular, our model produces estimates of average job response time that deviate from measurements of a real setup by less than 15 %.
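The abstract refers to an approximate Mean Value Analysis (aMVA) solution. As a rough illustration of that general family of techniques only, the sketch below applies the textbook Schweitzer-Bard approximation to a single-class closed queuing network. It is not the paper's hierarchical model and does not include the precedence graph; the function name `approximate_mva` and its parameters (`demands`, `n_jobs`, `think_time`) are hypothetical names chosen for this example.

```python
def approximate_mva(demands, n_jobs, think_time=0.0, tol=1e-9, max_iter=10000):
    """Schweitzer-Bard approximate MVA for a single-class closed network.

    demands    -- service demand per queuing station (seconds per job)
    n_jobs     -- number of jobs circulating in the network (must be >= 1)
    think_time -- optional delay between job completions (seconds)
    Returns (mean response time, throughput, per-station utilization).
    This is a generic textbook approximation, not the paper's model.
    """
    if n_jobs < 1 or not demands:
        raise ValueError("need at least one job and one station")

    k = len(demands)
    # Start from an even spread of jobs across stations.
    queue = [n_jobs / k] * k
    response = throughput = 0.0

    for _ in range(max_iter):
        # Approximation: an arriving job sees (N-1)/N of the mean queue length.
        scale = (n_jobs - 1) / n_jobs
        residence = [d * (1.0 + scale * q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n_jobs / (think_time + response)
        # Little's law per station gives the updated queue lengths.
        new_queue = [throughput * r for r in residence]
        converged = max(abs(a - b) for a, b in zip(new_queue, queue)) < tol
        queue = new_queue
        if converged:
            break

    utilization = [throughput * d for d in demands]
    return response, throughput, utilization


if __name__ == "__main__":
    # Two hypothetical stations (e.g. CPU and disk) with illustrative demands.
    r, x, u = approximate_mva([0.05, 0.08], n_jobs=10)
    print(f"response={r:.4f}s throughput={x:.3f} jobs/s utilization={u}")
```

The demand values in the usage lines are made up for illustration; in practice they would come from measurements of the system being modeled.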
Pages: 495 - 525
Page count: 31
Related papers
50 in total
  • [1] Analytical Performance Models for MapReduce Workloads
    Emanuel Vianna
    Giovanni Comarela
    Tatiana Pontes
    Jussara Almeida
    Virgílio Almeida
    Kevin Wilkinson
    Harumi Kuno
    Umeshwar Dayal
    [J]. International Journal of Parallel Programming, 2013, 41 : 495 - 525
  • [2] Performance Modelling and Analysis of MapReduce/Hadoop Workloads
    Yu, Xiaolong
    Li, Wei
    [J]. 2015 IEEE 21ST INTERNATIONAL WORKSHOP ON LOCAL & METROPOLITAN AREA NETWORKS (LANMAN), 2015,
  • [3] An Analytical Approach to Evaluation of SSD Effects under MapReduce Workloads
    Ahn, Sungyong
    Park, Sangkyu
    [J]. JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, 2015, 15 (05) : 511 - 518
  • [4] HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
    Abouzeid, Azza
    Bajda-Pawlikowski, Kamil
    Abadi, Daniel
    Silberschatz, Avi
    Rasin, Alexander
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2009, 2 (01):
  • [5] Big Data Processing with harnessing Hadoop - MapReduce for Optimizing Analytical Workloads
    Satish, Rama K., V
    Kavya, N. P.
    [J]. 2014 INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING AND INFORMATICS (IC3I), 2014, : 49 - 54
  • [6] AN ANALYTICAL PERFORMANCE MODEL OF MAPREDUCE
    Yang, Xiao
    Sun, Jianling
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS, 2011, : 306 - 310
  • [7] Semantic Characterization of MapReduce Workloads
    Xu, Zhihong
    Hirzel, Martin
    Rothermel, Gregg
    [J]. 2013 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2013), 2013, : 87 - +
  • [8] POSUM: A Portfolio Scheduler for MapReduce Workloads
    Voinea, Maria A.
    Uta, Alexandru
    Iosup, Alexandru
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 351 - 357
  • [9] Shared Execution of Recurring Workloads in MapReduce
    Lei, Chuan
    Zhuang, Zhongfang
    Rundensteiner, Elke A.
    Eltabakh, Mohamed
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2015, 8 (07) : 714 - 725
  • [10] A Dynamic MapReduce Scheduler for Heterogeneous Workloads
    Tian, Chao
    Zhou, Haojie
    He, Yongqiang
    Zha, Li
    [J]. 2009 EIGHTH INTERNATIONAL CONFERENCE ON GRID AND COOPERATIVE COMPUTING, PROCEEDINGS, 2009, : 218 - 224