Phase-Reconfigurable Shuffle Optimization for Hadoop MapReduce

Cited by: 7
Authors
Wang, Jihe [1]
Qiu, Meikang [2]
Guo, Bing [1]
Zong, Ziliang [3]
Affiliations
[1] Sichuan Univ, Comp Sci Coll, Chengdu 610064, Sichuan, Peoples R China
[2] Pace Univ, Seidenberg Sch Comp Sci & Informat Syst, New York, NY 10038 USA
[3] Texas State Univ, Comp Sci Dept, San Marcos, TX USA
Funding
U.S. National Science Foundation; National Natural Science Foundation of China
Keywords
MapReduce; big data; shuffle; optimization; reconfigure; sort; group; exploration; PERFORMANCE;
DOI
10.1109/TCC.2015.2459707
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Hadoop MapReduce is a leading open-source framework that supports the realization of the Big Data revolution and serves as a pioneering platform for storing and processing ultra-large amounts of information. However, tuning a MapReduce system has become a difficult task because a large number of parameters restrict its performance, many of which are related to shuffle, a complicated phase between the map and reduce functions that includes sorting, grouping, and HTTP transfer. During the shuffle phase, a large amount of time is spent on disk I/O because of low data throughput. In this paper, we build a mathematical model to judge the computing complexity of different operating orders within the map-side shuffle, so that faster execution can be achieved by reconfiguring the order of sorting and grouping. Furthermore, a three-dimensional exploration space of the performance is built, in which sampled features of the shuffle stage, such as key number, spill file number, and the variances of intermediate results, are collected to support the evaluation of the computing complexity of each operating order. Thus, an optimized reconfiguration of the map-side shuffle architecture can be achieved within Hadoop without inducing extra disk I/O. Compared with the original Hadoop implementation, the results show that our reconfigurable architecture achieves up to a 2.37x speedup in completing the map-side shuffle work.
Pages: 418-431
Number of pages: 14