Phase-Reconfigurable Shuffle Optimization for Hadoop MapReduce

Cited by: 7
Authors
Wang, Jihe [1]
Qiu, Meikang [2]
Guo, Bing [1]
Zong, Ziliang [3]
Affiliations
[1] Sichuan Univ, Comp Sci Coll, Chengdu 610064, Sichuan, Peoples R China
[2] Pace Univ, Seidenberg Sch Comp Sci & Informat Syst, New York, NY 10038 USA
[3] Texas State Univ, Comp Sci Dept, San Marcos, TX USA
Funding
National Natural Science Foundation of China; U.S. National Science Foundation;
Keywords
MapReduce; big data; shuffle; optimization; reconfigure; sort; group; exploration; PERFORMANCE;
DOI
10.1109/TCC.2015.2459707
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Hadoop MapReduce is a leading open source framework that supports the realization of the Big Data revolution and serves as a pioneering platform for storing and processing ultra-large amounts of information. However, tuning a MapReduce system has become a difficult task because a large number of parameters restrict its performance, many of which are related to shuffle, a complicated phase between the map and reduce functions that includes sorting, grouping, and HTTP transferring. During the shuffle phase, a large amount of time is spent on disk I/O due to low data throughput. In this paper, we build a mathematical model to judge the computing complexity of different operating orders within map-side shuffle, so that faster execution can be achieved by reconfiguring the order of sorting and grouping. Furthermore, a three-dimensional exploration space of the performance is constructed, in which sampled features of the shuffle stage, such as key number, spilling file number, and the variances of intermediate results, are collected to support the evaluation of the computing complexity of each operating order. Thus, an optimized reconfiguration of the map-side shuffle architecture can be achieved within Hadoop without inducing extra disk I/O. Compared with the original Hadoop implementation, the results show that our reconfigurable architecture gains up to 2.37x speedup in finishing the map-side shuffle work.
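The idea of choosing between operating orders can be illustrated with a small sketch. It is not the paper's model: it assumes that "grouping" means bucketing records by partition, it uses a rough n*log2(n) comparison count as the cost of a sort, and the `group_cost` parameter and all function names are hypothetical. The two orders produce the same grouped, sorted output; only the estimated work differs.

```python
import math
from collections import defaultdict


def sort_then_group(records):
    """Order A: one global sort on (partition, key), then split into groups."""
    groups = defaultdict(list)
    for partition, key in sorted(records):
        groups[partition].append(key)
    return dict(groups)


def group_then_sort(records):
    """Order B: bucket records by partition first, then sort each bucket."""
    groups = defaultdict(list)
    for partition, key in records:
        groups[partition].append(key)
    return {p: sorted(keys) for p, keys in groups.items()}


def estimated_cost(group_sizes, order, group_cost=1.0):
    """Rough cost estimate (hypothetical model, not taken from the paper).

    A sort of n items is charged n*log2(n); bucketing before sorting is
    charged group_cost per record on top of the per-bucket sorts.
    """
    n = sum(group_sizes)
    if order == "sort_then_group":
        return n * math.log2(n) if n > 1 else 0.0
    per_bucket = sum(s * math.log2(s) for s in group_sizes if s > 1)
    return n * group_cost + per_bucket


def choose_order(group_sizes, group_cost=1.0):
    """Pick the order with the lower estimated cost under the assumed model."""
    a = estimated_cost(group_sizes, "sort_then_group", group_cost)
    b = estimated_cost(group_sizes, "group_then_sort", group_cost)
    return "group_then_sort" if b < a else "sort_then_group"
```

Under these assumptions, many evenly sized groups favor bucketing first, while a single group makes the extra bucketing pass pure overhead. A real decision would need measured features such as those the abstract lists (key number, spilling file number, variance of intermediate results).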
Pages: 418-431
Page count: 14