On Datacenter-Network-Aware Load Balancing in MapReduce

Cited by: 2
Authors
Le, Yanfang [1]
Wang, Feng [2]
Liu, Jiangchuan [1]
Ergun, Funda [1, 3]
Affiliations
[1] Simon Fraser Univ, Burnaby, BC V5A 1S6, Canada
[2] Univ Mississippi, University, MS 38677 USA
[3] Indiana Univ Bloomington, Bloomington, IN USA
DOI
10.1109/CLOUD.2015.71
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
MapReduce has emerged as a powerful tool for distributed and scalable processing of voluminous data. For skewed data input, load balancing is necessary among the MapReduce worker nodes to minimize the overall finishing time, which however can incur massive data movement in a datacenter network. In this paper, we for the first time examine this problem of datacenter-network-aware load balancing in the shuffle subphase in MapReduce. Different from earlier studies that generally assume the network inside a datacenter has negligible delay and infinite capacity, we consider the traffic and bottlenecks in real datacenter networks by introducing the constraints on available network bandwidth, and demonstrate that the corresponding problem can be decomposed into two subproblems for network flow and load balancing, respectively. We show effective solutions to both of them, which together yield a complete solution towards near optimal datacenter-network-aware load balancing. A much simpler yet performance-wise comparable greedy algorithm is also developed for fast implementation in practice. The effectiveness of our solution has been demonstrated on synthetic and real public datasets.
Pages: 485-492
Page count: 8
Related papers
  • [31] Accelerating Reads With In-Network Consistency-Aware Load Balancing
    Kettaneh, Ibrahim
    Alquraan, Ahmed
    Takruri, Hatem
    Mashtizadeh, Ali Jose
    Al-Kiswany, Samer
    [J]. IEEE-ACM TRANSACTIONS ON NETWORKING, 2022, 30 (03) : 954 - 968
  • [32] Online Load Balancing for MapReduce with Skewed Data Input
    Le, Yanfang
    Liu, Jiangchuan
    Erguen, Funda
    Wang, Dan
    [J]. 2014 PROCEEDINGS IEEE INFOCOM, 2014, : 2004 - 2012
  • [33] BULB: Lightweight and Automated Load Balancing for Fast Datacenter Networks
    Liu, Yuan
    Li, Wenxin
    Qu, Wenyu
    Qi, Heng
    [J]. 51ST INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2022, 2022,
  • [34] Load Balancing for MapReduce-based Entity Resolution
    Kolb, Lars
    Thor, Andreas
    Rahm, Erhard
    [J]. 2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, : 618 - 629
  • [35] Load Balancing in MapReduce Based on Scalable Cardinality Estimates
    Gufler, Benjamin
    Augsten, Nikolaus
    Reiser, Angelika
    Kemper, Alfons
    [J]. 2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, : 522 - 533
  • [36] Towards Coordinated Congestion Control and Load Balancing in Datacenter Networks
    Zhao, Zhengwei
    Jiang, Zhixiong
    Lu, Chunyang
    Cai, Yushan
    Bi, Jingping
    [J]. 2013 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2013, : 1285 - 1290
  • [37] Enabling Traffic-Differentiated Load Balancing for Datacenter Networks
    Hu, Jinbin
    Liu, Ying
    Rao, Shuying
    Wang, Jing
    Zhang, Dengyong
    [J]. ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2023, PT III, 2024, 14489 : 250 - 269
  • [38] Adjusting Switching Granularity of Load Balancing for Heterogeneous Datacenter Traffic
    Hu, Jinbin
    Huang, Jiawei
    Lyu, Wenjun
    Li, Weihe
    Li, Zhaoyi
    Jiang, Wenchao
    Wang, Jianxin
    He, Tian
    [J]. IEEE-ACM TRANSACTIONS ON NETWORKING, 2021, 29 (05) : 2367 - 2384
  • [39] Efficient Service Broker Policy for Intra Datacenter Load Balancing
    Patel, Ritesh
    Patel, Sandip
    [J]. INFORMATION AND COMMUNICATION TECHNOLOGY FOR INTELLIGENT SYSTEMS, ICTIS 2018, VOL 2, 2019, 107 : 683 - 692
  • [40] OmniFlow: Coupling Load Balancing with Flow Control in Datacenter Networks
    Wen, Kaiyuan
    Qian, Zhuzhong
    Zhang, Sheng
    Lu, Sanglu
    [J]. PROCEEDINGS 2016 IEEE 36TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS ICDCS 2016, 2016, : 725 - 726