On Traffic-Aware Partition and Aggregation in MapReduce for Big Data Applications

Cited by: 33
Authors
Ke, Huan [1]
Li, Peng [1]
Guo, Song [1]
Guo, Minyi [2]
Affiliations
[1] Univ Aizu, Sch Comp Sci & Engn, Aizu Wakamatsu 8580, Japan
[2] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
MapReduce; partition; aggregation; big data; Lagrangian decomposition
DOI
10.1109/TPDS.2015.2419671
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The MapReduce programming model simplifies large-scale data processing on commodity clusters by exploiting parallel map tasks and reduce tasks. Although many efforts have been made to improve the performance of MapReduce jobs, they ignore the network traffic generated in the shuffle phase, which plays a critical role in performance. Traditionally, a hash function is used to partition intermediate data among reduce tasks; this is not traffic-efficient, because it takes neither the network topology nor the data size associated with each key into consideration. In this paper, we study how to reduce the network traffic cost of a MapReduce job by designing a novel intermediate data partition scheme. Furthermore, we jointly consider the aggregator placement problem, where each aggregator can reduce the merged traffic from multiple map tasks. A decomposition-based distributed algorithm is proposed to handle the large-scale optimization problem for big data applications, and an online algorithm is designed to adjust data partition and aggregation dynamically. Finally, extensive simulation results demonstrate that our proposals can significantly reduce network traffic cost in both offline and online cases.
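The contrast the abstract draws between hash partitioning and a traffic-aware partition can be illustrated with a toy sketch. This is not the paper's decomposition-based algorithm: it is a simple per-key greedy choice that ignores reducer load balancing and aggregator placement. The cost model (data size times hop distance), the function names, and the example data are all assumptions made for illustration.

```python
from zlib import crc32

def traffic_cost(assignment, sizes, dist):
    """Sum of data size times mapper-to-reducer distance over all (mapper, key) pairs."""
    return sum(size * dist[mapper][assignment[key]]
               for (mapper, key), size in sizes.items())

def hash_partition(keys, num_reducers):
    """Topology-oblivious baseline: a deterministic hash of the key picks the reducer."""
    return {key: crc32(key.encode()) % num_reducers for key in keys}

def greedy_partition(keys, sizes, dist, num_reducers):
    """For each key, pick the reducer with the lowest shuffle cost for that key.

    Hypothetical simplification: no reducer capacity limit, no aggregators.
    """
    assignment = {}
    for key in keys:
        def cost_for(reducer, key=key):
            return sum(size * dist[mapper][reducer]
                       for (mapper, k), size in sizes.items() if k == key)
        assignment[key] = min(range(num_reducers), key=cost_for)
    return assignment

# Assumed example: two mappers, two reducers, two keys.
# sizes[(mapper, key)] is the intermediate data volume produced by that mapper for that key.
sizes = {(0, "a"): 10, (1, "a"): 1, (0, "b"): 2, (1, "b"): 8}
# dist[mapper][reducer] is the hop distance; mapper i shares a machine with reducer i.
dist = [[0, 2], [2, 0]]
keys = ["a", "b"]

hashed = hash_partition(keys, 2)
greedy = greedy_partition(keys, sizes, dist, 2)
print("hash cost:  ", traffic_cost(hashed, sizes, dist))
print("greedy cost:", traffic_cost(greedy, sizes, dist))
```

Because each key's cost is independent in this simplified model, the per-key greedy choice can never cost more than the hash assignment. Adding a reducer load constraint or aggregators couples the choices, which is where an optimization-based approach such as the one the abstract describes becomes relevant.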
Pages: 818-828
Page count: 11