On Traffic-Aware Partition and Aggregation in MapReduce for Big Data Applications

Cited by: 33
Authors
Ke, Huan [1]
Li, Peng [1]
Guo, Song [1]
Guo, Minyi [2]
Affiliations
[1] Univ Aizu, Sch Comp Sci & Engn, Aizu Wakamatsu 8580, Japan
[2] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
MapReduce; partition; aggregation; big data; Lagrangian decomposition;
DOI
10.1109/TPDS.2015.2419671
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The MapReduce programming model simplifies large-scale data processing on commodity clusters by exploiting parallel map tasks and reduce tasks. Although many efforts have been made to improve the performance of MapReduce jobs, they ignore the network traffic generated in the shuffle phase, which plays a critical role in performance enhancement. Traditionally, a hash function is used to partition intermediate data among reduce tasks. This approach is not traffic-efficient, because it takes neither the network topology nor the data size associated with each key into consideration. In this paper, we study how to reduce the network traffic cost of a MapReduce job by designing a novel intermediate data partition scheme. Furthermore, we jointly consider the aggregator placement problem, where each aggregator can reduce the merged traffic from multiple map tasks. A decomposition-based distributed algorithm is proposed to deal with the large-scale optimization problem for big data applications, and an online algorithm is also designed to adjust data partition and aggregation in a dynamic manner. Finally, extensive simulation results demonstrate that our proposals can significantly reduce network traffic cost in both offline and online cases.
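The contrast the abstract draws between hash partitioning and traffic-aware partitioning can be illustrated with a toy sketch. This is not the paper's decomposition-based algorithm: it is a simple per-key greedy choice that ignores reducer load balance and aggregator placement. The names `sizes`, `dist`, `key_cost`, and `greedy_partition`, and all numbers, are hypothetical and chosen only for illustration.

```python
import zlib

def hash_partition(key, num_reducers):
    # Topology-oblivious baseline: a deterministic hash of the key.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def key_cost(key, reducer, sizes, dist):
    # Traffic cost of sending one key to one reducer:
    # sum over map tasks of (data size of the key) * (distance to the reducer).
    return sum(per_key.get(key, 0) * dist[m][reducer]
               for m, per_key in enumerate(sizes))

def total_cost(assignment, sizes, dist):
    # Total shuffle traffic cost of a key -> reducer assignment.
    return sum(key_cost(k, r, sizes, dist) for k, r in assignment.items())

def greedy_partition(keys, num_reducers, sizes, dist):
    # Assumption: each key independently goes to its cheapest reducer.
    # Load balance across reducers is deliberately ignored in this sketch.
    return {k: min(range(num_reducers),
                   key=lambda r: key_cost(k, r, sizes, dist))
            for k in keys}

# Hypothetical toy instance: two map tasks, two reducers.
sizes = [{"a": 10, "b": 1},   # map task 0: data size per key
         {"a": 1, "b": 10}]   # map task 1
dist = [[1, 3],               # dist[m][r]: distance from map task m to reducer r
        [3, 1]]
keys = ["a", "b"]

hashed = {k: hash_partition(k, 2) for k in keys}
greedy = greedy_partition(keys, 2, sizes, dist)

print("hash cost:  ", total_cost(hashed, sizes, dist))
print("greedy cost:", total_cost(greedy, sizes, dist))
```

Because the greedy choice minimizes the cost of each key separately, its total cost can never exceed that of the hash assignment under this cost model. A realistic scheme would also have to bound the load on each reducer, which is part of what makes the problem studied in the paper a large-scale optimization.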
Pages: 818-828
Page count: 11