Software-Defined Data Shuffling for Big Data Jobs with Task Duplication

Authors
Zang, Qimeng [1]
Chan, Hsiang-Yu [1]
Li, Peng [1]
Guo, Song [1]
Affiliation
[1] University of Aizu, Aizu-Wakamatsu, Fukushima, Japan
Keywords
shuffling; MapReduce; task duplication; traffic
DOI
10.1109/ICPPW.2016.62
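Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812 (Computer Science and Technology)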
Abstract
Big data jobs are usually executed on large-scale distributed computing platforms that automatically divide a job into multiple computation phases, each containing a number of independent tasks that can run in parallel. The data shuffling process between two consecutive phases often becomes the bottleneck of job execution. To improve its performance, a "push" shuffling approach has been proposed that sends intermediate results to the next phase as soon as they are generated. This avoids the local disk accesses of the traditional "pull" shuffling approach and lets tasks in the next phase start processing data without waiting for tasks in the preceding phase to finish. Task duplication is another way to accelerate task execution: multiple copies of a task are launched and compete to process the same data block. When "push" shuffling meets task duplication, big data jobs can be significantly accelerated, but at the cost of a large amount of redundant data transmission between the two phases, since every copy may push the same intermediate results. To address this challenge, we propose a software-defined data shuffling approach built on a controller and a janitor module that together control the data shuffling process. Each task has a janitor that communicates with the controller to request an admission permit before sending intermediate results to next-stage tasks. We further propose an online grouping algorithm to reduce the overhead of frequent communication with the controller. The performance of the proposed algorithm is evaluated through extensive simulations.
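The admission-permit idea can be illustrated with a toy simulation. This is a minimal sketch under our own assumptions, not the authors' design. The classes `Controller` and `Janitor`, the first-come-first-served permit rule, and the fixed-size batching are all hypothetical. The batching stands in for the paper's online grouping algorithm, whose details are not given in the abstract. Two duplicate copies of a map task `m1` produce output for partitions 0 to 3. The controller grants each (task, partition) pair to only one copy, so every partition is pushed exactly once.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class Controller:
    """Hypothetical central controller: grants each (task_id, partition)
    pair to exactly one task copy, so duplicate copies never push the
    same intermediate result twice."""

    def __init__(self) -> None:
        # (task_id, partition) -> copy_id that holds the send permit
        self.owner: dict[tuple[str, int], str] = {}
        self.messages = 0  # number of admission requests received

    def request(self, task_id: str, copy_id: str, partitions: list[int]) -> list[int]:
        """Handle one (possibly grouped) admission request and return the
        partitions this copy may push. First requester wins each pair."""
        self.messages += 1
        granted = []
        for p in partitions:
            if self.owner.setdefault((task_id, p), copy_id) == copy_id:
                granted.append(p)
        return granted


@dataclass
class Janitor:
    """Hypothetical per-copy janitor: buffers finished partitions and asks
    the controller for permits once `group_size` of them are pending."""

    controller: Controller
    task_id: str
    copy_id: str
    group_size: int = 1
    pending: list[int] = field(default_factory=list)
    sent: list[int] = field(default_factory=list)

    def on_output(self, partition: int) -> None:
        # Called when this copy has produced output for `partition`.
        self.pending.append(partition)
        if len(self.pending) >= self.group_size:
            self.flush()

    def flush(self) -> None:
        # Send one grouped request; "push" only the admitted partitions.
        if not self.pending:
            return
        granted = self.controller.request(self.task_id, self.copy_id, self.pending)
        self.sent.extend(granted)
        self.pending = []


def run(group_size: int) -> tuple[list[int], list[int], int]:
    """Two duplicate copies of map task "m1" each produce partitions 0..3.
    The order is made up: copy A finishes even partitions first, copy B odd."""
    ctrl = Controller()
    a = Janitor(ctrl, "m1", "copyA", group_size)
    b = Janitor(ctrl, "m1", "copyB", group_size)
    for p in range(4):
        first, second = (a, b) if p % 2 == 0 else (b, a)
        first.on_output(p)
        second.on_output(p)
    a.flush()  # drain any partially filled groups
    b.flush()
    return a.sent, b.sent, ctrl.messages


if __name__ == "__main__":
    for g in (1, 2):
        print(g, run(g))
```

In this toy run, `run(1)` sends one request per output and needs 8 controller messages. `run(2)` batches two partitions per request and needs only 4. In both cases each partition is pushed exactly once. The batching also exposes a trade-off: outputs wait in the janitor until a group fills. Here copy A produced partition 0 first but loses that permit, because its request was still buffered.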
Pages: 403-407
Page count: 5