An Enhanced Data-Locality-Aware Task Scheduling Algorithm for Hadoop Applications

Cited by: 13
Authors
Choi, Dongjoo [1]
Jeon, Myunghoon [1]
Kim, Namgi [1]
Lee, Byoung-Dai [1]
Affiliations
[1] Kyonggi Univ, Comp Sci Dept, Suwon 443760, South Korea
Source
IEEE SYSTEMS JOURNAL | 2018 / Vol. 12 / No. 04
Funding
National Research Foundation, Singapore;
Keywords
Data locality; Hadoop distributed file system (HDFS); MapReduce; task scheduling;
DOI
10.1109/JSYST.2017.2764481
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In general, Hadoop improves the task scheduling performance by determining data locality based on the location in which the input splits and MapTask are executed. However, if an input split consists of multiple data blocks that are distributed and stored in different nodes, this data location method fails to cope with the degradation in processing performance due to the increased frequency of data block copying. We propose a task scheduling algorithm that solves this issue by defining a method to classify data locality taking into account the location of all data blocks that comprise an input split, categorizing tasks based on the defined method, and sequentially assigning tasks according to a given priority. This study measures the performance of the proposed algorithm through a comparison of the total processing time, MapTask performance time, and data block copying frequency between the proposed algorithm and Hadoop's default task scheduling algorithm. The test results show that the proposed algorithm improved the total processing time by up to 25% and the data block copying frequency by up to 28%, when compared to the default algorithm.
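The idea in the abstract, classifying locality over all blocks of an input split and assigning tasks by priority, can be illustrated with a minimal sketch. The abstract does not give the paper's actual categories or priority rules, so everything here is an assumption for illustration: the three classes (all blocks local, some local, none local), the tie-break on blocks to copy, and the names `MapTask`, `locality_class`, and `pick_task` are hypothetical and are not Hadoop APIs.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class MapTask:
    """Hypothetical task record, not a Hadoop class."""
    task_id: str
    # One entry per data block of the input split:
    # the set of node names that hold a replica of that block.
    block_replicas: List[Set[str]]


def local_block_count(task: MapTask, node: str) -> int:
    """Number of the split's blocks that have a replica on `node`."""
    return sum(1 for replicas in task.block_replicas if node in replicas)


def blocks_to_copy(task: MapTask, node: str) -> int:
    """Blocks that would have to be fetched from other nodes."""
    return len(task.block_replicas) - local_block_count(task, node)


def locality_class(task: MapTask, node: str) -> int:
    """Assumed classes: 0 = every block local, 1 = some local, 2 = none local."""
    local = local_block_count(task, node)
    if local == len(task.block_replicas):
        return 0
    if local > 0:
        return 1
    return 2


def pick_task(pending: List[MapTask], node: str) -> Optional[MapTask]:
    """Choose a pending task for `node`: best class first, then fewest copies."""
    if not pending:
        return None
    return min(
        pending,
        key=lambda t: (locality_class(t, node), blocks_to_copy(t, node)),
    )


if __name__ == "__main__":
    pending = [
        MapTask("t1", [{"a"}, {"b"}]),
        MapTask("t2", [{"a"}, {"a", "c"}]),
        MapTask("t3", [{"b"}, {"c"}]),
    ]
    chosen = pick_task(pending, "a")
    print(chosen.task_id if chosen else None)
```

In this sketch a scheduler that looked only at the first block of a split would treat `t1` and `t2` as equally local to node `a`, whereas counting every block prefers `t2`, which needs no block copies.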
Pages: 3346-3357
Number of pages: 12