An Enhanced Data-Locality-Aware Task Scheduling Algorithm for Hadoop Applications

Cited by: 13
Authors
Choi, Dongjoo [1]
Jeon, Myunghoon [1]
Kim, Namgi [1]
Lee, Byoung-Dai [1]
Affiliations
[1] Kyonggi Univ, Comp Sci Dept, Suwon 443760, South Korea
Source
IEEE SYSTEMS JOURNAL | 2018 / Vol. 12 / Issue 04
Funding
National Research Foundation of Singapore;
Keywords
Data locality; Hadoop distributed file system (HDFS); MapReduce; task scheduling;
DOI
10.1109/JSYST.2017.2764481
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In general, Hadoop improves the task scheduling performance by determining data locality based on the location in which the input splits and MapTask are executed. However, if an input split consists of multiple data blocks that are distributed and stored in different nodes, this data location method fails to cope with the degradation in processing performance due to the increased frequency of data block copying. We propose a task scheduling algorithm that solves this issue by defining a method to classify data locality taking into account the location of all data blocks that comprise an input split, categorizing tasks based on the defined method, and sequentially assigning tasks according to a given priority. This study measures the performance of the proposed algorithm through a comparison of the total processing time, MapTask performance time, and data block copying frequency between the proposed algorithm and Hadoop's default task scheduling algorithm. The test results show that the proposed algorithm improved the total processing time by up to 25% and the data block copying frequency by up to 28%, when compared to the default algorithm.
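The idea in the abstract (classify locality from the locations of all blocks in an input split, then assign tasks by priority) can be illustrated with a minimal sketch. This is not the authors' algorithm: the three locality classes, the tie-breaking rule, and every name below (InputSplit, classify, pick_task, blocks_to_copy) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputSplit:
    """An input split plus, for each of its data blocks, the nodes holding a replica."""
    name: str
    block_locations: tuple  # one frozenset of node names per data block


def local_block_count(split, node):
    """Number of the split's blocks that have a replica on `node`."""
    return sum(1 for replicas in split.block_locations if node in replicas)


def blocks_to_copy(split, node):
    """Blocks that would have to be fetched from other nodes to run the split on `node`."""
    return len(split.block_locations) - local_block_count(split, node)


def classify(split, node):
    """Assumed three-way locality class; the paper's actual categories may differ."""
    local = local_block_count(split, node)
    if local == len(split.block_locations):
        return "all-local"
    if local > 0:
        return "partly-local"
    return "remote"


# Lower number means higher scheduling priority (an assumption of this sketch).
PRIORITY = {"all-local": 0, "partly-local": 1, "remote": 2}


def pick_task(pending, node):
    """Pick the pending split with the best locality class on `node`,
    breaking ties by the fewest blocks to copy."""
    if not pending:
        return None
    return min(
        pending,
        key=lambda s: (PRIORITY[classify(s, node)], blocks_to_copy(s, node)),
    )


a = InputSplit("a", (frozenset({"n1"}), frozenset({"n2"})))
b = InputSplit("b", (frozenset({"n1"}), frozenset({"n1", "n3"})))
c = InputSplit("c", (frozenset({"n2"}), frozenset({"n3"})))
chosen = pick_task([a, c, b], "n1")
```

In this toy setup, a scheduler that looked only at the first block of each split would treat "a" and "b" as equally local on node n1, whereas counting all blocks separates them: "b" needs no copies and "a" needs one.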
Pages: 3346-3357
Number of pages: 12
Related papers
50 items in total
  • [1] An improved task scheduling algorithm based on cache locality and data locality in Hadoop
    Zhang, Peng
    Li, Chunlin
    Zhao, Yahui
    [J]. 2016 17TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT), 2016, : 244 - 249
  • [2] Data-locality-aware mapreduce real-time scheduling framework
    Kao, Yu-Chon
    Chen, Ya-Shu
    [J]. JOURNAL OF SYSTEMS AND SOFTWARE, 2016, 112 : 65 - 77
  • [3] A data-locality-aware task scheduler for distributed social graph queries
    Jin, Jiahui
    Luo, Junzhou
    Du, Mingyang
    Dang, Yongcheng
    Li, Feng
    Zhang, Jinghui
    Song, Aibo
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 93 : 1010 - 1022
  • [4] Locality and Network-Aware Reduce Task Scheduling for Data-Intensive Applications
    Arslan, Engin
    Shekhar, Mrigank
    Kosar, Tevfik
    [J]. 2014 5TH INTERNATIONAL WORKSHOP ON DATA-INTENSIVE COMPUTING IN THE CLOUDS (DATACLOUD), 2014, : 17 - 24
  • [5] Locality Aware Task Scheduling in Parallel Data Stream Processing
    Falt, Zbynek
    Krulis, Martin
    Bednarek, David
    Yaghob, Jakub
    Zavoral, Filip
    [J]. INTELLIGENT DISTRIBUTED COMPUTING VIII, 2015, 570 : 331 - 342
  • [6] RTSBL: Reduce Task Scheduling Based on the Load Balancing and the Data Locality in Hadoop
    Midoun, Khadidja
    Hidouci, Walid-Khaled
    Loudini, Malik
    Belayadi, Djahida
    [J]. ADVANCES IN COMPUTING SYSTEMS AND APPLICATIONS, 2019, 50 : 271 - 280
  • [7] LaSA: A Locality-aware Scheduling Algorithm for Hadoop-MapReduce Resource Assignment
    Chen, Tseng-Yi
    Wei, Hsin-Wen
    Wei, Ming-Feng
    Chen, Ying-Jie
    Hsu, Tsan-Sheng
    Shih, Wei-Kuan
    [J]. PROCEEDINGS OF THE 2013 INTERNATIONAL CONFERENCE ON COLLABORATION TECHNOLOGIES AND SYSTEMS (CTS), 2013, : 342 - 346
  • [8] An Optimal Locality-Aware Task Scheduling Algorithm Based on Bipartite Graph Modelling for Spark Applications
    Fu, Zhongming
    Tang, Zhuo
    Yang, Li
    Liu, Chubo
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (10) : 2406 - 2420
  • [9] Data-Locality-Aware User Grouping in Cloud Radio Access Networks
    Ao, Weng Chon
    Psounis, Konstantinos
    [J]. IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2018, 17 (11) : 7295 - 7308
  • [10] A Task Scheduling Algorithm for Hadoop Platform
    Chen, Jilan
    Wang, Dan
    Zhao, Wenbing
    [J]. JOURNAL OF COMPUTERS, 2013, 8 (04) : 929 - 936