A data locality based scheduler to enhance MapReduce performance in heterogeneous environments

Cited by: 37
Authors
Naik, Nenavath Srinivas [1]
Negi, Atul [1]
Bapu, Tapas B. R. [2]
Anitha, R. [2]
Affiliations
[1] Univ Hyderabad, Sch Comp & Informat Sci, Hyderabad 500046, India
[2] SA Engn Coll, Madras, Tamil Nadu, India
Keywords
MapReduce; Data locality; Task scheduler; Heterogeneous environments; PATH;
DOI
10.1016/j.future.2018.07.043
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
MapReduce is an essential framework for the distributed storage and parallel processing of large-scale, data-intensive jobs. The default Hadoop scheduler assumes a homogeneous environment. This assumption does not always hold in practice, and it limits MapReduce performance. Data locality essentially means moving computation closer to the input data for faster access. Fundamentally, MapReduce does not always account for heterogeneity from a data locality perspective. Improving data locality in the MapReduce framework is therefore important for improving the performance of large-scale Hadoop clusters. This paper proposes a novel data locality based scheduler that allocates input data blocks to nodes according to their processing capacity and schedules map and reduce tasks to nodes according to their computing ability in a heterogeneous Hadoop cluster. We evaluate the proposed scheduler using different workloads from the HiBench benchmark suite. The experimental results show that the proposed scheduler enhances MapReduce performance in heterogeneous environments: it reduces job execution time and improves data locality for different parameters compared with the Hadoop default scheduler, the Matchmaking scheduler, and the Delay scheduler. (C) 2018 Elsevier B.V. All rights reserved.
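The abstract's idea of allocating input data blocks according to each node's processing capacity can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify; it assumes that capacity is a single positive number per node and that blocks are split proportionally, with leftovers assigned by largest fractional remainder. The function name `allocate_blocks` and the node names are hypothetical.

```python
def allocate_blocks(num_blocks, capacities):
    """Split num_blocks across nodes in proportion to their capacity.

    capacities maps a (hypothetical) node name to a positive capacity score.
    This is an illustrative assumption, not the method from the paper.
    """
    total = sum(capacities.values())
    if total <= 0:
        raise ValueError("total capacity must be positive")

    # Ideal (fractional) share of blocks for each node.
    exact = {node: num_blocks * cap / total for node, cap in capacities.items()}

    # Start from the integer part of each share.
    alloc = {node: int(share) for node, share in exact.items()}

    # Hand out the remaining blocks to the nodes with the largest remainders.
    leftover = num_blocks - sum(alloc.values())
    by_remainder = sorted(capacities, key=lambda n: exact[n] - alloc[n], reverse=True)
    for node in by_remainder[:leftover]:
        alloc[node] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical cluster: relative capacities 3 : 2 : 1.
    print(allocate_blocks(10, {"fast": 3, "medium": 2, "slow": 1}))
```

Under these assumptions a faster node receives more blocks, so more of its map tasks can read local data. A real scheduler would also need to account for replication and rack placement, which this sketch ignores.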
Pages: 423-434
Number of pages: 12
Related papers
50 in total
  • [41] MRA++: Scheduling and data placement on MapReduce for heterogeneous environments
    Anjos, Julio C. S.
    Carrera, Ivan
    Kolberg, Wagner
    Tibola, Andre Luis
    Arantes, Luciana B.
    Geyer, Claudio R.
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2015, 42 : 22 - 35
  • [42] FiGMR: A Fine-Grained MapReduce Scheduler in the Heterogeneous Cloud
    Mao, Yingchi
    Qi, Hai
    Ping, Ping
    Li, Xiaofang
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION (ICIA), 2016, : 1956 - 1963
  • [43] Load Balancing in Heterogeneous MapReduce Environments
    Fan, Yuanquan
    Wu, Weiguo
    Qian, Depei
    Xu, Yunlong
    Wei, Wei
    [J]. 2013 IEEE 15TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2013 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (HPCC_EUC), 2013, : 1480 - 1489
  • [44] An Approach to Enhance the Performance of Hadoop MapReduce Framework for Big Data
    Chandra, Subhash
    Motwani, Deepak
    [J]. 2016 INTERNATIONAL CONFERENCE ON MICRO-ELECTRONICS AND TELECOMMUNICATION ENGINEERING (ICMETE), 2016, : 178 - 182
  • [45] HPSO: Prefetching Based Scheduling to Improve Data Locality for MapReduce Clusters
    Sun, Mingming
    Zhuang, Hang
    Zhou, Xuehai
    Lu, Kun
    Li, Changlong
    [J]. ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2014, PT II, 2014, 8631 : 82 - 95
  • [46] A strategy for scheduling reduce task based on intermediate data locality of the MapReduce
    Shang, Fengjun
    Chen, Xuanling
    Yan, Chenyun
    [J]. CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2017, 20 (04): : 2821 - 2831
  • [47] A strategy for scheduling reduce task based on intermediate data locality of the MapReduce
    Fengjun Shang
    Xuanling Chen
    Chenyun Yan
    [J]. Cluster Computing, 2017, 20 : 2821 - 2831
  • [48] Dependency-Aware Data Locality for MapReduce
    Fan, Xiaoyi
    Ma, Xiaoqiang
    Liu, Jiangchuan
    Li, Dan
    [J]. 2014 IEEE 7TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (CLOUD), 2014, : 409 - 416
  • [49] Dependency-Aware Data Locality for MapReduce
    Ma, Xiaoqiang
    Fan, Xiaoyi
    Liu, Jiangchuan
    Li, Dan
    [J]. IEEE TRANSACTIONS ON CLOUD COMPUTING, 2018, 6 (03) : 667 - 679
  • [50] HybridMR: A Hierarchical MapReduce Scheduler for Hybrid Data Centers
    Sharma, Bikash
    Wood, Timothy
    Das, Chita R.
    [J]. 2013 IEEE 33RD INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS), 2013, : 102 - 111