Dynamic Job Ordering and Slot Configurations for MapReduce Workloads

Cited by: 21

Authors:

Tang, Shanjiang [1]

Lee, Bu-Sung [2]

He, Bingsheng [2]

Affiliations:

[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China

[2] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore

Source:

IEEE TRANSACTIONS ON SERVICES COMPUTING | 2016 / Vol. 9 / No. 01

Funding:

National Research Foundation, Singapore;

Keywords:

MapReduce; Hadoop; flow-shops; scheduling algorithm; job ordering; 2-STAGE;

DOI:

10.1109/TSC.2015.2426186

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

MapReduce is a popular parallel computing paradigm for large-scale data processing in clusters and data centers. A MapReduce workload generally contains a set of jobs, each of which consists of multiple map tasks followed by multiple reduce tasks. Because 1) map tasks can only run in map slots and reduce tasks can only run in reduce slots, and 2) map tasks must be executed before reduce tasks, different job execution orders and map/reduce slot configurations for a MapReduce workload yield significantly different performance and system utilization. This paper proposes two classes of algorithms to minimize the makespan and the total completion time for an offline MapReduce workload. The first class of algorithms focuses on job ordering optimization for a MapReduce workload under a given map/reduce slot configuration. In contrast, the second class of algorithms considers the scenario in which the map/reduce slot configuration can also be optimized for a MapReduce workload. We perform simulations as well as experiments on Amazon EC2 and show that the proposed algorithms produce results that are up to 15-80 percent better than unoptimized Hadoop, leading to significant reductions in running time in practice.
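
The effect of job ordering described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm. It assumes a simplified two-stage flow-shop model in which each job's whole map phase and whole reduce phase are collapsed into single durations, and it applies Johnson's rule, a classical ordering rule for two-stage flow shops. The names `Job`, `makespan`, and `johnson_order` and the job durations are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical job record: (name, map_phase_time, reduce_phase_time).
Job = Tuple[str, float, float]


def makespan(order: List[Job]) -> float:
    """Makespan of jobs run in the given order on a two-stage pipeline.

    Simplifying assumption: the map stage and the reduce stage each
    process one job at a time, and a job's reduce phase cannot start
    before its own map phase has finished.
    """
    map_done = 0.0     # time at which the map stage becomes free
    reduce_done = 0.0  # time at which the reduce stage becomes free
    for _, map_time, reduce_time in order:
        map_done += map_time
        reduce_done = max(reduce_done, map_done) + reduce_time
    return reduce_done


def johnson_order(jobs: List[Job]) -> List[Job]:
    """Order jobs by Johnson's rule for a two-stage flow shop.

    Jobs whose map phase is no longer than their reduce phase go first,
    sorted by increasing map time; the remaining jobs go last, sorted
    by decreasing reduce time.
    """
    front = sorted((j for j in jobs if j[1] <= j[2]), key=lambda j: j[1])
    back = sorted((j for j in jobs if j[1] > j[2]),
                  key=lambda j: j[2], reverse=True)
    return front + back


# Made-up workload for illustration only.
jobs = [("A", 4.0, 1.0), ("B", 1.0, 4.0), ("C", 3.0, 3.0)]

print(makespan(jobs))                 # submission order A, B, C -> 12.0
print(makespan(johnson_order(jobs)))  # Johnson order B, C, A -> 9.0
```

In this toy workload, reordering alone reduces the makespan from 12 to 9 time units under a fixed configuration. A model with multiple map and reduce slots per stage, as considered in the paper, would need a more detailed simulation.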

Pages: 4-17

Page count: 14
