A Scheduling Strategy to Run Hadoop Jobs on Geodistributed Data

Cited by: 5

Authors:

Cavallo, Marco [1]

Cusma, Lorenzo [1]

Di Modica, Giuseppe [1]

Polito, Carmelo [1]

Tomarchio, Orazio [1]

Affiliations:

[1] Univ Catania, Dept Elect Elect & Comp Engn, Catania, Italy

Source:

ADVANCES IN SERVICE-ORIENTED AND CLOUD COMPUTING (ESOCC 2015) | 2016 / Vol. 567

Keywords:

DOI:

10.1007/978-3-319-33313-7_1

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Internet-of-Things scenarios will typically be characterized by huge amounts of available data. A challenging task is to manage such data efficiently by analyzing it, elaborating it, and extracting useful information from it. Distributed computing frameworks such as Hadoop, based on the MapReduce paradigm, have been used to process such amounts of data by exploiting the computing power of many cluster nodes. As long as the computing context is made of clusters of homogeneous nodes interconnected through high-speed links, the benefit brought by such frameworks is clear and tangible. Unfortunately, in many real big data applications the data to be processed reside in many computationally heterogeneous data centers distributed over the planet. In those contexts, Hadoop has been shown to perform very poorly. The proposal presented in this paper addresses this limitation. We designed a context-aware Hadoop framework that is capable of scheduling and distributing tasks among geographically distant clusters in a way that minimizes the overall job execution time. The proposed scheduler leverages the integer partitioning technique and a priori knowledge of big data application patterns to explore the space of all possible task schedules and estimate the one expected to perform best. Final experiments conducted on a scheduler prototype demonstrate the benefit of the approach.
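The abstract does not detail how integer partitioning is used, so the following is only an illustrative sketch of the general idea, not the authors' scheduler. It assumes the job's input is split into a fixed number of equal work units, enumerates every way of dividing those units among the clusters via integer partitions, and scores each allocation with a hypothetical cost model (units divided by a per-cluster speed, ignoring data transfer). The names `partitions`, `best_schedule`, and `speeds` are invented for this example.

```python
from itertools import permutations


def partitions(n, max_parts, largest=None):
    """Yield integer partitions of n as non-increasing tuples
    containing at most max_parts positive parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    if max_parts == 0:
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, max_parts - 1, first):
            yield (first,) + rest


def best_schedule(units, speeds):
    """Exhaustively search allocations of `units` work units over clusters.

    speeds[i] is a hypothetical processing rate (units per time step)
    for cluster i. Returns (makespan, allocation) with the smallest
    estimated makespan under this toy cost model.
    """
    k = len(speeds)
    best = None
    for p in partitions(units, k):
        padded = p + (0,) * (k - len(p))       # clusters may receive no work
        for alloc in set(permutations(padded)):  # assign parts to clusters
            makespan = max(a / s for a, s in zip(alloc, speeds))
            if best is None or makespan < best[0]:
                best = (makespan, alloc)
    return best


if __name__ == "__main__":
    print(best_schedule(4, [1.0, 3.0]))
```

With 4 work units and two clusters whose assumed speeds are 1.0 and 3.0, the search settles on giving 1 unit to the slower cluster and 3 to the faster one. A realistic model would also need to account for inter-cluster bandwidth and data location, which this sketch leaves out.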


Pages: 5-19

Page count: 15
