A Scheduling Strategy to Run Hadoop Jobs on Geodistributed Data

Cited by: 5
Authors
Cavallo, Marco [1]
Cusma, Lorenzo [1]
Di Modica, Giuseppe [1]
Polito, Carmelo [1]
Tomarchio, Orazio [1]
Affiliation
[1] Univ Catania, Dept Elect Elect & Comp Engn, Catania, Italy
DOI
10.1007/978-3-319-33313-7_1
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Internet-of-Things scenarios will typically be characterized by the availability of huge amounts of data. A challenging task is to manage such data efficiently by analyzing and processing them and extracting useful information. Distributed computing frameworks such as Hadoop, based on the MapReduce paradigm, have been used to process such amounts of data by exploiting the computing power of many cluster nodes. As long as the computing context consists of clusters of homogeneous nodes interconnected through high-speed links, the benefit brought by such frameworks is clear and tangible. Unfortunately, in many real big data applications the data to be processed reside in many computationally heterogeneous data centers distributed over the planet. In those contexts, Hadoop has been shown to perform very poorly. The proposal presented in this paper addresses this limitation. We designed a context-aware Hadoop framework that is capable of scheduling and distributing tasks among geographically distant clusters in a way that minimizes the overall job execution time. The proposed scheduler leverages the integer partitioning technique and a priori knowledge of big data application patterns to explore the space of all possible task schedules and estimate the one expected to perform best. Experiments conducted on a scheduler prototype demonstrate the benefit of the approach.
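The abstract does not give the scheduler's details, so the following is only a minimal sketch of how integer partitioning can be used to enumerate candidate schedules. It assumes the workload is split into a fixed number of equal units and that each site has a known per-unit processing time. The function names, the `seconds_per_unit` parameter, and the makespan cost model are hypothetical illustrations, not the authors' method.

```python
from itertools import permutations


def integer_partitions(n, max_part=None):
    """Yield every partition of n as a non-increasing tuple of positive integers."""
    if max_part is None or max_part > n:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(max_part, 0, -1):
        for rest in integer_partitions(n - first, first):
            yield (first,) + rest


def candidate_schedules(total_units, num_sites):
    """Yield each distinct assignment of workload units to sites.

    A schedule is a tuple of length num_sites; entry i is the number of
    units given to site i (0 means the site is not used).
    """
    for parts in integer_partitions(total_units):
        if len(parts) > num_sites:
            continue  # more non-empty shares than available sites
        padded = parts + (0,) * (num_sites - len(parts))
        # The same partition can be mapped onto the sites in several ways.
        yield from set(permutations(padded))


def estimated_makespan(schedule, seconds_per_unit):
    """Hypothetical cost model: sites work in parallel, the slowest one dominates."""
    return max(units * cost for units, cost in zip(schedule, seconds_per_unit))


def best_schedule(total_units, seconds_per_unit):
    """Exhaustively pick the schedule with the lowest estimated makespan."""
    return min(
        candidate_schedules(total_units, len(seconds_per_unit)),
        key=lambda s: estimated_makespan(s, seconds_per_unit),
    )


if __name__ == "__main__":
    # Two sites; the second is assumed to be twice as slow per unit.
    print(best_schedule(6, [1.0, 2.0]))  # (4, 2)
```

The exhaustive search grows quickly with the number of units and sites, so a sketch like this is only practical for coarse-grained splits; a real cost model would also need to account for data transfer between sites.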
Pages: 5-19
Page count: 15
Related papers
50 in total
  • [1] PCSP: A Preemptive Capacity Scheduler Policy for Scheduling Hadoop Jobs
    Xue Shengjun
    Wang Delong
    Shi Suhong
    [J]. INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2015, 8 (05) : 33 - 45
  • [2] A robust scheduling strategy for moldable scheduling of parallel jobs
    Srinivasan, S
    Krishnamoorthy, S
    Sadayappan, P
    [J]. IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, PROCEEDINGS, 2003, : 92 - 99
  • [3] A Data Streams Analysis Strategy Based on Hadoop Scheduling Optimization for Smart Grid Application
    Zhou, Fengquan
    Song, Xin
    Han, Yinghua
    Gao, Jing
    [J]. FRONTIERS IN ALGORITHMICS (FAW 2015), 2015, 9130 : 326 - 333
  • [4] Dynamic processing slots scheduling for I/O intensive jobs of Hadoop MapReduce
    Kurazumi, Shiori
    Tsumura, Tomoaki
    Saito, Shoichi
    Matsuo, Hiroshi
    [J]. 2012 THIRD INTERNATIONAL CONFERENCE ON NETWORKING AND COMPUTING (ICNC 2012), 2012, : 288 - 292
  • [5] An improved data placement strategy for hadoop
    Lin, Wei-Wei
    [J]. Huanan Ligong Daxue Xuebao/Journal of South China University of Technology (Natural Science), 2012, 40 (01) : 152 - 158
  • [6] Energy-Efficient Task Scheduling for CPU-Intensive Streaming Jobs on Hadoop
    Jin, Peiquan
    Hao, Xingjun
    Wang, Xiaoliang
    Yue, Lihua
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2019, 30 (06) : 1298 - 1311
  • [7] New Data Placement Strategy in the HADOOP Framework
    Elomari, Akram
    Hassouni, Larbi
    Maizate, Abderrahim
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (07) : 676 - 684
  • [8] Scheduling in the Presence of Data Intensive Compute Jobs
    Rehrouzi-Far, Amir
    Soljanin, Emilia
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 5989 - 5991
  • [9] Jobs Run-Time Scheduling in a Java Based Grid Architecture
    Guaragnella, Cataldo
    Guerriero, Andrea
    Pasquale, Ciriaco C.
    Ragni, Francesco
    [J]. EMERGING INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PROCEEDINGS, 2009, 5754 : 453 - 463
  • [10] A Real-time Scheduling Strategy Based on Processing Framework of Hadoop
    Chen, Fangbing
    Liu, Ji
    Zhu, Yuesheng
    [J]. 2017 IEEE 6TH INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS 2017), 2017, : 321 - 328