GAMESH: A grid architecture for scalable monitoring and enhanced dependable job scheduling

Cited by: 6
Authors
Bellavista, Paolo [3]
Cinque, Marcello [1]
Corradi, Antonio [3]
Foschini, Luca [3]
Frattini, Flavio [1, 2]
Povedano-Molina, Javier [4]
Affiliations
[1] Univ Naples Federico II, Dipartimento Ingn Elettr & Tecnol Informaz, Naples, Italy
[2] RisLab, Lab Ric & Innovaz Sicurezza, Naples, Italy
[3] Univ Bologna, Dipartimento Informat Sci & Ingn, Bologna, Italy
[4] Real Time Innovat, Granada, Spain
Keywords
Grid; Monitoring; Dependability; Scalability; Scheduling; Fault tolerance; DDS
DOI
10.1016/j.future.2016.10.023
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Grid computing is a widely adopted paradigm to federate geographically distributed data centers. Due to their size and complexity, grid systems are often affected by failures that may hinder the correct and timely execution of jobs, thus causing a non-negligible waste of computing resources. Despite the relevance of the problem, state-of-the-art management solutions for grid systems usually neglect the identification and handling of failures at runtime. Among the primary goals to be considered, we claim the need for novel approaches capable of achieving scalable integration with efficient monitoring solutions and of fitting large and geographically distributed systems, where dynamic and configurable tradeoffs between overhead and targeted granularity are necessary. This paper proposes GAMESH, a Grid Architecture for scalable Monitoring and Enhanced dependable job ScHeduling. GAMESH is conceived as a completely distributed and highly efficient management infrastructure, concentrating on two crucial aspects for large-scale and multi-domain grid environments: (i) the scalable dissemination of monitoring data and (ii) the troubleshooting of job execution failures. GAMESH has been implemented and tested in a real deployment encompassing geographically distributed data centers across Europe. Experimental results show that GAMESH (i) enables the collection of measurements of both computing resources and conditions of task scheduling at geographically sparse sites, while imposing a limited overhead on the entire infrastructure, and (ii) provides a failure-aware scheduler able to improve the overall system performance, even in the presence of failures, by coordinating local job schedulers at multiple domains. © 2016 Elsevier B.V. All rights reserved.
Pages: 192-201
Number of pages: 10