GAMESH: A grid architecture for scalable monitoring and enhanced dependable job scheduling

Cited by: 6

Authors:

Bellavista, Paolo [3]

Cinque, Marcello [1]

Corradi, Antonio [3]

Foschini, Luca [3]

Frattini, Flavio [1,2]

Povedano-Molina, Javier [4]

Affiliations:

[1] Univ Naples Federico II, Dipartimento Ingn Elettr & Tecnol Informaz, Naples, Italy

[2] RisLab, Lab Ric & Innovaz Sicurezza, Naples, Italy

[3] Univ Bologna, Dipartimento Informat Sci & Ingn, Bologna, Italy

[4] Real Time Innovat, Granada, Spain

Source:

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2017 / Vol. 71

Keywords:

Grid; Monitoring; Dependability; Scalability; Scheduling; Fault tolerance; DDS

DOI:

10.1016/j.future.2016.10.023

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Grid computing is a widely adopted paradigm to federate geographically distributed data centers. Due to their size and complexity, grid systems are often affected by failures that may hinder the correct and timely execution of jobs, thus causing a non-negligible waste of computing resources. Despite the relevance of the problem, state-of-the-art management solutions for grid systems usually neglect the identification and handling of failures at runtime. Among the primary goals to be considered, we claim the need for novel approaches capable of achieving scalable integration with efficient monitoring solutions and of fitting large and geographically distributed systems, where dynamic and configurable tradeoffs between overhead and targeted granularity are necessary. This paper proposes GAMESH, a Grid Architecture for scalable Monitoring and Enhanced dependable job ScHeduling. GAMESH is conceived as a completely distributed and highly efficient management infrastructure, concentrating on two crucial aspects for large-scale and multi-domain grid environments: (i) the scalable dissemination of monitoring data and (ii) the troubleshooting of job execution failures. GAMESH has been implemented and tested in a real deployment encompassing geographically distributed data centers across Europe. Experimental results show that GAMESH (i) enables the collection of measurements of both computing resources and conditions of task scheduling at geographically sparse sites, while imposing a limited overhead on the entire infrastructure, and (ii) provides a failure-aware scheduler able to improve the overall system performance, even in the presence of failures, by coordinating local job schedulers at multiple domains. (C) 2016 Elsevier B.V. All rights reserved.


Pages: 192-201

Number of pages: 10

