VRM: A failure-aware grid resource management system

Cited by: 2

Authors:

Communication and Operating Systems Group, School of Electrical Engineering and Computer Science, Technische Universitaet Berlin, Einsteinufer 17 Sekr. EN6, 10587 Berlin, Germany ^{[1]}

Unknown ^{[2]}

Affiliations:

Source:

Int. J. High Perform. Comput. Networking | 2008 / Vol. 4 / pp. 215-226

Keywords:

Natural resources management - Resource allocation - Maintenance - Grid computing

DOI:

10.1504/IJHPCN.2008.022298

Abstract:

Advance reservations have proven very useful for resource management in Grid environments and are therefore supported by a variety of Grid toolkits. However, failure recovery for such systems has not yet received the attention it deserves. In this paper, we address the problem of remapping reservations to other resources when the originally selected resource fails. Instead of dealing with jobs that are already running, which usually means checkpointing and migration, we focus on jobs that are scheduled on the failed resource for a specific future period of time but have not yet started. The most critical factor in solving this problem is the estimation of the downtime. We avoid the drawbacks of under- or over-estimating the downtime with a dynamic load-based approach, which is evaluated by extensive simulations in a Grid environment and shows superior performance compared to estimation-based approaches. Copyright © 2008, Inderscience Publishers.
