VRM: A failure-aware grid resource management system

Cited by: 2
Authors
Communication and Operating Systems Group, School of Electrical Engineering and Computer Science, Technische Universitaet Berlin, Einsteinufer 17 Sekr. EN6, 10587 Berlin, Germany [1]
Unknown [2]
Affiliations
Source
Int. J. High Perform. Comput. Networking | 2008 / Vol. 4 / pp. 215-226
Keywords
Natural resources management - Resource allocation - Maintenance - Grid computing
DOI
10.1504/IJHPCN.2008.022298
Chinese Library Classification (CLC) number
Subject classification code
Abstract
For resource management in Grid environments, advance reservations turned out to be very useful and hence are supported by a variety of Grid toolkits. However, failure recovery for such systems has not yet received the attention it deserves. In this paper, we address the problem of remapping reservations to other resources, when the originally selected resource fails. Instead of dealing with jobs already running, which usually means checkpointing and migration, our focus is on jobs that are scheduled on the failed resource for a specific future period of time but not started yet. The most critical factor when solving this problem is the estimation of the downtime. We avoid the drawbacks of under- or over-estimating the downtime by a dynamic load-based approach that is evaluated by extensive simulations in a Grid environment and shows superior performance compared to estimation-based approaches. Copyright © 2008, Inderscience Publishers.