Distributed fault management for computational grids

Cited by: 0
Authors
Affaan, Muhammad [1]
Ansari, M. A. [2]
Affiliations
[1] Muhammad Ali Jinnah Univ, Islamabad, Pakistan
[2] Fed Urdu Univ Arts, Sci & Tech, Islamabad, Pakistan
Keywords
grid environment; fault management; checkpointing; single point of failure
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Grid resources have heterogeneous architectures, are geographically distributed, and are interconnected via unreliable network media, so they are at risk of failure. Because a grid environment consists of unreliable resources, fault-tolerance mechanisms cannot be ignored. Some scientific jobs require long commitments of grid resources, and failures of those resources cannot be overlooked. These failures need flexible management that also accounts for the failure of the fault manager itself. In this paper we propose distributed management of failures that does not dedicate resources exclusively to this task: resources performing fault management may also participate in serving the long-running user jobs. Each sub job of the main user job is inspected by an individual resource. If the inspected resource fails, the inspector resource takes over in its place. The contributions of this paper are the elimination of the single point of failure and the ability of the proposed concept to be integrated with a variety of grid middleware.
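The inspector arrangement described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the ring layout (each resource is inspected by its successor), the `Resource` class, the function names, and the use of a checkpoint table to decide where a sub job resumes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical grid resource that both runs sub jobs and inspects a peer."""
    name: str
    alive: bool = True
    sub_jobs: list = field(default_factory=list)  # sub jobs this resource executes


def assign_inspectors(resources):
    """Map each resource name to its inspector: the next resource in a ring (assumed layout)."""
    n = len(resources)
    return {resources[i].name: resources[(i + 1) % n] for i in range(n)}


def handle_failures(resources, inspectors, checkpoints):
    """Move the sub jobs of every failed resource to its inspector.

    Returns a list of (sub_job, failed_name, inspector_name, resume_step) tuples.
    """
    takeovers = []
    for res in resources:
        if res.alive:
            continue
        inspector = inspectors[res.name]
        if not inspector.alive:
            # This sketch does not cover simultaneous failure of both peers.
            continue
        for job in res.sub_jobs:
            # Resume from the last recorded checkpoint, or from step 0 if none exists.
            resume_step = checkpoints.get(job, 0)
            inspector.sub_jobs.append(job)
            takeovers.append((job, res.name, inspector.name, resume_step))
        res.sub_jobs = []
    return takeovers


if __name__ == "__main__":
    grid = [Resource("r0", sub_jobs=["job-a"]),
            Resource("r1", sub_jobs=["job-b"]),
            Resource("r2", sub_jobs=["job-c"])]
    inspectors = assign_inspectors(grid)
    checkpoints = {"job-a": 40, "job-b": 75}
    grid[1].alive = False  # simulate the failure of r1
    for event in handle_failures(grid, inspectors, checkpoints):
        print(event)
```

Because every resource is inspected by a peer that is also doing useful work, no single dedicated fault manager exists in this sketch, which is the property the abstract refers to as eliminating the single point of failure.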
Pages: 363 / +
Page count: 2