Distributed fault management for computational grids

Cited by: 0
Authors
Affaan, Muhammad [1]
Ansari, M. A. [2]
Affiliations
[1] Muhammad Ali Jinnah Univ, Islamabad, Pakistan
[2] Fed Urdu Univ Arts, Sci & Tech, Islamabad, Pakistan
Keywords
grid environment; fault management; checkpointing; single point of failure
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Grid resources have heterogeneous architectures, are geographically distributed, and are interconnected via unreliable network media, so they are at risk of failure. Because a grid environment consists of unreliable resources, fault-tolerance mechanisms cannot be ignored. Some scientific jobs require long commitments of grid resources, and failures of those resources cannot be overlooked. These failures need flexible management that also accounts for failure of the fault manager itself. In this paper we propose distributed management of failures that does not dedicate resources exclusively to this task: resources performing fault management may also take part in serving long-running user jobs. Each sub-job of the main user job is inspected by an individual resource. If a failure occurs, the inspector resource takes over from the inspected resource. The contributions of this paper are the elimination of a single point of failure and the ability of the proposed concept to be integrated with a variety of grid middleware.
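The inspector idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how inspectors are paired with inspected resources, so the ring pairing below is an assumption made only for illustration, and every name (`Resource`, `assign_ring`, `recover`) is hypothetical rather than taken from the paper. Checkpoint restoration is omitted; the sketch only shows a sub-job moving to its inspector when the inspected resource fails.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical grid resource that runs sub-jobs and also acts as an inspector."""
    name: str
    alive: bool = True
    jobs: list = field(default_factory=list)  # sub-jobs currently held by this resource


def assign_ring(resources, sub_jobs):
    """Give each resource one sub-job and make it the inspector of its ring neighbour.

    Assumption: resource i inspects resource (i + 1) mod n.
    Returns a dict mapping inspected resource name -> inspector resource name.
    """
    if len(resources) != len(sub_jobs) or len(resources) < 2:
        raise ValueError("need one sub-job per resource and at least two resources")
    n = len(resources)
    inspector_of = {}
    for i, (res, job) in enumerate(zip(resources, sub_jobs)):
        res.jobs.append(job)
        inspector_of[resources[(i + 1) % n].name] = res.name
    return inspector_of


def recover(resources, inspector_of):
    """Move the sub-jobs of every failed resource to its nearest live inspector.

    If an inspector has failed too, walk further back along the ring, so no
    single resource acts as a central fault manager.
    Returns a list of (job, failed_name, new_owner_name) tuples.
    """
    by_name = {r.name: r for r in resources}
    moved = []
    for failed in resources:
        if failed.alive or not failed.jobs:
            continue
        candidate = by_name[inspector_of[failed.name]]
        while not candidate.alive:
            candidate = by_name[inspector_of[candidate.name]]
            if candidate is failed:
                raise RuntimeError("no live resource left to take over")
        for job in failed.jobs:
            candidate.jobs.append(job)
            moved.append((job, failed.name, candidate.name))
        failed.jobs.clear()
    return moved


if __name__ == "__main__":
    grid = [Resource("a"), Resource("b"), Resource("c")]
    pairing = assign_ring(grid, ["j0", "j1", "j2"])
    grid[1].alive = False  # simulate failure of resource "b"
    print(recover(grid, pairing))
```

In this sketch every resource both runs a sub-job and inspects a neighbour, which mirrors the abstract's point that fault management does not need dedicated resources.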
Pages: 363 / +
Page count: 2
    [J]. 2002 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, PROCEEDING, 2002, : 55 - 62