A new fault-tolerance framework for grid computing

Cited by: 7
Author
Derbal, Youcef [1]
Affiliation
[1] Ryerson Univ, Sch Informat Technol Management, 350 Victoria St, Toronto, ON M5B 2K3, Canada
Keywords
Computational grid; fault-tolerance; fault detector; reliability; service request
DOI
10.3233/MGS-2006-2203
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Fault detection and propagation in a computational grid require a comprehensive framework that takes into consideration the various grid environmental conditions, such as the asynchronous nature of communication and the uncertainty in the disseminated fault information. The paper presents a fault-tolerance framework that provides the necessary models to manage the local faulty behavior associated with the operation of hosted services. The framework includes a mechanism for quantifying the fault vulnerability of grid nodes and their hosted services. The resulting measures of fault vulnerability are globally disseminated to enable the synthesis of decentralized, fault-tolerant decision-making strategies.
Pages: 115-133
Page count: 19