A replication-based fault tolerance protocol using group communication for the Grid

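Authors
Erciyes, Kayhan [1]
Affiliation
[1] Izmir Inst Technol, TR-35430 Izmir, Turkey
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract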
We describe a replication-based protocol that uses group communication for fault tolerance in the Computational Grid. The Grid is partitioned into a number of clusters and each cluster has a designated coordinator that manages the states of the replicas within its cluster. The coordinators belong to a process group and the proposed protocol ensures the correct sequence of message deliveries to the replicas by the coordinators. Any failing node of the Grid is replaced by an active replica to provide correct continuation of the operation of the application. We show the theoretical framework along with illustrations of the replication protocol and its implementation results and analyze its performance and scalability.
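The record gives no protocol details, so the following is only a minimal sketch of the general idea in the abstract: a cluster coordinator stamps each message with a sequence number, replicas apply messages strictly in that order, and a failed node is replaced by a live replica. The names `Replica`, `Coordinator`, `multicast`, and `handle_failure` are hypothetical and are not taken from the paper. The sketch also leaves out the process group formed by the coordinators.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica that applies messages in sequence-number order."""
    name: str
    alive: bool = True
    next_seq: int = 0
    log: list = field(default_factory=list)      # messages applied so far
    pending: dict = field(default_factory=dict)  # out-of-order messages

    def deliver(self, seq, msg):
        # Buffer the message, then apply every contiguous message that is ready.
        self.pending[seq] = msg
        while self.next_seq in self.pending:
            self.log.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


class Coordinator:
    """Hypothetical cluster coordinator managing the replicas of one cluster."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.primary = self.replicas[0]  # assumption: first replica is active
        self.seq = 0

    def multicast(self, msg):
        # Stamp the message once so all live replicas see the same order.
        seq = self.seq
        self.seq += 1
        for replica in self.replicas:
            if replica.alive:
                replica.deliver(seq, msg)

    def handle_failure(self, failed):
        # Mark the node as failed; if it was the primary, promote a live replica.
        failed.alive = False
        if failed is self.primary:
            self.primary = next((r for r in self.replicas if r.alive), None)
        return self.primary
```

Because every live replica has applied the same message prefix, any of them could take over from a failed primary without replaying state in this simplified model.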
Pages: 672-681
Page count: 10
Related papers
50 in total
  • [1] Replication-Based Fault Tolerance for MPI Applications
    Walters, John Paul
    Chaudhary, Vipin
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2009, 20 (07) : 997 - 1010
  • [2] A Replication-Based Mechanism for Fault Tolerance in MapReduce Framework
    Liu, Yang
    Wei, Wei
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2015, 2015
  • [3] Replication-Based Fault-Tolerance for Large-Scale Graph Processing
    Chen, Rong
    Yao, Youyang
    Wang, Peng
    Zhang, Kaiyuan
    Wang, Zhaoguo
    Guan, Haibing
    Zang, Binyu
    Chen, Haibo
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2018, 29 (07) : 1621 - 1635
  • [4] Replication-based Fault-tolerance for Large-scale Graph Processing
    Wang, Peng
    Zhang, Kaiyuan
    Chen, Rong
    Chen, Haibo
    Guan, Haibing
    2014 44TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN), 2014 : 562 - 573
  • [5] Extension of the Ocarina Tool Suite to Support Reliable Replication-Based Fault-Tolerance
    Gabsi, Wafa
    Zalila, Bechir
    Jmaiel, Mohamed
    RELIABLE SOFTWARE TECHNOLOGIES - ADA-EUROPE 2016, 2016, 9695 : 129 - 144
  • [6] Agent fault tolerance using group communication
    Mishra, S
    PDPTA'2001: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, 2001 : 383 - 389
  • [7] A scalable asynchronous replication-based strategy for fault tolerant MPI applications
    Walters, John Paul
    Chaudhary, Vipin
    HIGH PERFORMANCE COMPUTING - HIPC 2007, PROCEEDINGS, 2007, 4873 : 257 - 268
  • [8] A Replication Strategy for Fault Tolerance in Data Grid Environment
    Li, Jing
    ACC 2009: ETP/IITA WORLD CONGRESS IN APPLIED COMPUTING, COMPUTER SCIENCE, AND COMPUTER ENGINEERING, 2009 : 363 - 366
  • [9] Group-based Scheduling Algorithm for Fault Tolerance in Mobile Grid
    Lee, JongHyuk
    Choi, SungJin
    Suh, Taeweon
    Yu, HeonChang
    Gil, JoonMin
    SECURITY-ENRICHED URBAN COMPUTING AND SMART GRID, 2010, 78 : 394 - +
  • [10] Interactive group object replication fault tolerance for CORBA
    Modzelewski, BE
    Cyganski, D
    Underwood, M
    PROCEEDINGS OF THE THIRD USENIX CONFERENCE ON OBJECT-ORIENTED TECHNOLOGIES AND SYSTEMS (COOTS), 1997 : 241 - 244