Supporting fault-tolerance in streaming grid applications

Authors
Zhu, Qian [1]
Chen, Liang [1]
Agrawal, Gagan [1]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
This paper considers the problem of supporting and efficiently implementing fault-tolerance for tightly-coupled and pipelined applications, especially streaming applications, in a grid environment. We provide an alternative to basic checkpointing and use the notion of a Light-weight Summary Structure (LSS) to enable efficient failure-recovery. The idea behind LSS is that at certain points during the execution of a processing stage, the state of the program can be summarized by a small amount of memory. This allows us to store copies of the LSS for failure-recovery, which yields low-overhead fault-tolerance. Our work can be viewed as an optimization and adaptation of the idea of application-level checkpointing to a different execution environment and a different class of applications. We have implemented and evaluated LSS-based failure-recovery in the context of the GATES (Grid-based AdapTive Execution on Streams) middleware. An observation we use to provide very low-overhead support for fault-tolerance is that algorithms analyzing data streams are only allowed to take a single pass over the data, which means they only perform approximate processing. Therefore, we believe that in supporting fault-tolerant execution for these applications, it is acceptable to not analyze a small number of data packets during failure-recovery. We show how we perform failure-recovery and also demonstrate how additional buffers can be used to limit data loss during the recovery procedure. We also present an efficient algorithm for allocating a new computation resource for failure-recovery at runtime. We have extensively evaluated our implementation using three stream data processing applications, and shown that the use of LSS allows effective and low-overhead failure-recovery.
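The recovery idea described in the abstract can be illustrated with a small sketch. It is not the GATES implementation: the `Summary` class, the `run` function, the running-mean stage, and all parameters are hypothetical names chosen for illustration. The sketch assumes a stage whose state is fully captured by a count and a sum, saves a copy of that summary every few packets, and after a simulated failure restarts from the last saved copy. An optional upstream buffer replays recent packets to reduce the number that go unanalyzed.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Hypothetical light-weight summary: enough state to resume a running mean."""
    count: int = 0
    total: float = 0.0
    next_seq: int = 0  # sequence number of the first packet not yet summarized


def step(summary, seq, value):
    """Fold one packet into the summary and return the new summary."""
    return Summary(summary.count + 1, summary.total + value, seq + 1)


def run(packets, checkpoint_every, fail_at, buffer_size):
    """Process packets, fail once at packet fail_at, recover from the saved summary.

    Returns (final summary, number of packets that were never analyzed).
    """
    current = Summary()
    saved = current                      # stands in for a remotely stored copy
    buffer = deque(maxlen=buffer_size)   # upstream buffer of the most recent packets
    lost = 0
    for seq, value in enumerate(packets):
        buffer.append((seq, value))
        if seq == fail_at:
            # Simulated failure: state since the last saved summary is gone.
            current = saved
            replay = [(s, v) for s, v in buffer if s >= saved.next_seq]
            lost += (seq + 1 - saved.next_seq) - len(replay)
            for s, v in replay:
                current = step(current, s, v)
        else:
            current = step(current, seq, value)
        if (seq + 1) % checkpoint_every == 0:
            saved = current              # store a fresh copy of the summary
    return current, lost


if __name__ == "__main__":
    data = [float(i) for i in range(10)]
    for size in (0, 2, 8):
        final, lost = run(data, checkpoint_every=4, fail_at=6, buffer_size=size)
        print(size, final.count, lost)
```

In this sketch a larger buffer trades memory for fewer unanalyzed packets, which mirrors the trade-off the abstract describes between accepting a small loss and using additional buffers.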
Pages: 1679-1690
Number of pages: 12