Tolerating Correlated Failures in Massively Parallel Stream Processing Engines

Cited by: 0
Authors
Su, Li [1]
Zhou, Yongluan [1]
Affiliation
[1] Univ Southern Denmark, Odense, Denmark
Source
2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE) | 2016
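Keywords
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract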
Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task's runtime state and recovers a failed task by restoring that state from its latest checkpoint. An active approach, by contrast, usually employs backup nodes to run replicated tasks; upon failure, the active replica takes over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPE). The passive approach incurs long recovery latency, especially when a number of correlated nodes fail simultaneously, while the active approach requires extra replication resources. In this paper, we propose a new fault-tolerance framework that is Passive and Partially Active (PPA). In a PPA scheme, the passive approach is applied to all tasks, while only a selected set of tasks is actively replicated. The number of actively replicated tasks depends on the available resources. If tasks without active replicas fail, tentative outputs are generated before the recovery process completes. We also propose effective and efficient algorithms to optimize a partially active replication plan so as to maximize the quality of tentative outputs. We implemented PPA on top of Storm, an open-source MPSPE, and conducted extensive experiments using both real and synthetic datasets to verify the effectiveness of our approach.
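The abstract does not spell out how the partially active replication plan is chosen, so the following is only an illustrative sketch of the general idea: every task is checkpointed, and a limited resource budget decides which tasks also get an active replica. The `Task` fields (`replica_cost`, `output_weight`), the function names, and the greedy weight-per-cost heuristic are all assumptions made for this example; they are not the paper's algorithms and not part of the Storm API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    replica_cost: int     # hypothetical: resource units an active replica would need
    output_weight: float  # hypothetical: how much this task matters to tentative-output quality


def plan_active_replicas(tasks, budget):
    """Pick tasks to replicate actively under a resource budget.

    Greedy heuristic (an assumption, not the paper's optimizer): consider
    tasks in decreasing order of weight per unit cost and take each one
    that still fits in the remaining budget.
    """
    chosen = set()
    remaining = budget
    ranked = sorted(tasks, key=lambda t: t.output_weight / t.replica_cost, reverse=True)
    for task in ranked:
        if task.replica_cost <= remaining:
            chosen.add(task.name)
            remaining -= task.replica_cost
    return chosen


def recovery_mode(task_name, active_replicas):
    """Describe what happens when a task fails under a PPA-style scheme."""
    if task_name in active_replicas:
        return "failover to active replica"
    # Passive path: restore from the latest checkpoint; outputs are tentative meanwhile.
    return "restore from checkpoint, tentative outputs until recovered"


if __name__ == "__main__":
    tasks = [Task("source", 2, 0.5), Task("join", 3, 0.9), Task("sink", 1, 0.1)]
    active = plan_active_replicas(tasks, budget=4)
    for t in tasks:
        print(t.name, "->", recovery_mode(t.name, active))
```

With a larger budget more tasks fail over instantly; with a budget of zero the sketch degenerates to a purely passive scheme, which matches the abstract's statement that the number of actively replicated tasks depends on the available resources.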
Pages: 517 - 528
Page count: 12
Related papers
50 in total
  • [1] Passive and Partially Active Fault Tolerance for Massively Parallel Stream Processing Engines
    Su, Li
    Zhou, Yongluan
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2019, 31 (01) : 32 - 45
  • [2] Efficient stream interface topologies for massively-parallel SIMD processing
    Abbo, A. A.
    Choudhary, V. S.
    Kleihorst, R. P.
    Wielage, P.
    Sevat, L.
    ESSCIRC 2006: PROCEEDINGS OF THE 32ND EUROPEAN SOLID-STATE CIRCUITS CONFERENCE, 2006, : 158 - +
  • [4] Massively parallel pattern recognition with link failures
    Kutrib, M
    Löwe, JT
    SOFSEM 2000: THEORY AND PRACTICE OF INFORMATICS, 2000, 1963 : 392 - 401
  • [5] Subtleties in tolerating correlated failures in wide-area storage system
    Nath, Suman
    Yu, Haifeng
    Gibbons, Phillip B.
    Seshan, Srinivasan
    USENIX ASSOCIATION PROCEEDINGS OF THE 3RD SYMPOSIUM ON NETWORKED SYSTEMS DESIGN & IMPLEMENTATION (NSDI 06), 2006, : 225 - 238
  • [6] Tolerating Correlated Failures for Generalized Cartesian Distributions via Bipartite Matching
    Ali, Nawab
    Krishnamoorthy, Sriram
    Halappanavar, Mahantesh
    Daily, Jeff
    PROCEEDINGS OF THE 2011 8TH ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS (CF 2011), 2011,
  • [7] Massively parallel femtosecond laser processing
    Hasegawa, Satoshi
    Ito, Haruyasu
    Toyoda, Haruyoshi
    Hayasaki, Yoshio
    OPTICS EXPRESS, 2016, 24 (16): : 18513 - 18524
  • [8] Multiwavelength parallel optical interconnects for massively parallel processing
    Patel, RR
    Bond, SW
    Pocha, MD
    Larson, MC
    Garrett, HE
    Drayton, RF
    Petersen, HE
    Krol, DM
    Deri, RJ
    Lowry, ME
    IEEE JOURNAL OF SELECTED TOPICS IN QUANTUM ELECTRONICS, 2003, 9 (02) : 657 - 666
  • [9] Tolerating Temporal Correlated Failures from Cyclic Dependency in High Performance Computing Systems
    Chen, Xin
    He, Xubin
    PROCEEDINGS OF THE 2008 14TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS, 2008, : 509 - 516
  • [10] Data Dissemination and Parallel Processing Techniques Research Based on Massively Parallel Processing
    Sun, Qiao
    Deng, Bu-qiao
    Nie, Xiab-Bo
    Ma, Hui-yuan
    Sun, Jia-song
    INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATION AND NETWORK ENGINEERING (WCNE 2016), 2016,