A comprehensive study on fault tolerance in stream processing systems

Cited by: 4
Authors
Wang, Xiaotong [1]
Zhang, Chunxi [1]
Fang, Junhua [2]
Zhang, Rong [1]
Qian, Weining [1]
Zhou, Aoying [1]
Affiliations
[1] East China Normal Univ, Sch Data Sci & Engn, Shanghai 200062, Peoples R China
[2] Soochow Univ, Adv Data Analyt Lab, Suzhou 215006, Peoples R China
Keywords
fault tolerance; performance evaluation; stream processing; MODEL
DOI
10.1007/s11704-020-0248-x
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Stream processing has emerged as a useful technology for applications that require continuous, low-latency computation on infinite streaming data. Because stream processing systems (SPSs) usually require distributed deployment on clusters of servers to cope with large-scale data, failures of processing nodes or communication networks are especially common, and they must be handled carefully to preserve service quality. A failed system may produce wrong results or become unavailable, resulting in a decline in user experience or even significant financial loss. Hence, a large number of fault tolerance approaches have been proposed for SPSs. These approaches often prioritize specific performance concerns, e.g., runtime overhead and recovery efficiency. Nevertheless, a systematic overview and classification of the state-of-the-art fault tolerance approaches in SPSs is lacking, which may become an obstacle to the development of SPSs. Therefore, we investigate the existing achievements and develop a taxonomy of fault tolerance in SPSs. Furthermore, we propose an evaluation framework tailored for fault tolerance, present experimental results on two representative open-source SPSs, and expose the possible disadvantages in current designs. Finally, we specify future research directions in this domain.
Pages: 18