A comprehensive study on fault tolerance in stream processing systems

Cited by: 4
Authors
Wang, Xiaotong [1]
Zhang, Chunxi [1]
Fang, Junhua [2]
Zhang, Rong [1]
Qian, Weining [1]
Zhou, Aoying [1]
Affiliations
[1] East China Normal Univ, Sch Data Sci & Engn, Shanghai 200062, Peoples R China
[2] Soochow Univ, Adv Data Analyt Lab, Suzhou 215006, Peoples R China
Keywords
fault tolerance; performance evaluation; stream processing; model
DOI
10.1007/s11704-020-0248-x
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Stream processing has emerged as a useful technology for applications that require continuous, low-latency computation on infinite streaming data. Because stream processing systems (SPSs) are usually deployed across clusters of servers to cope with large-scale data, failures of processing nodes or communication networks are especially common, and they must be handled carefully to preserve service quality. A failed system may produce wrong results or become unavailable, degrading the user experience or even causing significant financial loss. Hence, a large number of fault tolerance approaches have been proposed for SPSs. These approaches often prioritize specific performance concerns, e.g., runtime overhead and recovery efficiency. Nevertheless, a systematic overview and classification of state-of-the-art fault tolerance approaches in SPSs is lacking, which will become an obstacle to the development of SPSs. Therefore, we survey existing work and develop a taxonomy of fault tolerance in SPSs. Furthermore, we propose an evaluation framework tailored to fault tolerance, present experimental results on two representative open-source SPSs, and expose possible disadvantages of current designs. Finally, we identify future research directions in this domain.
Pages: 18