A comprehensive study on fault tolerance in stream processing systems

Times cited: 4
Authors
Wang, Xiaotong [1 ]
Zhang, Chunxi [1 ]
Fang, Junhua [2 ]
Zhang, Rong [1 ]
Qian, Weining [1 ]
Zhou, Aoying [1 ]
Affiliations
[1] East China Normal Univ, Sch Data Sci & Engn, Shanghai 200062, Peoples R China
[2] Soochow Univ, Adv Data Analyt Lab, Suzhou 215006, Peoples R China
Keywords
fault tolerance; performance evaluation; stream processing; model
DOI
10.1007/s11704-020-0248-x
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Stream processing has emerged as a useful technology for applications that require continuous, low-latency computation on infinite streaming data. Because stream processing systems (SPSs) usually must be deployed across clusters of servers to cope with large-scale data, failures of processing nodes or communication networks are especially common, and they must be handled carefully to preserve service quality. A failed system may produce wrong results or become unavailable, degrading user experience or even causing significant financial loss. Hence, a large number of fault tolerance approaches have been proposed for SPSs. These approaches often prioritize specific performance concerns, e.g., runtime overhead or recovery efficiency. Nevertheless, there is no systematic overview and classification of the state-of-the-art fault tolerance approaches in SPSs, which hinders the further development of SPSs. Therefore, we survey the existing work and develop a taxonomy of fault tolerance in SPSs. Furthermore, we propose an evaluation framework tailored to fault tolerance, present experimental results on two representative open-source SPSs, and expose possible weaknesses in their current designs. Finally, we outline future research directions in this domain.
Pages: 18
Related papers
50 in total
  • [41] FAULT TOLERANCE AND DIGITAL SYSTEMS
    BENNETTS, RG
    [J]. MICROPROCESSORS AND MICROSYSTEMS, 1979, 3 (08) : 365 - 373
  • [42] A Performance Study on Operator-based Stream Processing Systems
    Dayarathna, Miyuru
    Takeno, Souhei
    Suzumura, Toyotaro
    [J]. 2011 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC), 2011, : 79 - 79
  • [43] Increased Fault-Tolerance and Real-time Performance Resiliency for Stream Processing Workloads through Redundancy
    Tran, Geoffrey Phi C.
    Walters, John Paul
    Crago, Stephen P.
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON SERVICES COMPUTING (IEEE SCC 2019), 2019, : 51 - 55
  • [44] Low-overhead fault tolerance for high-throughput data processing systems
    Martin, Andre
    Knauth, Thomas
    Creutz, Stephan
    Becker, Diogo
    Weigert, Stefan
    Fetzer, Christof
    Brito, Andrey
    [J]. 31ST INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2011), 2011, : 689 - 699
  • [45] Time-lag duplexing - A fault tolerance technique for online transaction processing systems
    Chandra, A
    Bossen, DC
    [J]. PACIFIC RIM INTERNATIONAL SYMPOSIUM ON FAULT-TOLERANT SYSTEMS, PROCEEDINGS, 1997, : 202 - 207
  • [46] Providing Fault Tolerance via Complex Event Processing and Machine Learning for IoT Systems
    Power, Alexander
    Kotonya, Gerald
    [J]. PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON THE INTERNET OF THINGS (IOT 2019), 2019,
  • [47] Complex Patterns of Failure: Fault Tolerance via Complex Event Processing for IoT Systems
    Power, Alexander
    Kotonya, Gerald
    [J]. 2019 INTERNATIONAL CONFERENCE ON INTERNET OF THINGS (ITHINGS) AND IEEE GREEN COMPUTING AND COMMUNICATIONS (GREENCOM) AND IEEE CYBER, PHYSICAL AND SOCIAL COMPUTING (CPSCOM) AND IEEE SMART DATA (SMARTDATA), 2019, : 986 - 993
  • [48] Fault-tolerant distributed stream processing system
    Gorawski, Marcin
    Marks, Pawel
    [J]. SEVENTEENTH INTERNATIONAL CONFERENCE ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2006, : 395 - +
  • [49] A Performance Analysis of Fault Recovery in Stream Processing Frameworks
    van Dongen, Giselle
    Van den Poel, Dirk
    [J]. IEEE ACCESS, 2021, 9 : 93745 - 93763
  • [50] Fault-tolerance in distributed query processing
    Smith, J
    Watson, P
    [J]. 9TH INTERNATIONAL DATABASE ENGINEERING & APPLICATION SYMPOSIUM, PROCEEDINGS, 2005, : 329 - 338