Reducing Fault-tolerant Overhead for Distributed Stream Processing with Approximate Backup

Cited by: 1
Authors
Zhuang, Yuan [1 ]
Wei, Xiaohui [1 ,2 ]
Li, Hongliang [1 ,2 ]
Hou, Mingkai [1 ]
Wang, Yundi [1 ]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun, Peoples R China
[2] Key Lab Symbol Computat & Knowledge Engn Minist E, Changchun, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
PERFORMANCE; FAILURES;
DOI
10.1109/icccn49398.2020.9209717
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812 ;
Abstract
The stream processing model continuously processes online data in a one-pass fashion, which can make it more vulnerable to failures than offline data processing schemes. Checkpoint-based fault-tolerant methods have been widely used to enhance the reliability of stream processing systems. To ensure exact data recovery upon failure, full-backup mechanisms store a complete copy of the data, which introduces substantial runtime overhead and increases output latency. Meanwhile, a wide range of online processing applications prefer quick-and-dirty results with a slight degradation in accuracy to delayed exact results. This paper introduces a novel approximate fault-tolerant problem (OAFP) with the objective of reducing the failure-free fault-tolerant overhead while ensuring a user-defined output accuracy requirement upon failure. We present an approximate fault-tolerant scheme based on a sampling backup mechanism and study the trade-off between fault-tolerant overhead and output accuracy in stream processing systems. We propose two algorithms to compute backup plans for both single-node failure and correlated failure scenarios. Extensive experiments with different types of stream topologies are conducted on our simulator to verify the correctness and effectiveness of our approach. We prove that our solution guarantees the output accuracy requirement with minimum FT latency for directed acyclic graph (DAG) stream topologies with single-node failures.
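The overhead-versus-accuracy trade-off described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm or its backup-plan computation: it assumes a single operator whose state is a list of numeric tuples aggregated by a sum, uses plain Bernoulli sampling as the backup mechanism, and all function names (`sample_backup`, `estimate_sum`, `relative_error`) are hypothetical.

```python
import random


def sample_backup(tuples, rate, seed=0):
    """Back up each tuple independently with probability `rate`.

    Assumption: Bernoulli sampling stands in for the paper's sampling
    backup mechanism, whose details the abstract does not give.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)
    return [t for t in tuples if rng.random() < rate]


def estimate_sum(backup, rate):
    """Estimate the lost operator's sum by scaling the sampled sum by 1/rate."""
    if rate == 0.0:
        return 0.0  # nothing was backed up, so nothing can be recovered
    return sum(backup) / rate


def relative_error(exact, approx):
    """Relative deviation of the recovered result from the exact one."""
    if exact == 0:
        return abs(approx)
    return abs(exact - approx) / abs(exact)


if __name__ == "__main__":
    state = list(range(1, 1001))  # hypothetical operator state
    exact = sum(state)
    for rate in (0.1, 0.5, 1.0):
        backup = sample_backup(state, rate)
        approx = estimate_sum(backup, rate)
        print(f"rate={rate}: backed up {len(backup)} tuples, "
              f"relative error {relative_error(exact, approx):.3f}")
```

A lower sampling rate stores fewer tuples during failure-free operation but generally recovers a less accurate result, while a rate of 1.0 reduces to a full backup with exact recovery.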
Pages: 9
Related papers
50 in total
  • [1] Fault-tolerant distributed stream processing system
    Gorawski, Marcin
    Marks, Pawel
    [J]. SEVENTEENTH INTERNATIONAL CONFERENCE ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2006, : 395 - +
  • [2] Minimizing Latency in Fault-Tolerant Distributed Stream Processing Systems
    Brito, Andrey
    Fetzer, Christof
    Felber, Pascal
    [J]. 2009 29TH IEEE INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, 2009, : 173 - +
  • [3] Ares: a High Performance and Fault-tolerant Distributed Stream Processing System
    Lin, Changfu
    Zhan, Jingjing
    Chen, Hanhua
    Tan, Jie
    Jin, Hai
    [J]. 2018 IEEE 26TH INTERNATIONAL CONFERENCE ON NETWORK PROTOCOLS (ICNP), 2018, : 176 - 186
  • [4] Economical and Fault-Tolerant Load Balancing in Distributed Stream Processing Systems
    Xiao, Fuyuan
    Kitasuka, Teruaki
    Aritsugi, Masayoshi
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2012, E95D (04): : 1062 - 1073
  • [5] Fault-tolerant Stream Processing using a Distributed, Replicated File System
    Kwon, YongChul
    Balazinska, Magdalena
    Greenberg, Albert
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2008, 1 (01): : 574 - 585
  • [6] Reducing the Overhead of Message Logging in Fault-Tolerant HPC Applications
    Meneses, Esteban
    [J]. HIGH PERFORMANCE COMPUTING CARLA 2016, 2017, 697 : 204 - 218
  • [7] Approximate Fault-Tolerant Data Stream Aggregation for Edge Computing
    Takao, Daiki
    Sugiura, Kento
    Ishikawa, Yoshiharu
    [J]. BIG-DATA-ANALYTICS IN ASTRONOMY, SCIENCE, AND ENGINEERING, BDA 2021, 2022, 13167 : 233 - 244
  • [8] MillWheel: Fault-Tolerant Stream Processing at Internet Scale
    Akidau, Tyler
    Balikov, Alex
    Bekiroglu, Kaya
    Chernyak, Slava
    Haberman, Josh
    Lax, Reuven
    McVeety, Sam
    Mills, Daniel
    Nordstrom, Paul
    Whittle, Sam
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (11): : 1033 - 1044
  • [9] Reducing the Complexity of Fault-Tolerant System amenable to Approximate Computing
    Zhu, Zhiqi
    Schafer, Benjamin Carrion
    [J]. 2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021,
  • [10] Fault tolerant optimization of active backup for Flink stream processing framework
    Liu, Guang-Xuan
    Huang, Shan
    Hu, Jia-Li
    Duan, Xiao-Dong
    [J]. Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2022, 56 (02): : 297 - 305