Communication Pattern-based Distributed Snapshots in Large-Scale Systems

Cited by: 1
Authors
Saker, Salem [1]
Agbaria, Adnan [2]
Affiliations
[1] Univ Haifa, Acad Arab Coll Educ Israel, IL-31999 Haifa, Israel
[2] Univ Haifa, IL-31999 Haifa, Israel
Keywords
ROLLBACK-RECOVERY
DOI
10.1109/IPDPSW.2015.117
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Large-scale systems (LSSs) continue to attract attention from the scientific community as platforms for high-performance computing. Providing fault tolerance in distributed systems is a challenge, and it becomes more difficult in LSSs. Distributed snapshots are an important building block for distributed systems and, among other applications, are useful for providing fault tolerance. This paper motivates the need for fault tolerance in LSSs and examines the limitations that make it hard to provide. It then presents a scalable distributed snapshot approach for LSSs. In this approach, when a new snapshot is taken, a process coordinates only with the processes that it has communicated with since the last snapshot. Our protocol improves on the Chandy and Lamport distributed snapshot protocol, which was presented in 1985. This improvement may enable system developers and planners to consider this protocol. We compare the performance of our approach with that of other well-known distributed snapshot approaches using stochastic models. The results show that our approach achieves significantly lower overhead.
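The coordination rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' protocol: the names `Process`, `record_message`, and `take_snapshot` are hypothetical, channel state and marker messages are not modeled, and the sketch assumes that each participant forwards the snapshot request to its own recent peers, which the abstract does not state.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    pid: int
    # Peers this process has exchanged messages with since its last snapshot.
    contacted: set = field(default_factory=set)
    snapshots_taken: int = 0

    def record_message(self, peer: "Process") -> None:
        """Note a message between self and peer; both sides remember it."""
        self.contacted.add(peer.pid)
        peer.contacted.add(self.pid)


def take_snapshot(initiator: int, procs: dict) -> set:
    """Return the pids that take part in a snapshot started by `initiator`.

    Assumption (not from the abstract): a participant forwards the request
    to its own recent peers, so coordination follows the communication
    pattern transitively instead of involving every process.
    """
    participants = set()
    pending = [initiator]
    while pending:
        pid = pending.pop()
        if pid in participants:
            continue
        participants.add(pid)
        proc = procs[pid]
        # Only peers contacted since the last snapshot are asked to join.
        pending.extend(proc.contacted - participants)
        proc.snapshots_taken += 1
        proc.contacted.clear()  # a new snapshot interval starts here
    return participants
```

In this sketch, processes that have not exchanged messages with any participant since their last snapshot are never contacted, which is where the reduction in coordination overhead would come from.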
Pages: 1062-1071
Number of pages: 10