CREDIBLE EXECUTION OF BOUNDED-TIME PARALLEL SYSTEMS WITH DELAYED DIAGNOSIS

Authors
SHANKAR, R [1]
MIRANKER, DP [1]
Affiliation
[1] UNIV TEXAS,DEPT COMP SCI,AUSTIN,TX 78712
Keywords
FAULT-TOLERANCE; MULTICOMPUTER; PARALLEL PROCESSING; REAL-TIME EXPERT-SYSTEM; DECISION-SYSTEM;
DOI
10.1007/BF02241704
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper presents a forward recovery method for the fault-tolerant execution of parallel software systems on multicomputers, in which a fault is neither detected nor diagnosed until it prevents progress in the computation of the system. The method minimizes the communication and synchronization overhead required to verify the reliability of the system and consequently minimizes the impact of fault-tolerance on the throughput of the computation. We say the system is credible provided that it is diagnosable and complete, where complete means that at least one copy of each process exists on a fault-free processor. We apply the method to the process structure deriving from parallel, bounded-time decision systems and show through an exact Markov analysis that the method yields a highly credible system. We then introduce a much simpler but approximate Markov model that facilitates credibility analysis over a larger range of parameters and applications.
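The completeness condition in the abstract (at least one copy of each process on a fault-free processor) can be illustrated with a small check. This is only a sketch under stated assumptions: the function name `is_complete`, the `placement` mapping from process names to processor ids, and the example replication layout are all hypothetical and are not taken from the paper. It also does not model diagnosability, the other half of the credibility definition.

```python
from typing import Dict, Set


def is_complete(placement: Dict[str, Set[int]], faulty: Set[int]) -> bool:
    """Return True if every process keeps a copy on a fault-free processor.

    placement: hypothetical map from process name to the processor ids
               that hold a copy of that process.
    faulty:    ids of processors assumed to have failed.
    """
    # A process survives if removing the faulty processors from its set of
    # hosts leaves at least one processor.
    return all(copies - faulty for copies in placement.values())


# Hypothetical layout: three processes, each duplicated on two of three processors.
placement = {"p0": {0, 1}, "p1": {1, 2}, "p2": {2, 0}}

one_fault = is_complete(placement, {1})      # every process still has a live copy
two_faults = is_complete(placement, {0, 1})  # both copies of p0 are lost
```

In this layout a single processor failure leaves the system complete, while losing processors 0 and 1 together removes every copy of `p0`.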
Pages: 21-37
Page count: 17