Finding missing synchronization in a distributed computation using controlled re-execution

Cited by: 1
Authors
Mittal, N [1]
Garg, VK [2]
Affiliations
[1] Univ Texas, Dept Comp Sci, Richardson, TX 75083 USA
[2] Univ Texas, Dept Elect & Comp Engn, Austin, TX 78712 USA
Keywords
distributed system; debugging; software-fault tolerance; controlled re-execution; predicate control
DOI
10.1007/s00446-003-0104-x
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Correct distributed programs are hard to write. Not surprisingly, distributed systems are especially vulnerable to software faults. Testing and debugging is an important way to improve the reliability of distributed systems. A distributed debugger equipped with the mechanism to re-execute the traced computation in a controlled fashion can greatly facilitate the detection and localization of bugs. This approach gives rise to a general problem of predicate control, which takes a computation and a safety property specified on the computation as inputs, and produces a controlled computation, with added synchronization, that maintains the given safety property as output. We devise efficient control algorithms for two classes of useful predicates, namely region predicates and disjunctive predicates. For the former, we prove that the control algorithm is optimal in the sense that it guarantees maximum concurrency possible in the controlled computation. For the latter, we prove that our control algorithm generates the least number of synchronization dependencies and therefore has optimal message-complexity. Furthermore, we provide a necessary and sufficient condition under which it is possible to efficiently compute a minimal controlling synchronization for a general predicate. We also give an algorithm to compute such a synchronization under the condition provided.
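The predicate control problem described in the abstract can be illustrated with a small brute-force sketch. This is not the paper's algorithm (the paper gives efficient algorithms that avoid enumeration); it only shows what "adding synchronization to maintain a safety property" means. The event-count model, the function names `consistent_cuts` and `violations`, and the two-process mutual-exclusion example are all assumptions made for illustration.

```python
from itertools import product

def consistent_cuts(num_events, deps):
    """Yield every consistent cut (global state) of a computation.

    num_events[p] is the number of events on process p.
    deps is a list of ((p, i), (q, j)) pairs meaning "event i of process p
    happens before event j of process q" (events are numbered from 1).
    A cut is a tuple c where c[p] is the number of events p has executed.
    """
    for cut in product(*(range(n + 1) for n in num_events)):
        # A cut is consistent if it never contains an event without
        # also containing every event that must precede it.
        if all(cut[q] < j or cut[p] >= i for (p, i), (q, j) in deps):
            yield cut

def violations(states, deps, safe):
    """Return the consistent cuts whose local states violate `safe`.

    states[p][k] is the local state of process p after k events.
    """
    num_events = [len(s) - 1 for s in states]
    return [cut for cut in consistent_cuts(num_events, deps)
            if not safe([states[p][k] for p, k in enumerate(cut)])]

# Hypothetical traced computation: each process enters its critical
# section (event 1) and then leaves it (event 2).
in_cs = [[False, True, False],
         [False, True, False]]

# Safety property: at most one process is in its critical section.
mutex = lambda local_states: sum(local_states) <= 1

# Without synchronization, both processes can be in the critical section.
print(violations(in_cs, [], mutex))                  # [(1, 1)]

# Added synchronization: P0's exit must precede P1's entry.
control = [((0, 2), (1, 1))]
print(violations(in_cs, control, mutex))             # []
```

In this toy case the single added dependency removes the only unsafe global state, so every re-execution that respects it maintains mutual exclusion. A real controller must also ensure that added dependencies do not create a cycle with the existing ones.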
Pages: 107 - 130
Page count: 24
Related papers
9 in total
  • [1] Finding missing synchronization in a distributed computation using controlled re-execution
    Neeraj Mittal
    Vijay K. Garg
    Distributed Computing, 2004, 17 : 107 - 130
  • [2] Software fault tolerance of concurrent programs using controlled re-execution
    Tarafdar, A
    Garg, VK
    DISTRIBUTED COMPUTING, 1999, 1693 : 210 - 224
  • [3] Re-execution of distributed programs to detect bugs hidden by racing messages
    Kilgore, R
    Chase, C
    THIRTIETH HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES, VOL 1: SOFTWARE TECHNOLOGY AND ARCHITECTURE, 1997 : 423 - 432
  • [4] Fine-Grained Re-Execution for Efficient Batched Commit of Distributed Transactions
    Dong, Zhiyuan
    Wang, Zhaoguo
    Zhang, Xiaodong
    Xu, Xian
    Zhao, Changgeng
    Chen, Haibo
    Panda, Aurojit
    Li, Jinyang
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (08) : 1930 - 1943
  • [5] Using instruction result locality and re-execution to mitigate silent data corruptions
    Tajary, Alireza
    Zarandi, Hamid R.
    MICROELECTRONICS RELIABILITY, 2016, 62 : 178 - 190
  • [6] ReSlice: Selective re-execution of long-retired misspeculated instructions using forward slicing
    Sarangi, SR
    Liu, W
    Torrellas, J
    Zhou, YY
    MICRO-38: Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture, 2005 : 257 - 268
  • [7] Benefits of Bayesian adaptive trial designs: A virtual re-execution using breast cancer trial data
    Hong, Wei
    McLachlan, Sue-Anne
    Moore, Melissa
    Mahar, Robert
    ASIA-PACIFIC JOURNAL OF CLINICAL ONCOLOGY, 2021, 17 : 29 - 30
  • [8] Efficient re-localization of mobile robot using strategy of finding a missing person
    Meng, Jie
    Wang, Shuting
    Xie, Yuanlong
    Jiang, Liquan
    Li, Gen
    Liu, Chao
    MEASUREMENT, 2021, 176
  • [9] Providing end-to-end QoS in distributed computation using non-greedy task synchronization
    Tolstikov, A
    Biswas, J
    Tham, CK
    2004 12TH IEEE INTERNATIONAL CONFERENCE ON NETWORKS, VOLS 1 AND 2, PROCEEDINGS: UNITY IN DIVERSITY, 2004 : 397 - 402