Dynamic Fault Tolerance in Fat Trees

Cited by: 22
Authors
Sem-Jacobsen, Frank Olaf [1]
Skeie, Tor [1, 2]
Lysne, Olav [1, 2]
Duato, Jose [1, 3]
Affiliations
[1] Simula Res Lab, N-1325 Lysaker, Norway
[2] Univ Oslo, N-0316 Oslo, Norway
[3] Univ Politecn Valencia, Dept Informat Sistemas & Comp, Valencia 46022, Spain
Keywords
Fat trees; k-ary n-trees; dynamic fault tolerance; deterministic routing; adaptive routing
DOI
10.1109/TC.2010.97
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Fat trees are a very common communication architecture in current large-scale parallel computers. The probability of failure in these systems increases with the number of components. We present a routing method for deterministically and adaptively routed fat trees, applicable to both distributed and source routing, that is able to handle several concurrent faults and that transparently returns to the original routing strategy once the faulty components have recovered. The method is local and dynamic, completely masking the fault from the rest of the system. It only requires a small extra functionality in the switches to handle rerouting packets around a fault. The method guarantees connectedness and deadlock and livelock freedom for up to k - 1 benign simultaneous switch and/or link faults where k is half the number of ports in the switches. Our simulation experiments show a graceful degradation of performance as more faults occur. Furthermore, we demonstrate that for most fault combinations, our method will even be able to handle significantly more faults beyond the k - 1 limit with high probability.
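To make the k - 1 bound concrete, the following minimal sketch computes the guaranteed fault tolerance for a given switch size. It assumes the standard k-ary n-tree definition (k^n end nodes, n * k^(n-1) switches, 2k ports per switch), which the abstract does not spell out; the function name and the returned field names are hypothetical and chosen only for illustration.

```python
def fat_tree_summary(ports_per_switch: int, levels: int) -> dict:
    """Summarize a k-ary n-tree built from switches with 2k ports.

    Assumption: the standard k-ary n-tree sizes are used
    (k**n end nodes, n * k**(n-1) switches). The k - 1 figure is the
    guaranteed number of simultaneous benign switch/link faults
    stated in the abstract, where k is half the port count.
    """
    if ports_per_switch < 2 or ports_per_switch % 2 != 0:
        raise ValueError("ports_per_switch must be an even number >= 2")
    if levels < 1:
        raise ValueError("levels must be >= 1")

    k = ports_per_switch // 2  # k is half the number of ports per switch
    return {
        "k": k,
        "end_nodes": k ** levels,
        "switches": levels * k ** (levels - 1),
        "guaranteed_faults": k - 1,
    }


if __name__ == "__main__":
    # Example: 8-port switches arranged in a 3-level tree (a 4-ary 3-tree).
    print(fat_tree_summary(ports_per_switch=8, levels=3))
```

The bound depends only on the switch radix, not on the tree height: under these assumptions, moving from 8-port to 16-port switches raises the guaranteed limit from 3 to 7 faults regardless of how many levels the tree has.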
Pages: 508-525
Number of pages: 18