Experiences with software-based soft-error mitigation using AN codes

Cited by: 0
Authors
Hoffmann, Martin [1]
Ulbrich, Peter [1]
Dietrich, Christian [1]
Schirmeier, Horst [2]
Lohmann, Daniel [1]
Schroeder-Preikschat, Wolfgang [1]
Affiliations
[1] Univ Erlangen Nurnberg, Chair Distributed Syst & Operating Syst, D-91058 Erlangen, Germany
[2] Tech Univ Dortmund, Dept Comp Sci 12, D-44221 Dortmund, Germany
Keywords
Fault injection; Arithmetic code; Dependability; FAULT;
DOI
10.1007/s11219-014-9260-4
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Arithmetic error coding schemes are a well-known and effective technique for soft-error mitigation. Although the underlying coding theory is generally a complex area of mathematics, its practical implementation is comparatively simple in general. However, compliance with the theory can be lost easily while moving toward an actual implementation, which finally jeopardizes the aspired fault-tolerance characteristics and effectiveness. In this paper, we present our experiences and lessons learned from implementing arithmetic error coding schemes (AN codes) in the context of our Combined Redundancy fault-tolerance approach. We focus on the challenges and pitfalls in the transition from maths to machine code for a binary computer from a systems perspective. Our results show that practical misconceptions (such as the use of prime numbers) and architecture-dependent implementation glitches occur at every stage of this transition. We identify typical pitfalls and describe practical measures to find and resolve them. This allowed us to eliminate all remaining silent data corruptions in the Combined Redundancy framework, which we validated by an extensive fault-injection campaign covering the entire fault space of 1-bit and 2-bit errors.
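As background for the abstract, the basic idea of an AN code can be sketched as follows: a value n is stored as the product A * n, so every valid code word is a multiple of A, and a corrupted word is detected when it no longer divides evenly. This is a minimal illustration only, not the Combined Redundancy implementation. The constant `A` and the helper names `encode`, `is_valid`, `decode`, and `add` are assumptions made for this sketch and are not taken from the paper.

```python
# Minimal AN-code sketch (illustrative assumptions, not the paper's code).
# A is an arbitrary odd constant picked for this example only.
A = 58659


def encode(value: int) -> int:
    """Encode a plain integer as a multiple of A."""
    return value * A


def is_valid(code_word: int) -> bool:
    """A code word is valid only if it is an exact multiple of A."""
    return code_word % A == 0


def decode(code_word: int) -> int:
    """Check the code word and return the plain value."""
    if not is_valid(code_word):
        raise ValueError("corrupted code word detected")
    return code_word // A


def add(x: int, y: int) -> int:
    """Add two encoded values without decoding: A*a + A*b == A*(a + b)."""
    return x + y


if __name__ == "__main__":
    total = add(encode(5), encode(7))
    print(decode(total))

    # Simulate a soft error by flipping one bit of an encoded value.
    corrupted = encode(12) ^ (1 << 3)
    print(is_valid(corrupted))
```

With Python's unbounded integers, a single bit flip changes the word by plus or minus a power of two, which an odd A greater than 1 never divides, so the check catches it. Fixed-width machine arithmetic adds overflow and representation concerns that this sketch does not model.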
Pages: 87 - 113
Page count: 27
Related papers
50 in total
  • [31] Runaway pacemaker due to software-based programming error
    Levert, JV
    Hoorntje, JCA
    PACE-PACING AND CLINICAL ELECTROPHYSIOLOGY, 2004, 27 (12): : 1689 - 1690
  • [32] The Reliability of Software Algorithms and Software-based Mitigation Techniques in Digital Signal Processors
    Quinn, Heather
    Fairbanks, Tom
    Tripp, Justin L.
    Manuzzato, Andrea
    2013 IEEE RADIATION EFFECTS DATA WORKSHOP (REDW), 2013,
  • [33] ORBIT: Effective Issue Queue Soft-error Vulnerability Mitigation on Simultaneous Multithreaded Architectures using Operand Readiness-based Instruction Dispatch
    Fu, Xin
    Li, Tao
    Fortes, Jose
    20th International Symposium on Computer Architecture and High Performance Computing, Proceedings, 2008, : 71 - 78
  • [34] Design space exploration of non-uniform cache access for soft-error vulnerability mitigation
    Maghsoudloo, Mohammad
    Zarandi, Hamid R.
    MICROELECTRONICS RELIABILITY, 2015, 55 (11) : 2439 - 2452
  • [35] Software-based Mitigation of Image Degradation due to atmospheric Turbulence
    Huebner, Claudia S.
    Scheifling, Corinne
    OPTICS IN ATMOSPHERIC PROPAGATION AND ADAPTIVE SYSTEMS XIII, 2010, 7828
  • [36] The EDA TURBO project: software-based atmospheric turbulence mitigation
    Hofmann, Julia
    Goelzer, Rilene
    Gladysz, Szymon
    ENVIRONMENTAL EFFECTS ON LIGHT PROPAGATION AND ADAPTIVE SYSTEMS V, 2022, 12266
  • [37] Cost-Efficient Scheduling in High-Level Synthesis for Soft-Error Vulnerability Mitigation
    Hara-Azumi, Yuko
    Tomiyama, Hiroyuki
    PROCEEDINGS OF THE FOURTEENTH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN (ISQED 2013), 2013, : 502 - 507
  • [38] Demystifying Soft-Error Mitigation by Control-Flow Checking - A New Perspective on its Effectiveness
    Schuster, Simon
    Ulbrich, Peter
    Stilkerich, Isabella
    Dietrich, Christian
    Schroeder-Preikschat, Wolfgang
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2017, 16
  • [39] AWAIT: An Ultra-Lightweight Soft-Error Mitigation Mechanism for Network-on-Chip Links
    Janson, Karl
    Pihlak, Rene
    Azad, Siavoosh Payandeh
    Niazmand, Behrad
    Jervan, Gert
    Raik, Jaan
    PROCEEDINGS OF THE 2018 13TH INTERNATIONAL SYMPOSIUM ON RECONFIGURABLE COMMUNICATION-CENTRIC SYSTEMS-ON-CHIP (RECOSOC), 2018,
  • [40] Time redundancy based soft-error tolerance to rescue nanometer technologies
    Nicolaidis, M.
    Proceedings of the IEEE VLSI Test Symposium, 1999, : 86 - 94