Experiences with software-based soft-error mitigation using AN codes

Cited by: 0
Authors
Hoffmann, Martin [1]
Ulbrich, Peter [1]
Dietrich, Christian [1]
Schirmeier, Horst [2]
Lohmann, Daniel [1]
Schroeder-Preikschat, Wolfgang [1]
Affiliations
[1] Univ Erlangen Nurnberg, Chair Distributed Syst & Operating Syst, D-91058 Erlangen, Germany
[2] Tech Univ Dortmund, Dept Comp Sci 12, D-44221 Dortmund, Germany
Keywords
Fault injection; Arithmetic code; Dependability; FAULT;
DOI
10.1007/s11219-014-9260-4
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Arithmetic error coding schemes are a well-known and effective technique for soft-error mitigation. Although the underlying coding theory is generally a complex area of mathematics, its practical implementation is comparatively simple in general. However, compliance with the theory can be lost easily while moving toward an actual implementation, which finally jeopardizes the aspired fault-tolerance characteristics and effectiveness. In this paper, we present our experiences and lessons learned from implementing arithmetic error coding schemes (AN codes) in the context of our Combined Redundancy fault-tolerance approach. We focus on the challenges and pitfalls in the transition from maths to machine code for a binary computer from a systems perspective. Our results show that practical misconceptions (such as the use of prime numbers) and architecture-dependent implementation glitches occur at every stage of this transition. We identify typical pitfalls and describe practical measures to find and resolve them. This allowed us to eliminate all remaining silent data corruptions in the Combined Redundancy framework, which we validated by an extensive fault-injection campaign covering the entire fault space of 1-bit and 2-bit errors.
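As a rough illustration of the AN-code idea the abstract refers to, the following minimal sketch multiplies each value by a constant A and treats any word that is not a multiple of A as corrupted. The constant value, the function names, and the 32-bit word width are assumptions made for this example only; they are not taken from the paper or from the Combined Redundancy framework.

```python
# Minimal AN-code sketch. A is an illustrative odd constant chosen for this
# example; it is NOT claimed to be the constant used by the authors.
A = 58659


def encode(value: int) -> int:
    """Map a plain value to its AN codeword (value * A)."""
    return A * value


def is_valid(codeword: int) -> bool:
    """A word is a valid codeword only if it is a multiple of A."""
    return codeword % A == 0


def decode(codeword: int) -> int:
    """Check the codeword and return the plain value."""
    if not is_valid(codeword):
        raise ValueError("invalid codeword: possible soft error")
    return codeword // A


def flip_bit(word: int, bit: int) -> int:
    """Simulate a single-bit soft error at the given bit position."""
    return word ^ (1 << bit)


# Addition works directly on codewords: A*x + A*y == A*(x + y).
encoded_sum = encode(3) + encode(4)
plain_sum = decode(encoded_sum)

# A single bit flip changes the word by +/- 2**k. Because A is odd and
# greater than 1, it never divides 2**k, so every such flip is detected.
codeword = encode(1234)
all_detected = all(not is_valid(flip_bit(codeword, b)) for b in range(32))
```

The single-bit argument does not carry over to multi-bit errors, where detection depends on the specific choice of A; that choice is the kind of issue the abstract lists among the practical pitfalls.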
Pages: 87-113
Number of pages: 27
Related papers
50 in total
  • [1] Experiences with software-based soft-error mitigation using AN codes
    Martin Hoffmann
    Peter Ulbrich
    Christian Dietrich
    Horst Schirmeier
    Daniel Lohmann
    Wolfgang Schröder-Preikschat
    [J]. Software Quality Journal, 2016, 24 : 87 - 113
  • [2] A Practitioner's Guide to Software-based Soft-Error Mitigation Using AN-Codes
    Hoffmann, Martin
    Ulbrich, Peter
    Dietrich, Christian
    Schirmeier, Horst
    Lohmann, Daniel
    Schroeder-Preikschat, Wolfgang
    [J]. 2014 IEEE 15TH INTERNATIONAL SYMPOSIUM ON HIGH-ASSURANCE SYSTEMS ENGINEERING (HASE), 2014, : 33 - 40
  • [3] DECO: Optimizing Software-based Soft-Error Detector Configurations
    Thunig, Robin
    Lenz, Michael
    Ulbrich, Peter
    Schirmeier, Horst
    [J]. 2022 18TH EUROPEAN DEPENDABLE COMPUTING CONFERENCE (EDCC 2022), 2022, : 73 - 80
  • [4] ReDup: A software-based method for detecting soft-error using data analysis
    Arasteh, Bahman
    [J]. COMPUTERS & ELECTRICAL ENGINEERING, 2019, 78 : 89 - 107
  • [5] Improving Software-based Techniques for Soft Error Mitigation in OoO Superscalar Processors
    Cardoso, Douglas Maciel
    Tonetto, Rafael Billig
    Brandalero, Marcelo
    Agostini, Luciano
    Nazar, Gabriel L.
    Azambuja, Jose Rodrigo
    Schneider Beck, Antonio Carlos
    [J]. 2019 26TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS (ICECS), 2019, : 201 - 204
  • [6] Revisiting Software-based Soft Error Mitigation Techniques via Accurate Error Generation and Propagation Models
    Ebrahimi, Mojtaba
    Rashvand, Maryam
    Kaddachi, Firas
    Tahoori, Mehdi B.
    Di Natale, Giorgio
    [J]. 2016 IEEE 22ND INTERNATIONAL SYMPOSIUM ON ON-LINE TESTING AND ROBUST SYSTEM DESIGN (IOLTS), 2016, : 66 - 71
  • [7] Architectural and Micro-architectural Techniques for Software Controlled Microprocessor Soft-error Mitigation
    Gogulamudi, Anudeep R.
    Clark, Lawrence T.
    Farnsworth, Chad
    Chellappa, Srivatsan
    Vashishtha, Vinay
    [J]. 2015 15TH EUROPEAN CONFERENCE ON RADIATION AND ITS EFFECTS ON COMPONENTS AND SYSTEMS (RADECS), 2015,
  • [8] Software-based method for 'Soft Error' correction in space computers
    School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
    Unknown
    [J]. Yuhang Xuebao, 2007, (4) : 1044 - 1048
  • [9] Soft-error mitigation by means of decoupled transactional memory threads
    Sanchez, Daniel
    Cebrian, Juan M.
    Garcia, Jose M.
    Aragon, Juan L.
    [J]. DISTRIBUTED COMPUTING, 2015, 28 (02) : 75 - 90
  • [10] Robust C-element design for soft-error mitigation
    Wey, I-Chyn
    Wu, Bing-Chen
    Peng, Chien-Chang
    Gong, Cihun-Siyong Alex
    Yu, Chang-Hong
    [J]. IEICE ELECTRONICS EXPRESS, 2015, 12 (10):