Experiences with software-based soft-error mitigation using AN codes

Cited by: 0
Authors
Hoffmann, Martin [1]
Ulbrich, Peter [1]
Dietrich, Christian [1]
Schirmeier, Horst [2]
Lohmann, Daniel [1]
Schroeder-Preikschat, Wolfgang [1]
Affiliations
[1] Univ Erlangen Nurnberg, Chair Distributed Syst & Operating Syst, D-91058 Erlangen, Germany
[2] Tech Univ Dortmund, Dept Comp Sci 12, D-44221 Dortmund, Germany
Keywords
Fault injection; Arithmetic code; Dependability; FAULT;
DOI
10.1007/s11219-014-9260-4
Chinese Library Classification number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Arithmetic error coding schemes are a well-known and effective technique for soft-error mitigation. Although the underlying coding theory is generally a complex area of mathematics, its practical implementation is comparatively simple in general. However, compliance with the theory can be lost easily while moving toward an actual implementation, which finally jeopardizes the aspired fault-tolerance characteristics and effectiveness. In this paper, we present our experiences and lessons learned from implementing arithmetic error coding schemes (AN codes) in the context of our Combined Redundancy fault-tolerance approach. We focus on the challenges and pitfalls in the transition from maths to machine code for a binary computer from a systems perspective. Our results show that practical misconceptions (such as the use of prime numbers) and architecture-dependent implementation glitches occur at every stage of this transition. We identify typical pitfalls and describe practical measures to find and resolve them. This allowed us to eliminate all remaining silent data corruptions in the Combined Redundancy framework, which we validated by an extensive fault-injection campaign covering the entire fault space of 1-bit and 2-bit errors.
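The abstract names AN codes without spelling out the mechanism. As a minimal sketch of the general idea only (not the authors' Combined Redundancy implementation), an AN code stores a value n as the product A * n, so a corrupted word is detected when it is no longer a multiple of A. The constant `A`, the helper names, and the 32-bit range used in the usage note are assumptions made for illustration.

```python
# Minimal AN-code sketch. The constant A and all helper names are
# illustrative assumptions, not values or APIs taken from the paper.
A = 58659  # arbitrary odd constant chosen for illustration


def encode(n: int) -> int:
    """Encode a plain value as a multiple of A."""
    return A * n


def is_valid(codeword: int) -> bool:
    """A code word is valid only if it is still divisible by A."""
    return codeword % A == 0


def decode(codeword: int) -> int:
    """Check the code word, then recover the plain value."""
    if not is_valid(codeword):
        raise ValueError("invalid code word: possible soft error")
    return codeword // A


def flip_bit(word: int, bit: int) -> int:
    """Model a single-bit soft error at the given bit position."""
    return word ^ (1 << bit)


# Addition can be carried out directly on encoded operands.
total = encode(3) + encode(4)
```

In this sketch, `decode(total)` recovers 7, and `is_valid(flip_bit(encode(1234), b))` is false for every bit position `b` from 0 to 31: a single flip changes the word by a power of two, which an odd A greater than 1 cannot divide. How well a given A detects multi-bit errors is exactly the kind of question the paper examines, and this sketch makes no claim about it.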
Pages: 87-113
Page count: 27