A software-implemented fault injection methodology for design and validation of system fault tolerance

Cited by: 9
Authors
Some, RR [1]
Kim, WS [1]
Khanoyan, G [1]
Callum, L [1]
Agrawal, A [1]
Beahan, JJ [1]
Affiliations
[1] CALTECH, Jet Prop Lab, Pasadena, CA 91109 USA
DOI
10.1109/DSN.2001.941435
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we present our experience in developing a methodology and tool at the Jet Propulsion Laboratory (JPL) for Software-Implemented Fault Injection (SWIFI) into a parallel processing supercomputer, which is being designed for use in next-generation space exploration missions. The fault injector uses software-based strategies to emulate the effects of radiation-induced transients occurring in the system hardware components. JPL's SWIFI tool set, called JIFI (JPL's Implementation of a Fault Injector), is being used, in conjunction with an appropriate system fault model, to evaluate candidate hardware and software fault tolerance architectures, determine the sensitivity of applications to faults, and measure the effectiveness of fault detection, isolation, and recovery strategies. JIFI has been validated to inject faults into user-specified CPU registers and memory regions with a uniform random distribution in location and time. Together with verifiers, classifiers, and run scripts, JIFI enables massive fault injection campaigns and statistical data analysis.
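The campaign structure described in the abstract (a fault drawn uniformly in location and time, a verifier comparing against a fault-free run, and a classifier tallying outcomes) can be illustrated with a toy sketch. This is not JIFI and does not reflect its interface: the names `Fault`, `plan_fault`, `checksum_workload`, `classify`, and `run_campaign` are hypothetical, the "memory region" is a plain byte array, the workload is a simple checksum, and only two outcome classes are modeled.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    step: int    # injection time, expressed as a workload step index
    offset: int  # byte offset inside the target memory region
    bit: int     # bit position (0-7) to flip within that byte


def plan_fault(rng, region_size, run_steps):
    """Draw one single-bit fault uniformly over time, location, and bit."""
    return Fault(
        step=rng.randrange(run_steps),
        offset=rng.randrange(region_size),
        bit=rng.randrange(8),
    )


def checksum_workload(memory, fault=None):
    """Toy workload: add up the bytes of `memory`, one byte per step.

    If a fault is given, the chosen bit is flipped just before the
    workload executes the fault's step, emulating a transient upset.
    """
    total = 0
    for step in range(len(memory)):
        if fault is not None and step == fault.step:
            memory[fault.offset] ^= 1 << fault.bit
        total += memory[step]
    return total


def classify(golden, observed):
    """Verifier/classifier: compare a faulty run with the fault-free run."""
    return "no_effect" if observed == golden else "wrong_output"


def run_campaign(n_runs, seed=0):
    """Run n_runs independent single-fault experiments and tally outcomes."""
    rng = random.Random(seed)
    image = bytes(range(64))  # hypothetical initial memory contents
    golden = checksum_workload(bytearray(image))
    counts = {"no_effect": 0, "wrong_output": 0}
    for _ in range(n_runs):
        fault = plan_fault(rng, len(image), len(image))
        observed = checksum_workload(bytearray(image), fault)
        counts[classify(golden, observed)] += 1
    return counts


if __name__ == "__main__":
    print(run_campaign(1000, seed=1))
```

In this toy model a flip matters only if it lands on a byte the workload has not yet read, so the tally depends on both the location and the time of injection. A real campaign would need further outcome classes (crash, hang, detected and recovered) that this sketch omits.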
Pages: 501-506
Number of pages: 6