Recomputation and correction mechanism design for tagged instructions of the RISC-V core

Authors
Deng D. [1]
Guo Y. [1]
Affiliation
[1] College of Computer Science and Technology, National University of Defense Technology, Changsha
Keywords
Correction; Humming bird; Recomputation; RISC-V; Tagged instruction
DOI
10.11887/j.cn.202006011
Abstract
Hardware transient faults, caused mainly by cosmic radiation and other environmental factors, significantly compromise the reliability of computer systems. To mitigate their impact and keep running programs correct, a recomputation and correction mechanism for tagged instructions was proposed for the open-source core "Humming bird e203", which is based on the RISC-V instruction set architecture. The mechanism adds extra flag bits to each instruction, enabling flexible recomputation of any tagged instruction at low hardware cost. If the result of the first recomputation differs from the original result, the mechanism automatically issues the tagged instruction again. This majority voting scheme can efficiently correct most data-flow errors caused by transient hardware faults. Experimental results show that, with the proposed mechanism and the interrupt handler, the average probability that programs run correctly increases by 86.67% under random transient fault injection. © 2020, NUDT Press. All rights reserved.
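The recompute-and-vote flow described in the abstract can be illustrated with a small software sketch. This is not the paper's hardware design: the names `execute_tagged` and `make_flaky` are hypothetical, the "instruction" is modelled as a plain Python callable, and raising an exception when no two results agree is only an assumed stand-in for handing the case to the interrupt handler.

```python
from typing import Callable, Dict, Tuple


def execute_tagged(op: Callable[[], int], tagged: bool) -> int:
    """Run op once, or with recompute-and-vote if the instruction is tagged.

    Hypothetical software model of the flow in the abstract.
    """
    first = op()
    if not tagged:
        return first              # untagged instructions execute only once
    second = op()                 # first recomputation
    if second == first:
        return first              # results agree, accept them
    third = op()                  # mismatch: issue the instruction again
    if third == first or third == second:
        return third              # two of three results agree
    # Assumption: an unresolved mismatch would be escalated elsewhere.
    raise RuntimeError("no majority among three executions")


def make_flaky(correct: int, wrong: int,
               faulty_call: int) -> Tuple[Callable[[], int], Dict[str, int]]:
    """Build an op that returns `wrong` on one chosen call (1-based)."""
    calls = {"n": 0}

    def op() -> int:
        calls["n"] += 1
        return wrong if calls["n"] == faulty_call else correct

    return op, calls


if __name__ == "__main__":
    op, calls = make_flaky(correct=7, wrong=99, faulty_call=1)
    print(execute_tagged(op, tagged=True), calls["n"])
```

In this model a single transient fault in any one of the three executions is outvoted, while an untagged instruction pays no recomputation cost and also receives no protection.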
Pages: 90-97
Page count: 7