Using Explicit Output Comparisons for Fault Tolerant Scheduling (FTS) on Modern High-Performance Processors

Authors
Gao, Yue [1]
Gupta, Sandeep K. [1]
Breuer, Melvin A. [1]
Affiliations
[1] University of Southern California, Ming Hsieh Department of Electrical Engineering, Los Angeles, CA 90007, USA
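Keywords
SYSTEMS
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract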
Soft errors and errors caused by intermittent faults are a major concern for modern processors. In this paper, we present a drastically different approach to fault-tolerant scheduling (FTS) of tasks on such processors. Traditionally in FTS, error detection is performed implicitly and concurrently with task execution, and the associated overheads are incurred as increased software run-time or hardware area. However, such embedded error detection (EED) techniques, e.g., watchdog-processor-assisted control-flow checking, provide only approximately 70% error coverage [1, 2]. We propose using straightforward explicit output comparison (EOC), which provides nearly 100% error coverage. We construct a framework for using EOC in FTS, identify new challenges and tradeoffs, and develop a new off-line scheduling algorithm for EOC. We show that our EOC-based approach provides higher error coverage and an average performance improvement of nearly 10% over EED-based FTS approaches, without increasing resource requirements. In our ongoing research, we are identifying a richer set of ways to apply EOC, by itself and in conjunction with EED, to obtain further improvements.
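The abstract does not spell out how EOC is realized, so the following is only a minimal sketch of the general idea of comparing outputs explicitly, assuming that a task is executed twice and its result is accepted only when both outputs agree. The function name `run_with_eoc` and the `max_rounds` parameter are hypothetical, and this is not the off-line scheduling algorithm the paper develops.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_eoc(task: Callable[[], T], max_rounds: int = 3) -> T:
    """Hypothetical helper: run `task` twice per round and compare outputs.

    Assumption: a transient fault corrupts at most one of the two
    executions in a round, so a mismatch signals an error and the
    pair is re-executed.
    """
    for _ in range(max_rounds):
        first = task()
        second = task()
        if first == second:  # the explicit output comparison
            return first
    raise RuntimeError("outputs never agreed within max_rounds")


# A fault-free task passes the comparison in the first round.
result = run_with_eoc(lambda: sum(range(10)))
```

In this sketch, error detection is a separate comparison step after execution rather than a check embedded in the task, which is the distinction the abstract draws between EOC and EED.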
Pages: 927-932
Page count: 6