Tolerating Soft Errors in Processor Cores Using CLEAR (Cross-Layer Exploration for Architecting Resilience)

Cited by: 12
Authors
Cheng, Eric [1]
Mirkhani, Shahrzad [1]
Szafaryn, Lukasz G. [2]
Cher, Chen-Yong [3]
Cho, Hyungmin [4]
Skadron, Kevin [2]
Stan, Mircea R. [2]
Lilja, Klas [5]
Abraham, Jacob A. [6]
Bose, Pradip [3]
Mitra, Subhasish [1]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Univ Virginia, Charlottesville, VA 22904 USA
[3] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
[4] Hongik Univ, Seoul 04066, South Korea
[5] Robust Chip Inc, Pleasanton, CA 94588 USA
[6] Univ Texas Austin, Austin, TX 78705 USA
Keywords
Cross-layer resilience; soft errors; fault tolerance; design; performance; hardware; impact
DOI
10.1109/TCAD.2017.2752705
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We present cross-layer exploration for architecting resilience (CLEAR), a first-of-its-kind framework that overcomes a major challenge in the design of digital systems that are resilient to reliability failures: achieving desired resilience targets at minimal cost (energy, power, execution time, and area) by combining resilience techniques across various layers of the system stack (circuit, logic, architecture, software, and algorithm). This approach is also referred to as cross-layer resilience. In this paper, we focus on radiation-induced soft errors in processor cores, addressing both single-event upsets and single-event multiple upsets in terrestrial environments. Our framework automatically and systematically explores the large space of comprehensive resilience techniques and their combinations across various layers of the system stack (586 cross-layer combinations in this paper), derives cost-effective solutions that achieve resilience targets at minimal cost, and provides guidelines for the design of new resilience techniques. Our results demonstrate that a carefully optimized combination of circuit-level hardening, logic-level parity checking, and micro-architectural recovery provides a highly cost-effective soft error resilience solution for general-purpose processor cores. For example, a 50× improvement in silent data corruption (SDC) rate is achieved at only 2.1% energy cost for an out-of-order core (6.1% for an in-order core) with no speed impact. However, (application-aware) selective circuit-level hardening alone, guided by a thorough analysis of the effects of soft errors on application benchmarks, also provides a cost-effective soft error resilience solution (with ~1% additional energy cost for a 50× improvement in SDC rate).
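The abstract's headline metric is a ratio: a 50× improvement in SDC rate means the residual SDC rate may be at most 1/50 = 2% of the unprotected baseline. The following is a minimal sketch, under stated assumptions, of how selective circuit-level hardening could be driven by per-flip-flop vulnerability data to meet such a target. It assumes that each flip-flop's share of the baseline SDC rate has already been measured (for example, by fault injection on application benchmarks), that these shares add up independently, and that hardening removes a fixed fraction of a flip-flop's contribution. The flip-flop names, all numbers, the `hardened_residual` parameter, and the greedy benefit-per-cost ordering are hypothetical. This is not the paper's selection algorithm or data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FlipFlop:
    name: str           # hypothetical flip-flop identifier
    sdc_share: float    # assumed share of the baseline SDC rate caused by errors in this flip-flop
    energy_cost: float  # hypothetical energy overhead of hardening it (fraction of core energy, > 0)


def select_for_hardening(
    flops: List[FlipFlop],
    target_improvement: float,
    hardened_residual: float = 0.0,  # assumed fraction of a flop's SDC share that survives hardening
) -> Optional[Tuple[List[str], float, float]]:
    """Greedily pick flip-flops to harden until the SDC rate improves by target_improvement.

    Returns (hardened flop names, total energy cost, achieved improvement),
    or None if hardening every flip-flop still misses the target.
    """
    baseline = sum(ff.sdc_share for ff in flops)
    # A k-fold improvement means the residual SDC rate must be at most baseline / k.
    allowed_residual = baseline / target_improvement
    residual = baseline
    chosen: List[str] = []
    cost = 0.0
    # Heuristic: most SDC reduction per unit of energy first (not guaranteed cost-optimal).
    for ff in sorted(flops, key=lambda f: f.sdc_share / f.energy_cost, reverse=True):
        if residual <= allowed_residual:
            break
        residual = max(0.0, residual - ff.sdc_share * (1.0 - hardened_residual))
        cost += ff.energy_cost
        chosen.append(ff.name)
    if residual > allowed_residual:
        return None
    achieved = float("inf") if residual == 0.0 else baseline / residual
    return chosen, cost, achieved


# Hypothetical vulnerability profile; shares sum to 1.0 (i.e., normalized to the baseline).
flops = [
    FlipFlop("pc_reg", 0.40, 0.004),
    FlipFlop("rob_head", 0.30, 0.002),
    FlipFlop("lsq_addr", 0.20, 0.004),
    FlipFlop("alu_out", 0.09, 0.004),
    FlipFlop("bp_hist", 0.01, 0.010),
]

hardened, energy, gain = select_for_hardening(flops, target_improvement=50)
print(hardened)  # ['rob_head', 'pc_reg', 'lsq_addr', 'alu_out']
print(f"energy cost ~{energy:.1%}, SDC improvement ~{gain:.0f}x")
```

With these made-up numbers, hardening four flip-flops leaves about 1% of the baseline SDC rate (roughly a 100× improvement), which exceeds the 50× target. The greedy ordering keeps the sketch short. Finding the exact minimum-cost set is a knapsack-style optimization, and a real flow would also weigh several hardening cell types with different costs and strengths.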
Pages: 1839-1852
Page count: 14