Hardware/Software Codesign Architecture for Online Testing in Chip Multiprocessors

Cited by: 10
Authors
Khan, Omer [1]
Kundu, Sandip [2]
Affiliations
[1] Univ Massachusetts, Dept Elect & Comp Engn, Lowell, MA 01854 USA
[2] Univ Massachusetts, Dept Elect & Comp Engn, Amherst, MA 01003 USA
Keywords
Chip Multiprocessor (CMP); hard error detection; isolation and tolerance; hardware/software codesign; COMPONENTS
DOI
10.1109/TDSC.2011.19
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As the semiconductor industry continues its relentless push for nano-CMOS technologies, long-term device reliability and the occurrence of hard errors have emerged as major concerns. Long-term device reliability includes parametric degradation that results in loss of performance as well as hard failures that result in loss of functionality. It has been reported in the ITRS roadmap that the effectiveness of traditional burn-in testing in product life acceleration is eroding. Thus, to assure sufficient product reliability, fault detection and system reconfiguration must be performed in the field at runtime. Although regular memory structures are protected against hard errors using error-correcting codes, many structures within cores are left unprotected. Several proposed online testing techniques either rely on concurrent testing or periodically check for correctness. These techniques are attractive, but limited by significant design effort and hardware cost. Furthermore, the lack of observability and controllability of microarchitectural states results in long latency, long test sequences, and large storage of golden patterns. In this paper, we propose a low-cost scheme for detecting and debugging hard errors at a fine granularity within cores and keeping the faulty cores functional, with potentially reduced capability and performance. The solution includes both hardware and runtime software based on the codesigned virtual machine concept. It has the ability to detect, debug, and isolate hard errors in small noncache array structures, execution units, and combinational logic within cores. Hardware signature registers are used to capture the footprint of execution at the output of functional modules within the cores. A runtime layer of software (microvisor) initiates functional tests concurrently on multiple cores and captures the signature footprints across cores to detect, debug, and isolate hard errors.
Results show that, using a targeted set of functional test sequences, faults can be debugged to a fine-granular level within cores. The hardware cost of the scheme is less than three percent, while the software tasks are performed at a high level, resulting in relatively low design effort and cost.
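The cross-core signature comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The 16-bit register width, the feedback polynomial, the majority-vote rule, the toy ALU models, and every function name below are assumptions introduced only for illustration. The sketch folds the outputs of one functional module into a signature on each core while the same test sequence runs, then flags any core whose signature disagrees with the majority.

```python
from collections import Counter

MASK = 0xFFFF   # assumed 16-bit signature register
POLY = 0x1021   # hypothetical feedback polynomial, chosen only for illustration


def fold(signature, value):
    """Fold one module output into a MISR-style signature."""
    msb = (signature >> 15) & 1
    signature = (signature << 1) & MASK
    if msb:
        signature ^= POLY
    return signature ^ (value & MASK)


def run_test(alu, operands):
    """Run one functional test sequence on a core's ALU model; return its signature."""
    signature = 0
    for a, b in operands:
        signature = fold(signature, alu(a, b))
    return signature


def find_outliers(signatures):
    """Return the ids of cores whose signature disagrees with the majority."""
    majority, _ = Counter(signatures.values()).most_common(1)[0]
    return sorted(core for core, sig in signatures.items() if sig != majority)


def good_alu(a, b):
    """Fault-free adder model."""
    return (a + b) & MASK


def stuck_at_zero_alu(a, b):
    """Hypothetical hard fault: output bit 3 is stuck at 0."""
    return good_alu(a, b) & ~0x0008


# The same test sequence is applied to every core (assumed 4-core CMP).
operands = [(1, 2), (7, 1), (0x00F0, 0x000F), (0x1234, 0x0004)]
cores = {0: good_alu, 1: good_alu, 2: stuck_at_zero_alu, 3: good_alu}
signatures = {core: run_test(alu, operands) for core, alu in cores.items()}
suspects = find_outliers(signatures)
```

In this toy setup `suspects` contains only core 2, the core with the injected fault. Comparing signatures across cores avoids storing golden patterns, which matches the motivation given in the abstract. Majority voting is only one possible decision rule and is assumed here.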
Pages: 714 - 727
Page count: 14