A Hardware Framework for Yield and Reliability Enhancement in Chip Multiprocessors

Cited by: 0
Authors
Pan, Abhisek [1]
Rodrigues, Rance [2]
Kundu, Sandip [3]
Affiliations
[1] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
[2] Univ Massachusetts, Amherst, MA 01003 USA
[3] Univ Massachusetts, Dept Elect & Comp Engn, Amherst, MA 01003 USA
Keywords
Chip multiprocessor; fault tolerance through redundancy; hardware reliability; modeling and simulation of multicore systems; floating-point unit; low-cost; design; microarchitecture
DOI
10.1145/2629688
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Device reliability and manufacturability have emerged as dominant concerns in end-of-road CMOS devices. An increasing number of hardware failures are attributed to manufacturability or reliability problems. Maintaining an acceptable manufacturing yield for chips containing tens of billions of transistors with wide variations in device parameters has been identified as a great challenge. Additionally, today's nanometer-scale devices suffer from accelerated aging effects because of the extreme operating temperatures and electric fields they are subjected to. Unless addressed in design, aging-related defects can significantly reduce the lifetime of a product. In this article, we investigate a micro-architectural scheme for improving the yield and reliability of homogeneous chip multiprocessors (CMPs). The proposed solution involves a hardware framework that enables us to utilize the redundancies inherent in a multicore system to keep the system operational in the face of partial failures. A micro-architectural modification allows a faulty core in a CMP to use another core's resources to service any instruction that the former cannot execute correctly by itself. This service improves yield and reliability but may cause loss of performance. The target platforms for quantitative evaluation of performance under degradation are a dual-core and a quad-core chip multiprocessor with one or more cores sustaining partial failure. Simulation studies indicate that when a large, high-latency, and sparingly used unit such as a floating-point unit fails in a core, correct execution may be sustained through outsourcing with at most a 16% impact on performance for a floating-point intensive application. For applications with moderate floating-point load, the degradation is insignificant. The performance impact may be mitigated even further by judicious selection of the cores to commandeer depending on the current load on each of the candidate cores. The area overhead is also negligible due to resource reuse.
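
The load-aware outsourcing idea in the abstract can be illustrated with a minimal software sketch. This is not the authors' hardware design: the `Core` record, the `fpu_ok` flag, the `fpu_load` metric, and the `pick_helper` and `dispatch` functions are all hypothetical names introduced here, and the "least-loaded healthy core" rule is only one plausible reading of "judicious selection of the cores to commandeer".

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    fpu_ok: bool     # hypothetical flag: is this core's floating-point unit healthy?
    fpu_load: float  # hypothetical metric: fraction of recent cycles the FPU was busy


def pick_helper(cores, faulty_id):
    """Return the least-loaded other core with a working FPU, or None."""
    candidates = [c for c in cores if c.core_id != faulty_id and c.fpu_ok]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.fpu_load)


def dispatch(cores, core_id, is_fp_instruction):
    """Return the id of the core that executes the instruction.

    Non-FP instructions, and FP instructions on a healthy core, run locally.
    An FP instruction on a core with a failed FPU is outsourced to a helper.
    Returns None when no healthy helper exists.
    """
    core = next(c for c in cores if c.core_id == core_id)
    if not is_fp_instruction or core.fpu_ok:
        return core_id
    helper = pick_helper(cores, core_id)
    return helper.core_id if helper is not None else None
```

In a quad-core configuration where core 0 has a failed floating-point unit, `dispatch` would route its floating-point instructions to whichever of the remaining cores currently reports the lowest `fpu_load`, while all other instructions continue to execute on core 0.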
Pages: 26
Related papers
50 in total
  • [1] Improving Yield and Reliability of Chip Multiprocessors
    Pan, Abhisek
    Khan, Omer
    Kundu, Sandip
    [J]. DATE: 2009 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, VOLS 1-3, 2009, : 490 - 495
  • [2] Compiler Directed Network-on-Chip Reliability Enhancement for Chip Multiprocessors
    Ozturk, Ozcan
    Kandemir, Mahmut
    Irwin, Mary J.
    Narayanan, H. K.
    [J]. LCTES 10-PROCEEDINGS OF THE ACM SIGPLAN/SIGBED 2010 CONFERENCE ON LANGUAGES, COMPILERS, & TOOLS FOR EMBEDDED SYSTEMS, 2010, : 85 - 94
  • [3] Compiler Directed Network-on-Chip Reliability Enhancement for Chip Multiprocessors
    Ozturk, Ozcan
    Kandemir, Mahmut
    Irwin, Mary J.
    Narayanan, H. K.
    [J]. ACM SIGPLAN NOTICES, 2010, 45 (04) : 85 - 94
  • [4] Future execution: A hardware prefetching technique for chip multiprocessors
    Ganusov, I
    Burtscher, M
    [J]. PACT 2005: 14TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, 2005, : 350 - 360
  • [5] Dynamic Lifetime Reliability Management for Chip Multiprocessors
    Moghaddam, Milad Ghorbani
    Ababei, Cristinel
    [J]. IEEE TRANSACTIONS ON MULTI-SCALE COMPUTING SYSTEMS, 2018, 4 (04) : 952 - 958
  • [6] Maestro: Orchestrating Lifetime Reliability in Chip Multiprocessors
    Feng, Shuguang
    Gupta, Shantanu
    Ansari, Amin
    Mahlke, Scott
    [J]. HIGH PERFORMANCE EMBEDDED ARCHITECTURES AND COMPILERS, PROCEEDINGS, 2010, 5952 : 186 - 200
  • [7] A Yield and Reliability Enhancement Framework for Image Processing Applications
    Hsieh, Tong-Yu
    Ku, Chia-Chi
    Yeh, Chia-Hung
    [J]. 2012 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS), 2012, : 683 - 686
  • [8] Hardware/Software Codesign Architecture for Online Testing in Chip Multiprocessors
    Khan, Omer
    Kundu, Sandip
    [J]. IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2011, 8 (05) : 714 - 727
  • [9] Reliability-aware core partitioning in chip multiprocessors
    Oz, Isil
    Topcuoglu, Haluk Rahmi
    Kandemir, Mahmut
    Tosun, Oguz
    [J]. JOURNAL OF SYSTEMS ARCHITECTURE, 2012, 58 (3-4) : 160 - 176
  • [10] Dynamic Energy and Reliability Management in Network-on-Chip based Chip Multiprocessors
    Moghaddam, Milad Ghorbani
    [J]. 2017 EIGHTH INTERNATIONAL GREEN AND SUSTAINABLE COMPUTING CONFERENCE (IGSC), 2017,