Distributed Hardware Matcher Framework for SoC Survivability

Authors
Wagner, Ilya [1]
Lu, Shih-Lien [2]
Affiliations
[1] Intel Corp, Platform Validat Engn, Santa Clara, CA 95054 USA
[2] Intel Corp, Oregon Microarchitecture Lab, Santa Clara, CA 95051 USA
Source
2011 DESIGN, AUTOMATION & TEST IN EUROPE (DATE) | 2011
Funding
National Science Foundation (United States);
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Modern systems on chip (SoCs) are rapidly becoming complex high-performance computational devices, featuring multiple general-purpose processor cores and a variety of functional IP blocks that communicate with each other through on-die fabric. While modular SoC design provides power savings and simplifies the development process, it also leaves significant room for a special type of hardware bug, the interaction error, to slip through pre- and post-silicon verification. Consequently, hard-to-fix silicon escapes may be discovered late in the production schedule or even after a market release, potentially causing costly delays or recalls. In this work we propose a unified error detection and recovery framework that incorporates programmable features into the on-die fabric of an SoC, so that triggers of escaped interaction bugs can be detected at runtime. Furthermore, upon detection, our solution locks the interface of an IP for a programmed time period, thus altering interactions between accesses and bypassing the bug in a manner transparent to software. For classes of errors that cannot be circumvented by this in-hardware technique, our framework is programmed to propagate the error detection to the software layer. Our experiments demonstrate that the proposed framework is capable of detecting a range of interaction errors with less than 0.01% performance penalty and 0.45% area overhead.
Pages: 305-310
Page count: 6
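
The detect, lock, and escalate flow described in the abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the authors' design: the abstract does not specify how triggers are encoded or matched, so the `Transaction` and `Matcher` classes, the sequence-of-accesses trigger, and the per-cycle `observe` interface are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Transaction:
    """One fabric access (hypothetical encoding)."""
    source: str  # initiating block, e.g. a core
    target: str  # IP block whose interface is accessed
    op: str      # "read" or "write"


@dataclass
class Matcher:
    """Hypothetical programmable matcher guarding one IP interface."""
    trigger: Tuple[Transaction, ...]  # programmed sequence of consecutive accesses
    lock_cycles: int                  # programmed lock duration in cycles
    escalate: bool = False            # True: report to software instead of locking
    progress: int = 0                 # how much of the trigger has been seen
    lock_remaining: int = 0           # cycles left on the current lock
    software_events: List[Transaction] = field(default_factory=list)

    def observe(self, txn: Transaction) -> str:
        """Present one access for one cycle and return the action taken."""
        if self.lock_remaining > 0:
            # Interface is locked: hold the access; the caller retries it.
            self.lock_remaining -= 1
            return "stalled"
        if txn == self.trigger[self.progress]:
            self.progress += 1
        else:
            # Naive restart; a real matcher might track overlapping prefixes.
            self.progress = 1 if txn == self.trigger[0] else 0
        if self.progress < len(self.trigger):
            return "forwarded"
        # Full trigger seen: either escalate or lock the interface.
        self.progress = 0
        if self.escalate:
            self.software_events.append(txn)
            return "escalated"
        self.lock_remaining = self.lock_cycles
        return "locked"


# Usage: a write followed immediately by a read is the assumed bug trigger.
w = Transaction("cpu0", "dma", "write")
r = Transaction("gpu", "dma", "read")
m = Matcher(trigger=(w, r), lock_cycles=2)
actions = [m.observe(t) for t in (w, r, r, r, r)]
print(actions)
```

In this model the access that completes the trigger is held while the interface is locked, and the caller retries it each cycle until it is forwarded, which separates the two accesses in time. Setting `escalate=True` instead records the event for software, mirroring the fallback path the abstract mentions for errors that locking cannot circumvent.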