A Dynamically Adjusting Gracefully Degrading Link-Level Fault-Tolerant Mechanism for NoCs

Times cited: 19
Authors
Vitkovskiy, Arseniy [1]
Soteriou, Vassos [1]
Nicopoulos, Chrysostomos [2]
Affiliations
[1] Cyprus Univ Technol, Dept Elect & Comp Engn & Informat, CY-3603 Lemesos, Cyprus
[2] Univ Cyprus, Dept Elect & Comp Engn, CY-1678 Nicosia, Cyprus
Keywords
Fault-tolerance; networks-on-chip (NoCs); on-chip interconnection networks; router microarchitecture; routing algorithm; ROUTING ALGORITHM; NETWORK; PERFORMANCE;
DOI
10.1109/TCAD.2012.2188801
Chinese Library Classification (CLC) number
TP3 [Computing technology; computer technology]
Discipline classification code
0812
Abstract
The rapid scaling of silicon technology has enabled massive transistor integration densities. Nanometer feature sizes, however, are marred by increasing variability and susceptibility to wear-out. Billion-transistor designs, such as chip multiprocessors (CMPs), are especially vulnerable to defects. CMPs rely on a network-on-chip (NoC) for all their communication needs. A single link failure within this on-chip fabric can impede, halt, or even deadlock intertile communication, which can render the entire chip multiprocessor useless. In this paper, we present a technique capable of handling very large numbers of permanent wire failures that occur in parallel links, either at manufacture time or dynamically at runtime. Rather than marking an entire parallel link as faulty whenever some of its wires fail, the proposed methodology employs these partially-faulty links (PFLs) to continue transferring information (albeit in a gracefully degraded mode) in order to maintain network connectivity. Furthermore, when several wires fail, the presented technique can designate PFLs as fully-faulty and use appropriate routing algorithms that bypass nonoperational links while still maintaining load balance in the vicinity of PFLs. The proposed scheme employs architectural support within the on-chip routers to detect link failures and enable reconfiguration at the granularity of individual wires. Hardware synthesis confirms the low cost of the proposed architecture, and full-system simulations using both synthetic network traffic and real workloads demonstrate its efficacy.
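The abstract does not describe how a partially-faulty link carries data. The sketch below illustrates only the general idea of wire-level graceful degradation and is not the authors' microarchitecture. It assumes a generic scheme in which a flit is serialized over the remaining healthy wires in several beats. It also assumes a hypothetical threshold (`MIN_HEALTHY_WIRES`) below which the link is treated as fully faulty and left to the routing algorithm to bypass. All names, the 8-wire link width, and the stuck-at-1 fault model are illustrative assumptions.

```python
from math import ceil
from typing import List, Optional

LINK_WIDTH = 8         # hypothetical number of data wires per link (assumption)
MIN_HEALTHY_WIRES = 2  # hypothetical threshold; fewer healthy wires => link treated as fully faulty


def healthy_wires(fault_mask: int, width: int = LINK_WIDTH) -> List[int]:
    """Return indices of wires whose bit in fault_mask is 0 (still usable)."""
    return [w for w in range(width) if not (fault_mask >> w) & 1]


def cycles_per_flit(fault_mask: int, width: int = LINK_WIDTH) -> Optional[int]:
    """Beats needed to move one flit; None means the link is treated as fully faulty."""
    ok = healthy_wires(fault_mask, width)
    if len(ok) < MIN_HEALTHY_WIRES:
        return None
    return ceil(width / len(ok))


def serialize(flit: int, fault_mask: int, width: int = LINK_WIDTH) -> List[int]:
    """Split a flit into beats that drive data only on the healthy wires."""
    ok = healthy_wires(fault_mask, width)
    if len(ok) < MIN_HEALTHY_WIRES:
        raise ValueError("link is treated as fully faulty; route around it")
    bits = [(flit >> i) & 1 for i in range(width)]
    beats = []
    for start in range(0, width, len(ok)):
        beat = 0
        # Map the next len(ok) flit bits onto the healthy wire positions.
        for wire, bit in zip(ok, bits[start:start + len(ok)]):
            beat |= bit << wire
        beats.append(beat)
    return beats


def stuck_at_one_channel(beat: int, fault_mask: int) -> int:
    """Toy fault model (assumption): every faulty wire always reads as 1."""
    return beat | fault_mask


def deserialize(beats: List[int], fault_mask: int, width: int = LINK_WIDTH) -> int:
    """Reassemble the flit, ignoring whatever appears on the faulty wires."""
    ok = healthy_wires(fault_mask, width)
    bits = [(beat >> wire) & 1 for beat in beats for wire in ok][:width]
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    mask = 0b00100101  # wires 0, 2 and 5 have failed
    beats = [stuck_at_one_channel(b, mask) for b in serialize(0xA7, mask)]
    print(cycles_per_flit(mask), hex(deserialize(beats, mask)))  # 2 0xa7
```

With three of eight wires failed, each flit takes two beats instead of one. That halved per-link bandwidth is the "gracefully degraded mode" in this toy model, and the link still preserves connectivity. If fewer than `MIN_HEALTHY_WIRES` wires remain, `cycles_per_flit` returns `None`, standing in for the point at which a router would mark the link fully faulty.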
Pages: 1235-1248
Number of pages: 14
Related papers (50 in total)
  • [1] A Fine-Grained Link-Level Fault-Tolerant Mechanism for Networks-on-Chip
    Vitkovskiy, Arseniy
    Soteriou, Vassos
    Nicopoulos, Chrysostomos
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, 2010: 447-454
  • [2] A Scalable and Fault-Tolerant Routing Algorithm for NoCs
    Shi, Zewen
    You, Kaidi
    Ying, Yan
    Huang, Bei
    Zeng, Xiaoyang
    Yu, Zhiyi
    [J]. 2010 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, 2010: 165-168
  • [3] Fault-tolerant strategy and workspace of the subreflector parallel adjusting mechanism
    Yao, Jiantao
    Han, Bo
    Dou, Yuchao
    Xu, Yundou
    Zhao, Yongsheng
    [J]. PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART C-JOURNAL OF MECHANICAL ENGINEERING SCIENCE, 2019, 233 (18): 6656-6667
  • [4] An Implementation of a Distributed Fault-Tolerant Mechanism for 2D Mesh NoCs
    Marcon, Cesar
    Amory, Alexandre
    Webber, Thais
    Bortolon, Felipe T.
    Volpato, Thomas
    Munareto, Jader
    [J]. RAPID SYSTEM PROTOTYPING: SHORTENING THE PATH FROM SPECIFICATION TO PROTOTYPE (RSP 2013), 2013: 24-29
  • [5] A Highly Resilient Routing Algorithm for Fault-Tolerant NoCs
    Fick, David
    DeOrio, Andrew
    Chen, Gregory
    Bertacco, Valeria
    Sylvester, Dennis
    Blaauw, David
    [J]. DATE: 2009 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, VOLS 1-3, 2009: 21-26
  • [6] Efficient Link-level Error Resilience in 3D NoCs
    Pasca, Vladimir
    Rehman, Saif-Ur
    Anghel, Lorena
    Benabdenbi, Mounir
    [J]. 2012 IEEE 15TH INTERNATIONAL SYMPOSIUM ON DESIGN AND DIAGNOSTICS OF ELECTRONIC CIRCUITS & SYSTEMS (DDECS), 2012: 127-132
  • [7] Latency Reduction of Fault-Tolerant NoCs by Employing Multiple Paths
    Milfont, Ronaldo T. P.
    Ferreira, Joao M.
    Tavares, Daniel A. B.
    Mota, Rafael G.
    Cortez, Paulo C.
    Silveira, Jarbas A. N.
    Marcon, Cesar
    [J]. 2017 30TH SYMPOSIUM ON INTEGRATED CIRCUITS AND SYSTEMS DESIGN (SBCCI 2017): CHOP ON SANDS, 2017: 72-78
  • [8] Dynamically fault-tolerant content addressable networks
    Saia, J
    Fiat, A
    Gribble, S
    Karlin, AR
    Saroiu, S
    [J]. PEER-TO-PEER SYSTEMS, 2002, 2429: 270-279
  • [9] A Scalable and Reconfigurable Fault-Tolerant Distributed Routing Algorithm for NoCs
    Shi, Zewen
    Zeng, Xiaoyang
    Yu, Zhiyi
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2011, E94D (07): 1386-1397
  • [10] Smart-Flooding: A Novel Scheme for Fault-Tolerant NoCs
    Sanusi, Azeez
    Bayoumi, Magdy A.
    [J]. IEEE INTERNATIONAL SOC CONFERENCE, PROCEEDINGS, 2009: 259-262