Prototyping a fault-tolerant multiprocessor SoC with run-time fault recovery

Cited by: 8
Authors
Zhu, Xinping [1]
Qin, Wei [2]
Affiliations
[1] Northeastern Univ, Boston, MA 02115 USA
[2] Boston Univ, Boston, MA 02215 USA
Keywords
performance; design; experimentation; verification; retargetable simulation; network-on-chip; multiprocessor system; fault-tolerance; system-on-chip; run-time verification
DOI
10.1109/DAC.2006.229177
Chinese Library Classification
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Modern integrated circuits (ICs) are becoming increasingly complex. This complexity makes it difficult to design, manufacture and integrate these high-performance ICs. The advent of multiprocessor systems-on-chip (SoCs) makes it even more challenging for programmers to utilize the full potential of the computation resources on the chips. Meanwhile, the complexity of chip design creates new reliability challenges. As a result, chip designers and users cannot fully exploit the tremendous silicon resources on the chip. This research proposes a prototype composed of a fault-tolerant multiprocessor SoC and a coupled single program, multiple data (SPMD) programming framework. We use a SystemC-based modeling and simulation environment to design and analyze this prototype. Our analysis shows that this prototype serves as a reliable computing platform constructed from potentially unreliable chip resources, thus protecting previous investments in hardware and software designs. Moreover, the promising application-driven simulation results shed light on the potential of a scalable and reliable multiprocessing computing platform for a wide range of mission-critical applications.
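The abstract does not describe the recovery mechanism itself. Purely as an illustration, and not the authors' SystemC model, the Python sketch below shows one way SPMD execution with run-time fault recovery can work. It assumes fail-stop cores whose faults surface as exceptions: every core runs the same kernel on its own data chunk, and when a core fails, it is retired and its chunk is re-dispatched to a surviving core. All names (`partition`, `execute_on`, `run_spmd`, `CoreFault`) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical kernel type: the single program every core runs on its chunk.
Kernel = Callable[[List[int]], int]


class CoreFault(Exception):
    """Raised when a simulated core fail-stops while running a task."""


def partition(data: Sequence[int], n: int) -> List[List[int]]:
    """SPMD data decomposition: split data into n contiguous chunks."""
    size = -(-len(data) // n)  # ceiling division
    return [list(data[i * size:(i + 1) * size]) for i in range(n)]


def execute_on(core: int, kernel: Kernel, chunk: List[int], failing: Set[int]) -> int:
    """Run the shared kernel on one core; cores in `failing` raise CoreFault."""
    if core in failing:
        raise CoreFault(f"core {core} failed")
    return kernel(chunk)


def run_spmd(kernel: Kernel, data: Sequence[int], num_cores: int,
             failing: Set[int]) -> List[int]:
    """Same program on every core, one chunk each; remap work from failed cores."""
    chunks = partition(data, num_cores)
    healthy = list(range(num_cores))  # cores not yet observed to fail
    pending: List[Tuple[int, List[int]]] = list(enumerate(chunks))
    results: Dict[int, int] = {}
    while pending:
        chunk_id, chunk = pending.pop(0)
        if not healthy:
            raise RuntimeError("no healthy cores left")
        # Prefer the chunk's home core; otherwise remap onto a surviving core.
        core = chunk_id if chunk_id in healthy else healthy[chunk_id % len(healthy)]
        try:
            results[chunk_id] = execute_on(core, kernel, chunk, failing)
        except CoreFault:
            healthy.remove(core)               # retire the failed core at run time
            pending.append((chunk_id, chunk))  # and re-dispatch its chunk
    return [results[i] for i in range(len(chunks))]


print(run_spmd(sum, list(range(1, 17)), num_cores=4, failing={2}))  # [10, 26, 42, 58]
```

Here core 2 fails, and its chunk is recomputed on core 3, so the per-chunk results match a fault-free run. The round-robin remapping is an arbitrary choice for illustration; silent (non-fail-stop) errors would need a separate detection step, for example redundant execution with result comparison.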
Pages: 53 / +
Number of pages: 2