Prototyping a fault-tolerant multiprocessor SoC with run-time fault recovery

Cited by: 8

Authors:

Zhu, Xinping ^{[1]}

Qin, Wei ^{[2]}

Affiliations:

[1] Northeastern Univ, Boston, MA 02115 USA

[2] Boston Univ, Boston, MA 02215 USA

Source:

43RD DESIGN AUTOMATION CONFERENCE, PROCEEDINGS 2006 | 2006

Keywords:

performance; design; experimentation; verification; retargetable simulation; network-on-chip; multiprocessor system; fault-tolerance; system-on-chip; run-time verification;

DOI:

10.1109/DAC.2006.229177

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Modern integrated circuits (ICs) are becoming increasingly complex. The complexity makes it difficult to design, manufacture and integrate these high-performance ICs. The advent of multiprocessor systems-on-chip (SoCs) makes it even more challenging for programmers to utilize the full potential of the computation resources on the chips. Meanwhile, the complexity of chip design creates new reliability challenges. As a result, chip designers and users cannot fully exploit the tremendous silicon resources on the chip. This research proposes a prototype composed of a fault-tolerant multiprocessor SoC and a coupled single program, multiple data (SPMD) programming framework. We use a SystemC-based modeling and simulation environment to design and analyze this prototype. Our analysis shows that this prototype can serve as a reliable computing platform constructed from potentially unreliable chip resources, thus protecting the previous investment in hardware and software designs. Moreover, the promising application-driven simulation results shed light on the potential of a scalable and reliable multiprocessing computing platform for a wide range of mission-critical applications.

Citation

Pages: 53 / +

Page count: 2
