Adaptive Fault Tolerance for Many-Core Based Space-Borne Computing

Cited by: 0
Authors
James, Mark [1]
Springer, Paul [1]
Zima, Hans [1]
Affiliations
[1] CALTECH, Jet Prop Lab, Pasadena, CA 91125 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
This paper describes an approach to providing software fault tolerance for future deep-space robotic NASA missions, which will require a high degree of autonomy supported by an enhanced on-board computational capability. Such systems have become possible as a result of emerging many-core technology, which is expected to offer 1024-core chips by 2015. We discuss the challenges and opportunities of this new technology, focusing on introspection-based adaptive fault tolerance that takes into account the specific requirements of applications, guided by a fault model. Introspection supports runtime monitoring of program execution with the goal of identifying, locating, and analyzing errors. Fault tolerance assertions for the introspection system can be provided by the user, derived from domain-specific knowledge, or obtained from the results of static or dynamic program analysis. This work is part of an ongoing project at the Jet Propulsion Laboratory in Pasadena, California.
Pages: 260-274
Page count: 15
Related papers
50 in total
  • [1] Towards Byzantine fault tolerance in many-core computing platforms
    Jeffery, Casey M.
    Figueiredo, Renato J. O.
    [J]. 13TH PACIFIC RIM INTERNATIONAL SYMPOSIUM ON DEPENDABLE COMPUTING, PROCEEDINGS, 2007, : 256 - 259
  • [2] Many-Core Computing for Space-based Stereoscopic Imaging
    McCall, Paul
    Torres, Gildo
    LeGrand, Keith
    Adjouadi, Malek
    Liu, Chen
    Darling, Jacob
    Pernicka, Henry
    [J]. 2013 IEEE AEROSPACE CONFERENCE, 2013,
  • [3] Self-Adaptive Fault Tolerance in Multi-/Many-Core Systems
    Cristiana Bolchini
    Matteo Carminati
    Antonio Miele
    [J]. Journal of Electronic Testing, 2013, 29 : 159 - 175
  • [4] Self-Adaptive Fault Tolerance in Multi-/Many-Core Systems
    Bolchini, Cristiana
    Carminati, Matteo
    Miele, Antonio
    [J]. JOURNAL OF ELECTRONIC TESTING-THEORY AND APPLICATIONS, 2013, 29 (02): : 159 - 175
  • [5] Software-Based Hardware Fault Tolerance for Many-Core Architectures
    Wunderlich, Hans-Joachim
    [J]. IEEE INTERNATIONAL SYMPOSIUM ON DEFECT AND FAULT TOLERANCE VLSI SYSTEMS, PROCEEDINGS, 2009, : 223 - 223
  • [6] Algorithm-based Fault-Tolerance on Many-Core Architectures
    Braun, Claus
    Wunderlich, Hans-Joachim
    [J]. IT-INFORMATION TECHNOLOGY, 2010, 52 (04): : 209 - 215
  • [7] Fault Tolerance Mechanism in Chip Many-Core Processors
    张磊
    韩银和
    李华伟
    李晓维
    [J]. Tsinghua Science and Technology, 2007, (S1) : 169 - 174
  • [8] Adaptive Fault Simulation on Many-core Microprocessor Systems
    Haghbayan, Mohammad-Hashem
    Teravainen, Sami
    Rahmani, Amir-Mohammad
    Liljeberg, Pasi
    Tenhunen, Hannu
    [J]. PROCEEDINGS OF THE 2015 IEEE INTERNATIONAL SYMPOSIUM ON DEFECT AND FAULT TOLERANCE IN VLSI AND NANOTECHNOLOGY SYSTEMS (DFTS), 2015, : 151 - 154
  • [9] Fault-tolerance at the Management Level in Many-core Systems
    Fochi, Vinicius
    Caimi, Luciano L.
    da Silva, Marcelo H.
    Moraes, Fernando Gehm
    [J]. 2018 31ST SYMPOSIUM ON INTEGRATED CIRCUITS AND SYSTEMS DESIGN (SBCCI), 2018,
  • [10] RMC: an Integrated Runtime System for Adaptive Many-Core Computing
    Park, Jinsu
    Cho, Eunbi
    Baek, Woongki
    [J]. 2016 PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE (EMSOFT), 2016,