Managing Dynamic Reconfiguration for Fault-tolerance on a Manycore Architecture

Cited by: 1
Authors
Zain-ul-Abdin [1]
Gebrewahid, Essayas [1]
Svensson, Bertil [1]
Affiliations
[1] Halmstad Univ, Ctr Res Embedded Syst, Halmstad, Sweden
DOI
10.1109/IPDPSW.2012.38
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
With the advent of manycore architectures comprising hundreds of processing elements, fault management has become a major challenge. We present an approach that uses the occam-pi language to manage the fault recovery mechanism on a new manycore architecture, the Platform 2012 (P2012). The approach is made possible by extending our previously developed compiler framework to compile occam-pi implementations to the P2012 architecture. We describe the techniques used to translate the salient features of the occam-pi language to the native programming model of the P2012 architecture. We demonstrate the applicability of the approach by an experimental case study, in which the DCT algorithm is implemented on a set of four processing elements. During runtime, some of the tasks are then relocated from assumed faulty processing elements to the faultless ones by means of dynamic reconfiguration of the hardware. The working of the demonstrator and the simulation results illustrate not only the feasibility of the approach but also how the use of higher-level abstractions simplifies the fault handling.
Pages: 312-319
Number of pages: 8