Managing Dynamic Reconfiguration for Fault-tolerance on a Manycore Architecture

Cited by: 1
Authors
Zain-ul-Abdin [1]
Gebrewahid, Essayas [1]
Svensson, Bertil [1]
Affiliation
[1] Halmstad Univ, Ctr Res Embedded Syst, Halmstad, Sweden
DOI
10.1109/IPDPSW.2012.38
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline code
081202
Abstract
With the advent of manycore architectures comprising hundreds of processing elements, fault management has become a major challenge. We present an approach that uses the occam-pi language to manage the fault recovery mechanism on a new manycore architecture, Platform 2012 (P2012). The approach is made possible by extending our previously developed compiler framework to compile occam-pi implementations for the P2012 architecture. We describe the techniques used to translate the salient features of the occam-pi language to the native programming model of the P2012 architecture. We demonstrate the applicability of the approach through an experimental case study in which the discrete cosine transform (DCT) algorithm is implemented on a set of four processing elements. At runtime, some of the tasks are relocated from processing elements assumed to be faulty to faultless ones by means of dynamic reconfiguration of the hardware. The operation of the demonstrator and the simulation results illustrate not only the feasibility of the approach but also how the use of higher-level abstractions simplifies fault handling.
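The relocation step described in the abstract can be pictured, very roughly, as remapping every task placed on a processing element marked faulty onto one of the remaining faultless elements. The sketch below is a plain-Python illustration under stated assumptions: the function name `relocate_tasks`, the task labels, and the least-loaded placement policy are all hypothetical. It is not the authors' occam-pi implementation and does not model P2012 reconfiguration, channel rewiring, or task-state transfer.

```python
def relocate_tasks(mapping, faulty, pes):
    """Return a copy of `mapping` with every task moved off the faulty PEs.

    Hypothetical illustration only:
      mapping -- dict: task name -> processing-element (PE) id
      faulty  -- set of PE ids assumed to be faulty
      pes     -- list of all PE ids in the cluster
    Each displaced task goes to the healthy PE with the fewest tasks
    (ties broken by the lower PE id). This least-loaded policy is an
    assumption, not the policy used in the paper.
    """
    healthy = [pe for pe in pes if pe not in faulty]
    if not healthy:
        raise RuntimeError("no faultless processing element left")

    # Count the tasks that already run on each healthy PE.
    load = {pe: 0 for pe in healthy}
    for pe in mapping.values():
        if pe in load:
            load[pe] += 1

    new_mapping = dict(mapping)  # leave the caller's mapping untouched
    for task in sorted(mapping):  # deterministic relocation order
        if mapping[task] in faulty:
            target = min(healthy, key=lambda pe: (load[pe], pe))
            new_mapping[task] = target
            load[target] += 1
    return new_mapping


# Hypothetical placement: one task per PE on four PEs; PEs 1 and 3 assumed faulty.
placement = {"task0": 0, "task1": 1, "task2": 2, "task3": 3}
print(relocate_tasks(placement, {1, 3}, [0, 1, 2, 3]))
# {'task0': 0, 'task1': 0, 'task2': 2, 'task3': 2}
```

Spreading displaced tasks over the least-loaded survivors is only one plausible choice; a real system would also have to account for communication locality and the cost of moving task state, which this sketch ignores.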
Pages: 312-319
Number of pages: 8
Related papers (50 in total)
  • [31] Simulation relations for fault-tolerance
    Demasi, Ramiro
    Castro, Pablo F.
    Maibaum, Thomas S. E.
    Aguirre, Nazareno
    FORMAL ASPECTS OF COMPUTING, 2017, 29 (06) : 1013 - 1050
  • [32] Fault-tolerance in a Boltzmann machine
    Price, CC
    Hanks, JB
    Stephens, JN
    1997 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, 1997, : 1326 - 1331
  • [33] Efficient Byzantine Fault-Tolerance
    Veronese, Giuliana Santos
    Correia, Miguel
    Bessani, Alysson Neves
    Lung, Lau Cheuk
    Verissimo, Paulo
    IEEE TRANSACTIONS ON COMPUTERS, 2013, 62 (01) : 16 - 30
  • [34] FAULT-TOLERANCE IN PARALLEL ARCHITECTURES
    SAMI, MG
    SCARABOTTOLO, N
    LECTURE NOTES IN COMPUTER SCIENCE, 1987, 272 : 349 - 372
  • [35] Measuring Masking Fault-Tolerance
    Castro, Pablo F.
    D'Argenio, Pedro R.
    Demasi, Ramiro
    Putruele, Luciano
    TOOLS AND ALGORITHMS FOR THE CONSTRUCTION AND ANALYSIS OF SYSTEMS, PT II, 2019, 11428 : 375 - 392
  • [36] PARALLELISM AND FAULT-TOLERANCE IN THE CHORUS
    BANINO, JS
    JOURNAL OF SYSTEMS AND SOFTWARE, 1986, 6 (1-2) : 205 - 211
  • [37] HEALTHY FUTURE FOR FAULT-TOLERANCE
    Unknown
    COMPUTING SYSTEMS, 1991, 6 (02) : 125
  • [38] Connectivity and fault-tolerance of hyperdigraphs
    Ferrero, D
    Padró, C
    DISCRETE APPLIED MATHEMATICS, 2002, 117 (1-3) : 15 - 26
  • [39] A fault-tolerance mechanism in grid
    Jin, L
    Tong, WQ
    Tang, HQ
    Wang, B
    INDIN 2003: IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS, PROCEEDINGS, 2003, : 457 - 461
  • [40] ISSUES IN SECURITY AND FAULT-TOLERANCE
    HARTIG, H
    KUHNHAUSER, W
    LIEDTKE, J
    LECTURE NOTES IN COMPUTER SCIENCE, 1991, 563 : 212 - 216