Managing Dynamic Reconfiguration for Fault-tolerance on a Manycore Architecture

Cited by: 1
Authors
Zain-ul-Abdin [1]
Gebrewahid, Essayas [1]
Svensson, Bertil [1]
Affiliation
[1] Halmstad Univ, Ctr Res Embedded Syst, Halmstad, Sweden
DOI
10.1109/IPDPSW.2012.38
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
With the advent of manycore architectures comprising hundreds of processing elements, fault management has become a major challenge. We present an approach that uses the occam-pi language to manage the fault recovery mechanism on a new manycore architecture, the Platform 2012 (P2012). The approach is made possible by extending our previously developed compiler framework to compile occam-pi implementations to the P2012 architecture. We describe the techniques used to translate the salient features of the occam-pi language to the native programming model of the P2012 architecture. We demonstrate the applicability of the approach by an experimental case study, in which the DCT algorithm is implemented on a set of four processing elements. During runtime, some of the tasks are then relocated from assumed faulty processing elements to the faultless ones by means of dynamic reconfiguration of the hardware. The working of the demonstrator and the simulation results illustrate not only the feasibility of the approach but also how the use of higher-level abstractions simplifies the fault handling.
Pages: 312 - 319
Number of pages: 8
Related papers
50 in total
  • [21] Fault-Tolerance in Resolvability
    Javaid, Imran
    Salman, Muhammad
    Chaudhry, Muhammad Anwar
    Shokat, Sara
    UTILITAS MATHEMATICA, 2009, 80 : 263 - 275
  • [22] Fault tolerance using dynamic reconfiguration on the POEtic tissue
    Barker, Will
    Halliday, David M.
    Thoma, Yann
    Sanchez, Eduardo
    Tempesti, Gianluca
    Tyrrell, Andy M.
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2007, 11 (05) : 666 - 684
  • [23] ON FAULT-TOLERANCE AND FAULT-AVOIDANCE
    REGULINSKI, TLD
    IEEE TRANSACTIONS ON RELIABILITY, 1987, 36 (02) : 161 - 161
  • [24] Dynamic fault tolerance in FPGAs via partial reconfiguration
    Emmert, J
    Stroud, C
    Skaggs, B
    Abramovici, M
    2000 IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2000, : 165 - 174
  • [25] Factorization of Quadratic Polynomial with Fault-Tolerance and Practice Oriented Learning Architecture
    Chang, Hsiu-Ju
    LINKING APPLICATIONS WITH MATHEMATICS AND TECHNOLOGY, 2010, : 305 - 314
  • [26] Improving fault-tolerance in MAS with dynamic proxy replicate groups
    Fedoruk, A
    Deters, R
    IEEE/WIC INTERNATIONAL CONFERENCE ON INTELLIGENT AGENT TECHNOLOGY, PROCEEDINGS, 2003, : 364 - 370
  • [27] A Fault-Tolerance Architecture for Kepler-Based Distributed Scientific Workflows
    Mouallem, Pierre
    Crawl, Daniel
    Altintas, Ilkay
    Vouk, Mladen
    Yildiz, Ustun
    SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, 2010, 6187 : 452 - +
  • [28] A Cloud Storage Architecture for High Data Availability, Reliability, and Fault-tolerance
    Alfawair, Mai
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FUTURE NETWORKS AND DISTRIBUTED SYSTEMS (ICFNDS '17), 2017
  • [29] FAULT-TOLERANCE OF DYNAMIC-FULL-ACCESS INTERCONNECTION NETWORKS
    SHEN, JP
    HAYES, JP
    IEEE TRANSACTIONS ON COMPUTERS, 1984, 33 (03) : 241 - 248
  • [30] HELLENIC FAULT-TOLERANCE FOR ROBOTS
    TOYE, G
    LEIFER, LJ
    COMPUTERS & ELECTRICAL ENGINEERING, 1994, 20 (06) : 479 - 497