Private reliability environments for efficient fault-tolerance in CGRAs

Cited by: 2
Authors
Jafri, Syed M. A. H. [1 ]
Piestrak, Stanislaw J. [2 ]
Hemani, Ahmed [3 ]
Paul, Kolin [4 ]
Plosila, Juha [1 ]
Tenhunen, Hannu [3 ]
Affiliations
[1] Univ Turku, Turku, Finland
[2] Univ Lorraine, Inst Jean Lamour, CNRS, UMR 7198, Vandoeuvre Les Nancy, France
[3] Royal Inst Technol KTH, Stockholm, Sweden
[4] Indian Inst Technol, Delhi, India
Keywords
Fault-tolerance; Reliability; Adaptive systems; Energy-aware systems; Scrubbing; Reconfiguration; Coarse grained reconfigurable arrays; Reconfigurable architecture; Design
DOI
10.1007/s10617-014-9129-6
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In the era of platforms hosting multiple applications with variable reliability needs, worst-case platform-wide fault-tolerance decisions are neither optimal nor desirable. As a solution to this problem, designs commonly employ adaptive fault-tolerance strategies that provide each application with the reliability level it actually needs. However, in the CGRA domain, the existing schemes either only allow shifting between different levels of modular redundancy (duplication, triplication, etc.) or protect only a particular region of a device (e.g. configuration memory, computation, or data memory). To complement these strategies, we propose private fault-tolerance environments which, in addition to modular redundancy, also provide low-cost sub-modular (e.g. residue mod 3) redundancy capable of handling both permanent and temporary faults in configuration memory, computation, communication, and data memory. In addition, we present adaptive configuration scrubbing techniques which prevent fault accumulation in the configuration memory. Simulation results using a few selected algorithms (FFT, matrix multiplication, and FIR filter) show that the proposed approach is capable of providing flexible protection with energy overhead ranging from 3.125% to 107% for different reliability levels. Synthesis results have confirmed that the proposed architecture reduces the area overhead for self-checking (58%) and fault-tolerant (7.1%) versions, compared to the state-of-the-art adaptive reliability techniques.
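The residue mod 3 redundancy mentioned in the abstract can be illustrated with a small software sketch. This is not the authors' hardware design; it only shows the general idea of a residue check under stated assumptions: the function names (`residue3`, `checked_add`, `checked_mul`) and the `fault_mask` parameter used to simulate a bit flip are hypothetical and chosen for illustration. The check predicts the residue of the result from the residues of the operands and compares it with the residue of the computed result.

```python
def residue3(x: int) -> int:
    """Return the residue of x modulo 3 (the 2-bit check value)."""
    return x % 3


def checked_add(a: int, b: int, fault_mask: int = 0):
    """Add a and b, then verify the sum with a mod-3 residue check.

    fault_mask is a hypothetical test hook: it is XORed into the
    result to simulate bit flips in the main datapath.
    """
    result = (a + b) ^ fault_mask
    # Residue arithmetic: (a + b) mod 3 == ((a mod 3) + (b mod 3)) mod 3
    predicted = (residue3(a) + residue3(b)) % 3
    return result, residue3(result) == predicted


def checked_mul(a: int, b: int, fault_mask: int = 0):
    """Multiply a and b, then verify the product with a mod-3 residue check."""
    result = (a * b) ^ fault_mask
    # Residue arithmetic: (a * b) mod 3 == ((a mod 3) * (b mod 3)) mod 3
    predicted = (residue3(a) * residue3(b)) % 3
    return result, residue3(result) == predicted


if __name__ == "__main__":
    print(checked_add(5, 7))                 # fault-free addition
    print(checked_add(5, 7, fault_mask=1))   # single bit flip in the sum
    print(checked_mul(5, 7, fault_mask=4))   # single bit flip in the product
```

A single bit flip changes the result by plus or minus a power of two, which is never a multiple of 3, so this check detects every single-bit error in the result. The check value needs only two bits, which is why residue codes are cheaper than duplicating the whole datapath.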
Pages: 295-327
Number of pages: 33
Related papers
50 in total
  • [1] Private reliability environments for efficient fault-tolerance in CGRAs
    Syed M. A. H. Jafri
    Stanislaw J. Piestrak
    Ahmed Hemani
    Kolin Paul
    Juha Plosila
    Hannu Tenhunen
    [J]. Design Automation for Embedded Systems, 2014, 18 : 295 - 327
  • [2] Efficient Byzantine Fault-Tolerance
    Veronese, Giuliana Santos
    Correia, Miguel
    Bessani, Alysson Neves
    Lung, Lau Cheuk
    Verissimo, Paulo
    [J]. IEEE TRANSACTIONS ON COMPUTERS, 2013, 62 (01) : 16 - 30
  • [3] Private configuration environments (PCE) for efficient reconfiguration, in CGRAs
    Tajammul, Muhammad Adeel
    Jafri, Syed. M. A. H.
    Hemani, Ahmed
    Plosila, Juha
    Tenhunen, Hannu
    [J]. PROCEEDINGS OF THE 2013 IEEE 24TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 13), 2013, : 227 - 236
  • [4] Reliability and Fault-Tolerance by Choreographic Design
    Cassar, Ian
    Francalanza, Adrian
    Mezzina, Claudio Antares
    Tuosto, Emilio
    [J]. ELECTRONIC PROCEEDINGS IN THEORETICAL COMPUTER SCIENCE, 2017, (254): : 69 - 80
  • [5] Fault-tolerance and reliability in networked sensor systems
    Li, HL
    Xing, LD
    [J]. Proceedings of the 4th International Conference on Quality & Reliability, 2005, : 399 - 408
  • [6] RELIABILITY AND FAULT-TOLERANCE IN MULTISTAGE INTERCONNECTION NETWORKS
    RAGHAVENDRA, CS
    VARMA, A
    [J]. SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 1987, 11 : 111 - 128
  • [7] FAULT-TOLERANCE
    GROSSPIETSCH, KE
    [J]. MICROPROCESSING AND MICROPROGRAMMING, 1993, 38 (1-5): : 783 - 783
  • [8] A new approach for mobile agent fault-tolerance and reliability
    Mohammadi, K.
    Hamidi, H.
    [J]. 2005 1ST IEEE/IFIP INTERNATIONAL CONFERENCE IN CENTRAL ASIA ON INTERNET (ICI), 2005, : 164 - 168
  • [9] Runtime Reliability Monitoring for Complex Fault-Tolerance Policies
    Fantechi, Alessandro
    Gori, Gloria
    Papini, Marco
    [J]. 2022 6TH INTERNATIONAL CONFERENCE ON SYSTEM RELIABILITY AND SAFETY, ICSRS, 2022, : 110 - 119
  • [10] Review of Multistage Interconnection Networks Reliability and Fault-Tolerance
    Rajkumar, S.
    Goyal, Neeraj Kumar
    [J]. IETE TECHNICAL REVIEW, 2016, 33 (03) : 223 - 230