Lifetime Reliability-Aware Checkpointing Mechanism: Modelling and Analysis

Cited by: 6
Authors
bin Bandan, Mohamad Imran [1]
Bhattacharjee, Subhasis [1]
Shafik, Rishad A. [1]
Pradhan, Dhiraj K. [1]
Mathew, Jimson [1]
Affiliations
[1] Univ Bristol, Bristol BS8 1TH, Avon, England
Keywords
Checkpointing; fault tolerance; microprocessors; lifetime reliability
DOI
10.1109/ISED.2013.32
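Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract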
Checkpointing is used to tolerate transient faults by rolling back to a previously saved system state. In this paper, we propose a novel checkpointing mechanism that provides fault tolerance in a duplex system in the presence of both transient and permanent faults. The main objective of the proposed mechanism is to extend the lifetime reliability of the duplex system by avoiding, or even tolerating, permanent faults in microprocessors. In addition, we propose migrating tasks from a 'near-to-die' processor to a spare processor when the current Mean-Time-To-Failure (MTTF) value is less than or equal to a pre-determined threshold MTTF value. We validate the proposed mechanism and perform overhead analysis using various case studies. We then compare it with one of the most popular existing checkpointing mechanisms, namely the roll-forward checkpointing scheme [9]. We show that, unlike roll-back or roll-forward mechanisms, the proposed mechanism gives significantly higher lifetime reliability with reasonable system overheads.
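The migration condition in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Processor` class, the `next_action` function, the hour units, and the choice to check migration before rollback are all assumptions made for illustration. Only the comparison "current MTTF less than or equal to threshold MTTF" comes from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    """Hypothetical processor record; field names are illustrative only."""
    name: str
    mttf_hours: float  # current MTTF estimate (units are an assumption)


def should_migrate(current_mttf: float, threshold_mttf: float) -> bool:
    # Condition stated in the abstract: migrate when the current MTTF
    # is less than or equal to the pre-determined threshold MTTF.
    return current_mttf <= threshold_mttf


def next_action(active: Processor, transient_fault: bool,
                threshold_mttf: float) -> str:
    # Assumed ordering: a 'near-to-die' processor is handled first,
    # then transient faults are handled by rolling back to a checkpoint.
    if should_migrate(active.mttf_hours, threshold_mttf):
        return "migrate"
    if transient_fault:
        return "rollback"
    return "continue"
```

For example, with a threshold of 100 hours, a processor whose estimate has dropped to exactly 100 hours would trigger migration to the spare, while a healthy processor that sees a transient fault would roll back instead.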
Pages: 128-132
Number of pages: 5