Lifetime Reliability-Aware Checkpointing Mechanism: Modelling and Analysis

Cited by: 6
Authors
bin Bandan, Mohamad Imran [1]
Bhattacharjee, Subhasis [1]
Shafik, Rishad A. [1]
Pradhan, Dhiraj K. [1]
Mathew, Jimson [1]
Affiliations
[1] Univ Bristol, Bristol BS8 1TH, Avon, England
Keywords
Checkpointing; fault tolerance; microprocessors; lifetime reliability
DOI
10.1109/ISED.2013.32
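Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract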
A checkpointing mechanism is used to tolerate the impact of transient faults by rolling back to a previously saved system state. In this paper, we propose a novel checkpointing mechanism that considers fault tolerance in a duplex system in the presence of both transient and permanent faults. The main objective of our proposed mechanism is to extend the lifetime reliability of the duplex system by avoiding or even tolerating permanent faults in microprocessors. In addition, we propose migrating tasks from a 'near-to-die' processor to a spare processor when the current Mean-Time-To-Failure (MTTF) value is less than or equal to a predetermined threshold MTTF value. We validate our proposed mechanism and perform overhead analysis using various case studies. We then compare it with one of the most popular existing checkpointing mechanisms, the roll-forward checkpointing scheme [9]. We show that, unlike roll-back or roll-forward mechanisms, our proposed mechanism gives significantly higher lifetime reliability with reasonable system overheads.
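The migration condition stated in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Processor` class, the `should_migrate` and `select_active` helpers, and the use of hours as the MTTF unit are all hypothetical names and assumptions introduced here for illustration only. The only part taken from the abstract is the rule itself, namely that tasks move to the spare when the current MTTF is less than or equal to the threshold MTTF.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    # Hypothetical record; the field names are assumptions, not from the paper.
    name: str
    mttf_hours: float  # current MTTF estimate (unit chosen for illustration)


def should_migrate(current_mttf: float, threshold_mttf: float) -> bool:
    """Rule from the abstract: migrate when current MTTF <= threshold MTTF."""
    return current_mttf <= threshold_mttf


def select_active(active: Processor, spare: Processor,
                  threshold_mttf: float) -> Processor:
    """Return the processor that should run the task next.

    If the active processor is 'near-to-die' (its MTTF has reached the
    threshold), the task moves to the spare; otherwise it stays put.
    """
    if should_migrate(active.mttf_hours, threshold_mttf):
        return spare
    return active
```

Note that the comparison is inclusive, so a processor whose MTTF exactly equals the threshold is already treated as 'near-to-die'. How the current MTTF is estimated is not described in the abstract and is left outside this sketch.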
Pages: 128-132
Page count: 5
Related papers
50 in total
  • [41] Reliability-aware link management strategy for network on chip
    Jiao, Jia-Jia
    Fu, Yu-Zhuo
    Shanghai Jiaotong Daxue Xuebao/Journal of Shanghai Jiaotong University, 2013, 47 (01): : 39 - 43
  • [42] Reliability-aware SOC voltage islands partition and floorplan
    Yang, Shengqi
    Wolf, Wayne
    Vijaykrishnan, N.
    Xie, Yuan
    IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI, PROCEEDINGS: EMERGING VLSI TECHNOLOGIES AND ARCHITECTURES, 2006, : 343 - +
  • [43] Reliability-aware automatic composition approach for web services
    Li Mu
    Li Bo
    Huai JinPeng
    SCIENCE CHINA-INFORMATION SCIENCES, 2012, 55 (04) : 921 - 937
  • [44] Reliability-Aware Design Automation Flow for Analog Circuits
    Liu, Chien-Nan Jimmy
    Chen, Yen-Lung
    Liu, Tsung-Yu
    Chen, Tai-Chen
    2015 INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC), 2015, : 1 - 2
  • [45] Enhancing Reliability-Aware Speedup Modeling via Replication
    Hussain, Zaeem
    Znati, Taieb
    Melhem, Rami
    2020 50TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN 2020), 2020, : 528 - 539
  • [46] Reliability-Aware Data Placement for Heterogeneous Memory Architecture
    Gupta, Manish
    Sridharan, Vilas
    Roberts, David
    Prodromou, Andreas
    Venkat, Ashish
    Tullsen, Dean
    Gupta, Rajesh
    2018 24TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA), 2018, : 583 - 595
  • [47] RAISE: Reliability-Aware Instruction SchEduling for Unreliable Hardware
    Rehman, Semeen
    Shafique, Muhammad
    Kriebel, Florian
    Henkel, Joerg
    2012 17TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2012, : 671 - 676
  • [48] Reliability-Aware Network Slicing in Elastic Demand Scenarios
    Gomes, Rafael L.
    Bittencourt, Luiz F.
    Madeira, Edmundo R. M.
    IEEE COMMUNICATIONS MAGAZINE, 2020, 58 (10) : 29 - 34
  • [49] Reliability-aware design for nanometer-scale devices
    Atienza, David
    De Micheli, Giovanni
    Benini, Luca
    Ayala, Jose L.
    Del Valle, Pablo G.
    DeBole, Michael
    Narayanan, Vijay
    2008 ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, VOLS 1 AND 2, 2008, : 503 - +