Lifetime Reliability-Aware Checkpointing Mechanism: Modelling and Analysis

Cited by: 6
Authors
bin Bandan, Mohamad Imran [1]
Bhattacharjee, Subhasis [1]
Shafik, Rishad A. [1]
Pradhan, Dhiraj K. [1]
Mathew, Jimson [1]
Affiliation
[1] Univ Bristol, Bristol BS8 1TH, Avon, England
Keywords
Checkpointing; fault tolerance; microprocessors; lifetime reliability
DOI
10.1109/ISED.2013.32
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
A checkpointing mechanism tolerates the impact of transient faults by rolling back to a previously saved system state. In this paper, we propose a novel checkpointing mechanism that considers fault tolerance in a duplex system in the presence of both transient and permanent faults. The main objective of the proposed mechanism is to extend the lifetime reliability of the duplex system by avoiding, or even tolerating, permanent faults in microprocessors. In addition, we propose migrating tasks from a 'near-to-die' processor to a spare processor when the current Mean-Time-To-Failure (MTTF) value is less than or equal to a predetermined threshold MTTF value. We validate the proposed mechanism and perform overhead analysis using various case studies. We then compare it with one of the most popular existing checkpointing mechanisms, namely the roll-forward checkpointing scheme [9]. We show that, unlike roll-back or roll-forward mechanisms, the proposed mechanism gives significantly higher lifetime reliability with reasonable system overheads.
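The migration condition stated in the abstract can be sketched as follows. This is only an illustrative reading of the abstract, not the authors' implementation: the `Processor` class, the `mttf_hours` field, the function names, and the numeric values are all hypothetical, and the record does not say how the current MTTF is estimated.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    """Hypothetical processor record; the MTTF unit (hours) is an assumption."""
    name: str
    mttf_hours: float  # current MTTF estimate, obtained by some unspecified model


def should_migrate(current_mttf: float, threshold_mttf: float) -> bool:
    """Condition from the abstract: migrate when current MTTF <= threshold MTTF."""
    return current_mttf <= threshold_mttf


def select_processor(active: Processor, spare: Processor,
                     threshold_mttf: float) -> Processor:
    """Return the processor that should run the task next.

    If the active processor is 'near-to-die' (its MTTF is at or below the
    threshold), the task moves to the spare; otherwise it stays in place.
    """
    if should_migrate(active.mttf_hours, threshold_mttf):
        return spare
    return active


# Illustrative values only; they are not taken from the paper.
worn = Processor("P0", mttf_hours=900.0)
spare = Processor("P_spare", mttf_hours=50_000.0)
chosen = select_processor(worn, spare, threshold_mttf=1_000.0)
print(chosen.name)
```

The comparison is inclusive, matching the "less than or equal to" wording of the abstract, so a processor whose MTTF sits exactly at the threshold is also treated as 'near-to-die'.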
Pages: 128-132
Page count: 5