Lifetime Reliability-Aware Checkpointing Mechanism: Modelling and Analysis

Cited by: 6
Authors
bin Bandan, Mohamad Imran [1]
Bhattacharjee, Subhasis [1]
Shafik, Rishad A. [1]
Pradhan, Dhiraj K. [1]
Mathew, Jimson [1]
Affiliations
[1] Univ Bristol, Bristol BS8 1TH, Avon, England
Keywords
Checkpointing; fault tolerance; microprocessors; lifetime reliability
DOI
10.1109/ISED.2013.32
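Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract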
Checkpointing mechanisms are used to tolerate the impact of transient faults by rolling back to a previously saved system state. In this paper, we propose a novel checkpointing mechanism that considers fault tolerance in a duplex system in the presence of both transient and permanent faults. The main objective of our proposed mechanism is to extend the lifetime reliability of the duplex system by avoiding or even tolerating permanent faults in microprocessors. In addition, we propose to migrate tasks from a 'near-to-die' processor to a spare processor when the current Mean-Time-To-Failure (MTTF) value is less than or equal to a predetermined threshold MTTF value. We validate our proposed mechanism and perform overhead analysis using various case studies. We then compare it with one of the most popular existing checkpointing mechanisms, namely the roll-forward checkpointing scheme [9]. We show that, unlike roll-back or roll-forward mechanisms, our proposed mechanism gives significantly higher lifetime reliability with reasonable system overheads.
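The migration condition stated in the abstract (move tasks to a spare processor once the current MTTF is at or below a threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Processor` class, the `mttf_hours` field, and both function names are hypothetical, and the abstract does not say how the current MTTF is estimated, so the values here are simply supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    """Hypothetical processor record; not taken from the paper."""
    name: str
    mttf_hours: float  # assumed: current MTTF estimate, supplied externally


def should_migrate(current_mttf: float, threshold_mttf: float) -> bool:
    """Return True when the current MTTF is less than or equal to the threshold."""
    return current_mttf <= threshold_mttf


def select_processor(active: Processor, spare: Processor,
                     threshold_mttf: float) -> Processor:
    """Pick where tasks should run next.

    Tasks stay on the active processor unless it is 'near-to-die',
    i.e. its current MTTF has reached the threshold, in which case
    they are migrated to the spare.
    """
    if should_migrate(active.mttf_hours, threshold_mttf):
        return spare
    return active


if __name__ == "__main__":
    worn = Processor("P0", mttf_hours=50.0)      # illustrative values only
    spare = Processor("spare", mttf_hours=900.0)
    print(select_processor(worn, spare, threshold_mttf=100.0).name)
```

The comparison uses `<=` so that a processor whose MTTF exactly equals the threshold is also migrated, matching the "less than or equal to" wording of the abstract.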
Pages: 128-132
Page count: 5
Related papers
50 results in total
  • [31] Increasing the Accuracy of Reliability-aware Resynthesis with Standard Cell Reliability Characterization
    Stempkovskiy, Alexander
    Telpukhov, Dmitry
    Solovyev, Roman A.
    Nadolenko, Vladislav
    PROCEEDINGS OF THE 2021 IEEE CONFERENCE OF RUSSIAN YOUNG RESEARCHERS IN ELECTRICAL AND ELECTRONIC ENGINEERING (ELCONRUS), 2021, : 2035 - 2039
  • [32] Reliability-aware microarchitecture - Guest Editor's introduction
    Adve, SV
    Sanda, P
    IEEE MICRO, 2005, 25 (06) : 8 - 9
  • [33] Reliability-Aware Task Replication for Mobile Edge Computing
    Yang, Lipei
    Zhou, Ao
    Ma, Xiao
    Zhang, Yiran
    Wang, Shangguang
    IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (14): : 24846 - 24857
  • [34] Reliability-Aware Routing of AVB Streams in TSN Networks
    Atallah, Ayman A.
    Hamad, Ghaith Bany
    Mohamed, Otmane Ait
    RECENT TRENDS AND FUTURE TECHNOLOGY IN APPLIED INTELLIGENCE, IEA/AIE 2018, 2018, 10868 : 697 - 708
  • [35] Reliability-aware Co-synthesis for Embedded Systems
    Y. Xie
    L. Li
    M. Kandemir
    N. Vijaykrishnan
    M. J. Irwin
    The Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology, 2007, 49 : 87 - 99
  • [36] Reliability-aware co-synthesis for embedded systems
    Xie, Y.
    Li, L.
    Kandemir, M.
    Vijaykrishnan, N.
    Irwin, M. J.
    JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2007, 49 (01): : 87 - 99
  • [37] Reliability-aware co-synthesis for embedded systems
    Xie, Y
    Li, L
    Kandemir, M
    Vijaykrishnan, N
    Irwin, MJ
    15TH IEEE INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS, PROCEEDINGS, 2004, : 41 - 50
  • [38] Fog Resource Provisioning in Reliability-Aware IoT Networks
    Yao, Jingjing
    Ansari, Nirwan
    IEEE INTERNET OF THINGS JOURNAL, 2019, 6 (05) : 8262 - 8269
  • [39] A reliability-aware vehicular crowdsensing system for pothole profiling
    Zhong W.
    Suo Q.
    Ma F.
    Hou Y.
    Gupta A.
    Qiao C.
    Su L.
    Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2019, 3 (04):
  • [40] Reliability-aware Operation Chaining in High Level Synthesis
    Chen, Liang
    Ebrahimi, Mojtaba
    Tahoori, Mehdi B.
    2015 20TH IEEE EUROPEAN TEST SYMPOSIUM (ETS), 2015,