A User-level Library for Fault Tolerance on Shared Memory Multicore Systems

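Cited by: 0
Authors
Mushtaq, Hamid [1]
Al-Ars, Zaid [1]
Bertels, Koen [1]
Affiliations
[1] Delft Univ Technol, Comp Engn Lab, Delft, Netherlands
Keywords
DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809;
Abstract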
The ever-decreasing transistor size has made it possible to integrate multiple cores on a single die. On the downside, this has introduced reliability concerns, as smaller transistors are more prone to both transient and permanent faults. However, the abundant extra processing resources of a multicore system can be exploited to provide fault tolerance through redundant execution. We have designed a library for multicore processing that can make a multithreaded user-level application fault tolerant with simple modifications to the code. It uses the abundant cores found in the system to perform redundant execution for error detection. In addition, it allows recovery through checkpoint/rollback. Our library is portable, since it does not depend on any special hardware. Furthermore, the overhead our library adds to the original application (up to 46% for 4 threads) is lower than that of other existing approaches, such as Respec.
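The combination of redundant execution for detection and checkpoint/rollback for recovery can be illustrated with a minimal sketch. This is not the authors' library or its API: the names `run_redundant` and `make_flaky_step`, the use of Python threads, and the comparison of whole result objects are all assumptions chosen for illustration only.

```python
import copy
import threading
from concurrent.futures import ThreadPoolExecutor


def run_redundant(step, state, replicas=2, max_retries=3):
    """Hypothetical helper: run `step` on several replicas and compare results.

    A deep copy of `state` serves as the checkpoint. If the replicas
    disagree, their results are discarded and execution restarts from
    the checkpoint (rollback).
    """
    checkpoint = copy.deepcopy(state)
    for _attempt in range(max_retries + 1):
        with ThreadPoolExecutor(max_workers=replicas) as pool:
            # Each replica works on its own private copy of the checkpoint.
            futures = [
                pool.submit(step, copy.deepcopy(checkpoint))
                for _replica in range(replicas)
            ]
            results = [f.result() for f in futures]
        if all(r == results[0] for r in results):
            return results[0]  # replicas agree: accept the result
        # Replicas diverged: treat as a detected error and retry.
    raise RuntimeError("replicas kept diverging; giving up")


def make_flaky_step():
    """Hypothetical workload whose very first invocation suffers a bit flip."""
    lock = threading.Lock()
    calls = {"n": 0}

    def step(state):
        with lock:
            calls["n"] += 1
            first_call = calls["n"] == 1
        state["total"] += sum(state["values"])
        if first_call:
            state["total"] ^= 1  # injected single-bit fault (illustration only)
        return state

    return step


if __name__ == "__main__":
    initial = {"values": [1, 2, 3], "total": 0}
    print(run_redundant(make_flaky_step(), initial))
```

In this sketch the first attempt produces mismatching replicas because of the injected fault, so the results are dropped and the second attempt, restarted from the checkpoint, is accepted. With only two replicas a mismatch can be detected but not attributed to a particular replica, which is why recovery here relies on rollback rather than voting.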
Pages: 266-269
Page count: 4
Related papers
50 in total
  • [31] Memory management system structure supporting user-level program participation
    Zhang, Li
    Yao, Yaguang
    Wang, Shiyou
    [J]. Jisuanji Gongcheng/Computer Engineering, 2000, 26 (04): : 44 - 46
  • [32] Shared memory resources allocation and management research on multicore systems
    Gao, Ke
    Chen, Li-Cheng
    Fan, Dong-Rui
    Liu, Zhi-Yong
    [J]. Jisuanji Xuebao/Chinese Journal of Computers, 2015, 38 (05): : 1020 - 1034
  • [33] Fault-tolerance using Cache-coherent distributed shared memory systems
    Hecht, DL
    Kavi, KM
    Gaede, RK
    Katsinis, C
    [J]. FOURTH INTERNATIONAL SYMPOSIUM ON PARALLEL ARCHITECTURES, ALGORITHMS, AND NETWORKS (I-SPAN'99), PROCEEDINGS, 1999, : 100 - 105
  • [34] Fault-tolerance using cache-coherent distributed shared memory systems
    Univ of Alabama in Huntsville, Huntsville, United States
    [J]. Int Symp Parall Archit Algorithms Networks I SPAN, (100-105):
  • [35] UnifyFS: A User-level Shared File System for Unified Access to Distributed Local Storage
    Brim, Michael J.
    Moody, Adam T.
    Lim, Seung-Hwan
    Miller, Ross
    Boehm, Swen
    Stanavige, Cameron
    Mohror, Kathryn M.
    Oral, Sarp
    [J]. 2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, IPDPS, 2023, : 290 - 300
  • [36] Millipede: A user-level NT-based distributed shared memory system with thread migration and dynamic run-time optimization of memory references
    Itzkovitz, A
    Schuster, A
    Shalev, L
    [J]. PROCEEDINGS OF THE USENIX WINDOWS NT WORKSHOP, 1997, : 148 - 148
  • [37] Design and implementation of user level shared memory protocol
    Wu, Junmin
    Gao, Yuan
    Jiang, Song
    Zheng, Shirong
    [J]. 2000, Shenyang Inst Comput Technol, China (21):
  • [38] Design and implementation of user level shared memory protocol
    Wu, Junmin
    Gao, Yuan
    Jiang, Song
    Zheng, Shirong
    [J]. Xiaoxing Weixing Jisuanji Xitong/Mini-Micro Systems, 2000, 21 (03): : 337 - 340
  • [39] Fault recovery for distributed shared memory systems
    Dieter, WR
    Lumpp, JE
    [J]. 1997 IEEE AEROSPACE CONFERENCE PROCEEDINGS, VOL 2, 1997, : 525 - 540
  • [40] User-Level Secure Deletion on Log-structured File Systems
    Reardon, Joel
    Marforio, Claudio
    Capkun, Srdjan
    Basin, David
    [J]. 7TH ACM SYMPOSIUM ON INFORMATION, COMPUTER AND COMMUNICATIONS SECURITY (ASIACCS 2012), 2012,