A User-level Library for Fault Tolerance on Shared Memory Multicore Systems

Cited by: 0
Authors
Mushtaq, Hamid [1]
Al-Ars, Zaid [1]
Bertels, Koen [1]
Affiliations
[1] Delft Univ Technol, Comp Engn Lab, Delft, Netherlands
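Keywords
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification code
0808; 0809;
Abstract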
The ever-decreasing transistor size has made it possible to integrate multiple cores on a single die. On the downside, this has introduced reliability concerns, as smaller transistors are more prone to both transient and permanent faults. However, the abundant extra processing resources of a multicore system can be exploited to provide fault tolerance through redundant execution. We have designed a library for multicore processing that can make a multithreaded user-level application fault tolerant with simple modifications to the code. It uses the abundant cores found in the system to perform redundant execution for error detection. It also allows recovery through checkpoint/rollback. Our library is portable since it does not depend on any special hardware. Furthermore, the overhead that our library adds to the original application (up to 46% for 4 threads) is lower than that of other existing approaches, such as Respec.
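The combination of redundant execution for detection and checkpoint/rollback for recovery can be illustrated with a minimal sketch. This is not the library's API or implementation, which the abstract does not describe; the names `run_epoch`, `run_fault_tolerant`, `increment`, and `flip_bit` are hypothetical, and the two replicas run sequentially here, whereas the library is described as running them on spare cores.

```python
import copy


def run_epoch(step, checkpoint, fault=None):
    """Run one epoch on two replicas that start from the same checkpoint."""
    leader = step(copy.deepcopy(checkpoint))
    follower = step(copy.deepcopy(checkpoint))
    if fault is not None:
        fault(follower)  # hypothetical hook simulating a transient fault
    return leader, follower


def run_fault_tolerant(step, state, epochs, faults=None, max_retries=3):
    """Detect errors by comparing replicas; recover by rolling back."""
    faults = dict(faults or {})  # epoch index -> one-shot fault injector
    checkpoint = copy.deepcopy(state)
    rollbacks = 0
    for epoch in range(epochs):
        for _ in range(max_retries + 1):
            fault = faults.pop(epoch, None)  # a transient fault fires only once
            leader, follower = run_epoch(step, checkpoint, fault)
            if leader == follower:
                checkpoint = leader  # replicas agree: commit a new checkpoint
                break
            rollbacks += 1  # mismatch: discard both results, retry from checkpoint
        else:
            raise RuntimeError(f"epoch {epoch} kept failing; fault may be permanent")
    return checkpoint, rollbacks


def increment(state):
    """Toy workload: one unit of application progress."""
    state["counter"] += 1
    return state


def flip_bit(state):
    """Toy fault: flip one bit in the follower's result."""
    state["counter"] ^= 1 << 4


final, rollbacks = run_fault_tolerant(
    increment, {"counter": 0}, epochs=5, faults={2: flip_bit}
)
```

In this sketch the injected bit flip in epoch 2 makes the replicas disagree once, so that epoch is re-executed from the last committed checkpoint and the run still completes all five epochs.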
Pages: 266-269
Page count: 4