A user-transparent recoverable file system for distributed computing environment

Cited by: 4
Authors
Kim, HS [1]
Yeom, HY [1]
Affiliations
[1] Seoul Natl Univ, Dept Comp Sci & Engn, Seoul, South Korea
DOI
10.1109/CLADE.2005.1520898
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In a distributed computing environment, particularly the Grid, fault tolerance is one of the core functions the system should provide. MPICH-GF is a resilient system designed to withstand external and internal failures, especially for message-passing applications in the Grid environment. However, it does not protect against the loss of one valuable resource: files. Normally, users open files and write data to them asynchronously, and checkpointing is initiated without regard to the state of the process context. The checkpointing system should therefore recognize the running process automatically and protect its open files transparently. We have implemented a recoverable file system, ReFS, and incorporated it into our fault-tolerant system MPICH-GF. ReFS is a versioning-like file system that provides middleware libraries with a system call interface for protecting specific files at a given time. This prevents applications from processing corrupted data and producing incorrect results after a failure. We have focused not only on the reliability of the system but also on reducing the unavoidable overhead. This paper describes the design and implementation of ReFS and justifies the correctness of its behavior. ReFS is developed on Linux and is based on Ext2.
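The idea in the abstract, saving a version of each open file at checkpoint time so that it can be restored together with the process state, can be illustrated with a small user-space sketch. This is not the ReFS implementation or its interface: the abstract places ReFS at the file-system level on Ext2 behind a system call interface, whereas the sketch below simply copies whole files. The names `CheckpointFileGuard`, `checkpoint`, `rollback`, and `demo` are hypothetical and chosen only for illustration.

```python
import os
import shutil
import tempfile


class CheckpointFileGuard:
    """Hypothetical user-space illustration of versioning files at a checkpoint.

    Not the ReFS API: ReFS works inside the file system, this copies whole files.
    """

    def __init__(self, version_dir):
        self.version_dir = version_dir
        self.protected = {}  # original path -> path of the saved version
        os.makedirs(version_dir, exist_ok=True)

    def checkpoint(self, paths):
        """Save each file's contents as they exist at checkpoint time."""
        for path in paths:
            saved = self.protected.get(path)
            if saved is None:
                # First time this file is protected: allocate a version slot.
                saved = os.path.join(
                    self.version_dir, f"{len(self.protected)}.ckpt"
                )
                self.protected[path] = saved
            shutil.copy2(path, saved)

    def rollback(self):
        """Restore every protected file to its checkpointed contents."""
        for path, saved in self.protected.items():
            shutil.copy2(saved, path)


def demo():
    """Write, checkpoint, write again, then roll back as if a failure occurred."""
    with tempfile.TemporaryDirectory() as workdir:
        data_file = os.path.join(workdir, "result.dat")
        with open(data_file, "w") as handle:
            handle.write("step 1\n")

        guard = CheckpointFileGuard(os.path.join(workdir, "versions"))
        guard.checkpoint([data_file])

        # Data written after the checkpoint. If the process failed now and was
        # restarted from the checkpoint, this data would not match its state.
        with open(data_file, "a") as handle:
            handle.write("step 2 (partial)\n")

        guard.rollback()
        with open(data_file) as handle:
            return handle.read()
```

Calling `demo()` returns the file contents after rollback, which contain only the data written before the checkpoint. Copying whole files keeps the sketch short, but it is exactly the kind of overhead the abstract says the authors tried to reduce.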
Pages: 45-53
Page count: 9