A fault tolerant MPI-IO implementation using the Expand parallel file system

Cited by: 2
Authors
Calderón, A [1]
García-Carballeira, F [1]
Carretero, J [1]
Pérez, JM [1]
Sánchez, LM [1]
Affiliations
[1] Univ Carlos III Madrid, Dept Comp Sci, Comp Architecture Grp, Madrid, Spain
Keywords
parallel file system; NFS; data declustering; clusters; fault-tolerance
DOI
10.1109/EMPDP.2005.3
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Parallelism in file systems is obtained by using several independent server nodes, each supporting one or more secondary storage devices. This approach increases the performance and scalability of the system, but a fault in a single node can stop the whole system. To avoid this problem, data must be stored redundantly, so that any data held on a faulty element can be recovered. Fault tolerance can be provided in I/O systems using replication or RAID-based schemes. However, most current systems apply the same technique to all files in the system. This paper describes the fault tolerance support provided by Expand, a parallel file system based on standard servers. Expand allows different fault-tolerance mechanisms to be defined at the file level. The evaluation compares the performance of Expand under different configurations with that of PVFS using the FLASH-I/O benchmark.
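As an illustration of the RAID-style redundancy the abstract mentions, the following minimal sketch stripes a buffer across several data blocks (one per hypothetical server) and adds a single XOR parity block, so that any one lost block can be rebuilt from the survivors. This is a generic parity scheme under stated assumptions, not Expand's actual mechanism; the function names `xor_blocks`, `make_stripe`, and `recover` are invented for the example.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def make_stripe(data: bytes, n_data: int) -> list:
    """Split data into n_data equal blocks (zero-padded) plus one parity block.

    Each block stands for the portion stored on one (hypothetical) server.
    """
    block_size = -(-len(data) // n_data)  # ceiling division
    padded = data.ljust(block_size * n_data, b"\0")
    blocks = [padded[i * block_size:(i + 1) * block_size] for i in range(n_data)]
    return blocks + [xor_blocks(blocks)]


def recover(stripe: list, lost: int) -> bytes:
    """Rebuild the block at index `lost` by XOR-ing all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost]
    return xor_blocks(survivors)


# Usage: lose any single block (data or parity) and rebuild it.
payload = b"expand parallel fs"
stripe = make_stripe(payload, n_data=3)
rebuilt = recover(stripe, lost=1)
```

Replication would instead keep full copies of each block on more than one server. It costs more storage than parity but needs no reconstruction step on a read after a failure, which is one reason a per-file choice of scheme can be useful.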
Pages: 274-281
Page count: 8