Three-Layer MPI Fault-Tolerance Techniques

Authors
Guo Yucheng [1]
Wu Peng [1]
Tang Xiaoyi [1]
Guo Qingping [1]
Affiliation
[1] Wuhan Univ Technol, Sch Comp Sci & Technol, Wuhan 430070, Hubei, Peoples R China
Keywords
Fault-Tolerance; MPI; Cloud Computing; Task Dynamic Migration;
DOI
10.1109/DCABES.2013.34
Chinese Library Classification number
TP39 [Computer Applications]
Discipline classification code
081203; 0835
Abstract
For MPI to adapt to data-intensive tasks, it must have robust fault-tolerance capabilities for handling errors. This paper proposes a three-layer fault-tolerance technique to achieve this purpose. The MPI dynamic task migration technique proposed and implemented by our lab has rarely been reported in the literature.
Pages: 146-149
Number of pages: 4