Impacts of Three Soft-Fault Models on Hybrid Parallel Asynchronous Iterative Methods

Cited by: 0
Authors
Coleman, Evan [1, 2]
Jensen, Erik J. [2 ]
Sosonkina, Masha [2 ]
Affiliations
[1] Naval Surface Warfare Ctr, Dahlgren Div, Dahlgren, VA 22448 USA
[2] Old Dominion Univ, Modeling Simulat & Visualizat Engn Dept, Norfolk, VA 23529 USA
Keywords
Fault modeling; fault tolerance; hybrid parallelism; asynchronous iterative methods
DOI
10.1109/SBAC-PAD.2018.00076
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This study examines the soft-error vulnerability of asynchronous iterative methods, focusing on stationary iterative solvers such as Jacobi. The implementations use hybrid parallelism: the computational work is distributed over multiple nodes using MPI and parallelized on each node using OpenMP. A series of experiments measures the impact of an undetected soft fault on an asynchronous iterative method and compares several techniques for simulating the occurrence of a fault and then recovering from its effects. The data show that the two numerical soft-fault models tested here produce behavior bad enough to exercise a variety of recovery strategies, such as those based on partial checkpointing, more consistently than a "bit-flip" model does.
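As a rough illustration of the "bit-flip" model named in the abstract, the sketch below flips one bit of one solution component during a Jacobi sweep. It is not the authors' implementation: it is a single-process, synchronous Jacobi loop in plain Python, with no MPI, OpenMP, or asynchrony, and it does not attempt the two numerical soft-fault models, which the abstract does not describe. The names `flip_bit`, `jacobi`, `residual_norm`, and the `fault` tuple layout are all hypothetical and chosen only for this example.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of the IEEE 754 double encoding of `value`.

    Bit 0 is the least significant mantissa bit and bit 63 is the sign bit.
    """
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def jacobi(a, b, iterations, fault=None):
    """Run `iterations` synchronous Jacobi sweeps on a x = b from x = 0.

    `fault` is an assumed convention for this sketch: an optional tuple
    (iteration, index, bit) that corrupts x[index] once, right after the
    given sweep, to mimic a single undetected soft fault.
    """
    n = len(b)
    x = [0.0] * n
    for k in range(iterations):
        x_new = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
        if fault is not None and fault[0] == k:
            _, index, bit = fault
            x_new[index] = flip_bit(x_new[index], bit)
        x = x_new
    return x


def residual_norm(a, b, x):
    """Return the infinity norm of the residual b - a x."""
    n = len(b)
    return max(abs(b[i] - sum(a[i][j] * x[j] for j in range(n))) for i in range(n))


if __name__ == "__main__":
    # Small diagonally dominant test system (hypothetical data).
    a = [[4.0, -1.0, 0.0, 0.0],
         [-1.0, 4.0, -1.0, 0.0],
         [0.0, -1.0, 4.0, -1.0],
         [0.0, 0.0, -1.0, 4.0]]
    b = [1.0, 1.0, 1.0, 1.0]
    for label, fault in [("no fault", None),
                         ("early fault", (5, 0, 52)),
                         ("late fault", (49, 0, 52))]:
        x = jacobi(a, b, 50, fault=fault)
        print(label, residual_norm(a, b, x))
```

In this toy setting a fault injected early is damped by the remaining sweeps, while a fault in the final sweep is left in the returned solution. That gives some intuition for why a single bit flip may or may not produce behavior severe enough to exercise a recovery strategy, depending on which bit is hit and when.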
Pages: 458-465
Page count: 8