Impacts of Three Soft-Fault Models on Hybrid Parallel Asynchronous Iterative Methods

Cited by: 0
Authors
Coleman, Evan [1, 2]
Jensen, Erik J. [2 ]
Sosonkina, Masha [2 ]
Affiliations
[1] Naval Surface Warfare Ctr, Dahlgren Div, Dahlgren, VA 22448 USA
[2] Old Dominion Univ, Modeling Simulat & Visualizat Engn Dept, Norfolk, VA 23529 USA
Source
2018 30TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2018) | 2018
Keywords
Fault modeling; fault tolerance; hybrid parallelism; asynchronous iterative methods
DOI
10.1109/SBAC-PAD.2018.00076
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This study seeks to understand the soft error vulnerability of asynchronous iterative methods, with a focus on stationary iterative solvers such as Jacobi. The implementations use hybrid parallelism: the computational work is distributed over multiple nodes using MPI and parallelized on each node using OpenMP. A series of experiments measures the impact of an undetected soft fault on an asynchronous iterative method and compares several techniques for simulating the occurrence of a fault and then recovering from its effects. The data show that the two numerical soft-fault models tested here produce behavior bad enough to test a variety of recovery strategies, such as those based on partial checkpointing, more consistently than a "bit-flip" model does.
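As a rough illustration of the "bit-flip" fault model named in the abstract, the sketch below flips one bit of one solution component during a Jacobi solve and lets the iteration continue without noticing. It is not the authors' implementation: it is a serial, synchronous Jacobi loop with no MPI, OpenMP, or asynchronous updates, and it does not attempt the two numerical fault models, which the abstract does not define. The helper names (`flip_bit`, `jacobi`, `residual_norm`), the `(sweep, index, bit)` fault triple, and the 3x3 test system are all assumptions made for this example.

```python
import struct


def flip_bit(value, bit):
    """Flip one bit (0 = least significant, 63 = sign) of a 64-bit IEEE 754 double."""
    if not 0 <= bit < 64:
        raise ValueError("bit must be between 0 and 63")
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def jacobi(A, b, sweeps, fault=None):
    """Run `sweeps` Jacobi sweeps starting from x = 0.

    `fault` is a hypothetical (sweep, index, bit) triple: at the end of that
    sweep, one bit of x[index] is flipped and nothing reports it.
    """
    n = len(b)
    x = [0.0] * n
    for k in range(sweeps):
        # Standard Jacobi update: every component uses only the previous iterate.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        if fault is not None and fault[0] == k:
            x_new[fault[1]] = flip_bit(x_new[fault[1]], fault[2])
        x = x_new
    return x


def residual_norm(A, b, x):
    """Infinity norm of the residual b - A x."""
    n = len(b)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))


# Small diagonally dominant system (an assumption); its exact solution is [1, 1, 1].
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]

clean = jacobi(A, b, sweeps=50)
early = jacobi(A, b, sweeps=50, fault=(5, 1, 63))   # sign flip with 44 sweeps left
late = jacobi(A, b, sweeps=50, fault=(49, 1, 63))   # sign flip in the final sweep

for name, x in (("clean", clean), ("early", early), ("late", late)):
    print(name, residual_norm(A, b, x))
```

In this toy setting the early sign flip is damped out by the remaining sweeps, while the same flip in the final sweep is left in the returned solution. Flipping a high exponent bit instead (for example bit 62 of a value near 1.0) produces an infinity, and flipping a low mantissa bit changes the value only in its last digits, so the effect of a single flip depends heavily on which bit is hit.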
Pages: 458 - 465
Page count: 8