Radiation-Induced Error Criticality in Modern HPC Parallel Accelerators

Cited by: 22
Authors
Oliveira, Daniel [1]
Pilla, Laercio [2]
Hanzich, Mauricio [3]
Fratin, Vinicius [1]
Fernandes, Fernando [1]
Lunardi, Caio [1]
Maria Cela, Jose [3]
Navaux, Philippe [1]
Carro, Luigi [1]
Rech, Paolo [1]
Affiliations
[1] Univ Fed Rio Grande do Sul, Inst Informat, Porto Alegre, RS, Brazil
[2] Univ Fed Santa Catarina, Dept Informat & Stat, Florianopolis, SC, Brazil
[3] Barcelona Supercomp Ctr, CASE Dept, Barcelona, Spain
Keywords
SOFT-ERROR; FAULT-TOLERANCE;
DOI
10.1109/HPCA.2017.41
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we evaluate the criticality of radiation-induced errors on modern High-Performance Computing (HPC) accelerators (Intel Xeon Phi and NVIDIA K40) through a dedicated set of metrics. We show that, as far as imprecise computing is concerned, simple mismatch detection is not sufficient to evaluate and compare the radiation sensitivity of HPC devices and algorithms. Our analysis quantifies and qualifies radiation effects on applications' output, correlating the number of corrupted elements with their spatial locality. We also provide the mean relative error (dataset-wise) to evaluate the magnitude of radiation-induced errors. We apply the selected metrics to experimental results obtained in various radiation test campaigns, for a total of more than 400 hours of beam time per device. The amount of data we gathered allows us to evaluate the error criticality of a representative set of algorithms from HPC suites. Additionally, based on the characteristics of the tested algorithms, we draw generic reliability conclusions for broader classes of codes. We show that arithmetic operations are less critical for the K40, while the Xeon Phi is more reliable when executing particle interactions solved through Finite Difference Methods. Finally, iterative stencil operations seem the most reliable on both architectures.
Pages: 577-588
Page count: 12
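The three metrics named in the abstract (number of corrupted elements, their spatial locality, and the dataset-wise mean relative error) can be illustrated with a minimal sketch. This is not the authors' code: the function name `error_metrics`, the `tolerance` parameter, the locality labels, and the choice to average the relative error over every element of the output are assumptions made only for illustration.

```python
def error_metrics(golden, observed, tolerance=0.0):
    """Compare a fault-free (golden) 2D output with an observed 2D output.

    Hypothetical helper: names, labels, and averaging rule are assumptions.
    """
    corrupted = []      # (row, col) of every element whose error exceeds tolerance
    rel_err_sum = 0.0   # sum of relative errors over corrupted elements
    total = 0           # number of elements in the dataset

    for i, (g_row, o_row) in enumerate(zip(golden, observed)):
        for j, (g, o) in enumerate(zip(g_row, o_row)):
            total += 1
            # Assumption: fall back to the absolute error when the golden value is 0.
            denom = abs(g) if g != 0 else 1.0
            rel = abs(o - g) / denom
            if rel > tolerance:
                corrupted.append((i, j))
                rel_err_sum += rel

    rows = {i for i, _ in corrupted}
    cols = {j for _, j in corrupted}

    # Assumed locality labels: how the corrupted elements are laid out.
    if not corrupted:
        locality = "none"
    elif len(corrupted) == 1:
        locality = "single"
    elif len(rows) == 1 or len(cols) == 1:
        locality = "line"
    else:
        locality = "multiple-lines"

    return {
        "corrupted_elements": len(corrupted),
        "locality": locality,
        # Assumption: "dataset-wise" means averaged over all output elements.
        "mean_relative_error": rel_err_sum / total if total else 0.0,
    }


golden = [[1.0, 2.0], [4.0, 8.0]]
single_fault = error_metrics(golden, [[1.0, 3.0], [4.0, 8.0]])
line_fault = error_metrics(golden, [[2.0, 3.0], [4.0, 8.0]])
```

A nonzero `tolerance` mimics the imprecise-computing case the abstract mentions, where a small deviation from the golden output is accepted rather than counted as an error.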