A Case for Soft Error Detection and Correction in Computational Chemistry

Cited by: 7
Authors
van Dam, Hubertus J. J. [1]
Vishnu, Abhinav [1]
de Jong, Wibe A. [1]
Affiliations
[1] Pacific NW Natl Lab, Richland, WA 99354 USA
DOI
10.1021/ct400489c
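Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Subject classification codes
070304 ; 081704 ;
Abstract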
High-performance computing platforms are expected to deliver 10^18 floating-point operations per second by the year 2022 through the deployment of millions of cores. Even if every core is highly reliable, the sheer number of them means that the mean time between failures will become so short that most application runs will suffer at least one fault. In particular, soft errors caused by intermittent incorrect behavior of the hardware are a concern, as they lead to silent data corruption. In this paper, we investigate the impact of soft errors on optimization algorithms, using Hartree-Fock as a particular example. Optimization algorithms iteratively reduce the error in the initial guess to reach the intended solution. Therefore, they may intuitively appear to be resilient to soft errors. Our results show that this is true for soft errors of small magnitude but not for large errors. We suggest error detection and correction mechanisms for different classes of data structures. The results obtained with these mechanisms indicate that we can correct more than 95% of the soft errors at moderate increases in the computational cost.
Pages: 3995-4005
Number of pages: 11