Fault Tolerant Lanczos Eigensolver via an Invariant Checking Method

Cited by: 0
Authors
Felix Loh
Kewal K. Saluja
Parameswaran Ramanathan
Affiliations
[1] University of Wisconsin-Madison
Source
Journal of Electronic Testing: Theory and Applications (JETTA), 2021, 37 (03)
Keywords
Fault tolerance; Invariant checking; Lanczos method; Sparse linear algebra; GPU
DOI
Not available
Abstract
An extensive survey of the literature shows that the Lanczos eigensolver is a popular iterative method for approximating a few maximal eigenvalues of a real symmetric matrix, particularly when the matrix is large and sparse. In recent years, graphics processing units (GPUs) have become a popular platform for scientific computing applications, many of which are based on linear algebra, and GPUs are increasingly used as the main computational units in supercomputers. This trend is expected to continue as the number of computations required by scientific applications reaches the petascale and exascale range. In this paper, building on our earlier work [22], we investigate in detail the error-checking mechanism for the Lanczos eigensolver. We identify a low-cost invariant for efficient error checking and, through mathematical analysis, determine the efficiency of our mechanism when used by the Lanczos eigensolver. We evaluate the proposed fault-tolerant scheme using an open-source sparse eigensolver on a GPU platform, with and without the injection of faults. We use a large number of sparse matrices from real applications to determine the efficiency and efficacy of our method. Our implementation shows that the proposed fault-tolerant method has good error coverage and low overhead. To the best of our knowledge, we are the first to introduce such a scheme for the Lanczos method.
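The idea of checking an invariant during the Lanczos iteration can be illustrated with a minimal sketch. The abstract does not say which invariant the authors use, so this example assumes a textbook property instead: in exact arithmetic, each new Lanczos vector is orthogonal to the two vectors before it. The function name `lanczos_checked`, the `fault_at` parameter, the tolerance, and the test matrix are all hypothetical, and the code runs on the CPU in plain Python rather than on a GPU.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    # Dense matrix-vector product; stands in for the sparse SpMV kernel.
    return [dot(row, x) for row in A]

def lanczos_checked(A, q0, steps, tol=1e-8, fault_at=None):
    """Plain Lanczos iteration with an orthogonality check after each step.

    Returns (alphas, betas, flagged), where `flagged` lists the iterations
    at which the assumed invariant was violated. `fault_at` (hypothetical)
    corrupts one entry of the matvec result at that iteration.
    """
    norm0 = math.sqrt(dot(q0, q0))
    q = [v / norm0 for v in q0]
    q_prev = [0.0] * len(q0)
    beta = 0.0
    alphas, betas, flagged = [], [], []
    for j in range(steps):
        w = matvec(A, q)
        if j == fault_at:
            w[0] += 1.0  # simulated fault in the matvec output
        alpha = dot(w, q)
        # Three-term recurrence: remove components along q and q_prev.
        w = [wi - alpha * qi - beta * pi for wi, qi, pi in zip(w, q, q_prev)]
        beta_next = math.sqrt(dot(w, w))
        alphas.append(alpha)
        if beta_next < tol:
            break  # invariant subspace found; stop early
        q_next = [wi / beta_next for wi in w]
        # Assumed invariant: q_next is orthogonal to q and to q_prev.
        deviation = max(abs(dot(q_next, q)), abs(dot(q_next, q_prev)))
        if deviation > tol:
            flagged.append(j)
        betas.append(beta_next)
        q_prev, q, beta = q, q_next, beta_next
    return alphas, betas, flagged

# Small symmetric test matrix (illustrative only).
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]
q0 = [1.0, 1.0, 1.0, 1.0]

alphas, betas, clean = lanczos_checked(A, q0, 3)
_, _, faulty = lanczos_checked(A, q0, 3, fault_at=1)
print("fault-free flagged steps:", clean)
print("faulty run flagged steps:", faulty)
```

In this sketch the check against `q_prev` does the real work. Because `alpha` is computed from the corrupted product, the new vector remains orthogonal to the current one even after the fault, so a check against `q` alone would miss it. A fault injected at iteration 0 would also go unnoticed here, since `q_prev` is still the zero vector at that point.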
Pages: 409-422
Page count: 13
Related papers
50 in total
  • [1] Fault Tolerant Lanczos Eigensolver via an Invariant Checking Method
    Loh, Felix
    Saluja, Kewal K.
    Ramanathan, Parameswaran
    JOURNAL OF ELECTRONIC TESTING-THEORY AND APPLICATIONS, 2021, 37 (03): 409-422
  • [2] Fault Tolerant Lanczos Eigensolver via an Invariant Checking Method
    Loh, Felix
    Saluja, Kewal K.
    Ramanathan, Parameswaran
    Journal of Electronic Testing: Theory and Applications (JETTA), 2021, 37 (03): 409-422
  • [3] Fault Tolerance through Invariant Checking for the Lanczos Eigensolver
    Loh, Felix
    Saluja, Kewal K.
    Ramanathan, Parameswaran
    2020 33RD INTERNATIONAL CONFERENCE ON VLSI DESIGN AND 2020 19TH INTERNATIONAL CONFERENCE ON EMBEDDED SYSTEMS (VLSID), 2020: 13-18
  • [4] FAULT-TOLERANT PROPERTIES AND A FAULT-CHECKING METHOD OF FUZZY CONTROL
    ITO, H
    MATSUBARA, T
    KUROKAWA, T
    KOGA, Y
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1993, E76D (05): 586-593
  • [5] Model checking fault tolerant systems
    Bernardeschi, C
    Fantechi, A
    Gnesi, S
    SOFTWARE TESTING VERIFICATION & RELIABILITY, 2002, 12 (04): 251-275
  • [6] Self Checking and Fault Tolerant Digital Design
    Rajasree, Y.
    Priya, Y. Vishnu
    Alamelu, N. R.
    PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON APPLICATIONS OF ELECTRICAL ENGINEERING/8TH INTERNATIONAL CONFERENCE ON APPLIED ELECTROMAGNETICS, WIRELESS AND OPTICAL COMMUNICATIONS, 2009: 86-92
  • [7] ALGORITHMIC FAULT TOLERANCE USING THE LANCZOS METHOD
    BOLEY, DL
    BRENT, RP
    GOLUB, GH
    LUK, FT
    SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS, 1992, 13 (01): 312-332
  • [8] Fault Tolerance through Invariant Checking for Iterative Solvers
    Loh, Felix
    Saluja, Kewal K.
    Ramanathan, Parameswaran
    2016 29TH INTERNATIONAL CONFERENCE ON VLSI DESIGN AND 2016 15TH INTERNATIONAL CONFERENCE ON EMBEDDED SYSTEMS (VLSID), 2016: 481-486
  • [9] Efficient Model Checking of Fault-Tolerant Distributed Protocols
    Bokor, Peter
    Kinder, Johannes
    Serafini, Marco
    Suri, Neeraj
    2011 IEEE/IFIP 41ST INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN), 2011: 73-84
  • [10] Validating requirements for fault tolerant systems using model checking
    Schneider, F
    Easterbrook, SM
    Callahan, JR
    Holzmann, GJ
    THIRD INTERNATIONAL CONFERENCE ON REQUIREMENTS ENGINEERING - PROCEEDINGS, 1998: 4-13