Optimal Real Number Codes for Fault Tolerant Matrix Operations

Cited by: 0
Author
Chen, Zizhong [1]
Affiliation
[1] Colorado Sch Mines, Dept Math & Comp Sci, Golden, CO 80401 USA
Source
PROCEEDINGS OF THE CONFERENCE ON HIGH PERFORMANCE COMPUTING NETWORKING, STORAGE AND ANALYSIS | 2009
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
It has recently been demonstrated that a single fail-stop process failure in ScaLAPACK matrix multiplication can be tolerated without checkpointing. Multiple simultaneous processor failures can also be tolerated without checkpointing by encoding the matrices with a real-number erasure correcting code. However, the floating-point representation of real numbers in today's high-performance computer architectures introduces round-off errors, which can be amplified during recovery when the number of processors in the system is large, possibly destroying all significant digits. In this paper, we present a class of Reed-Solomon-style real-number erasure correcting codes that have optimal numerical stability during recovery. We analytically construct the numerically best erasure correcting codes for 2 erasures and develop an approximation method to computationally construct numerically good codes for 3 or more erasures. Experimental results demonstrate that the proposed codes are numerically much more stable than existing codes.
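A minimal sketch of the setting the abstract describes, assuming each process holds a single real number (a matrix block in practice), two extra checksum processes, and a 2 x n generator matrix `G`. Recovering two lost values means solving the 2 x 2 system formed by the failed processes' columns of `G`, so round-off in the checksums is amplified by that submatrix's condition number. `vandermonde_generator` is a Reed-Solomon-style choice with nodes 1..n; `angle_generator` is a hypothetical alternative chosen only for illustration and is not claimed to be the construction from this paper. All function names here are made up for the example.

```python
import itertools
import math
import random


def vandermonde_generator(n):
    """Reed-Solomon-style 2 x n generator: row j holds node**j for nodes 1..n."""
    return [[float(node) ** j for node in range(1, n + 1)] for j in range(2)]


def angle_generator(n):
    """Hypothetical alternative: unit columns spread evenly in angle over [0, pi)."""
    thetas = [i * math.pi / n for i in range(n)]
    return [[math.cos(t) for t in thetas], [math.sin(t) for t in thetas]]


def encode(G, data):
    """Checksum j = sum_i G[j][i] * data[i], held by an extra checksum process."""
    return [sum(g * x for g, x in zip(row, data)) for row in G]


def cond2x2(a, b, c, d):
    """2-norm condition number of [[a, b], [c, d]] via its singular values."""
    fro2 = a * a + b * b + c * c + d * d    # sigma_max^2 + sigma_min^2
    det = abs(a * d - b * c)                # sigma_max * sigma_min
    if det == 0.0:
        return math.inf
    s_max2 = (fro2 + math.sqrt(max(fro2 * fro2 - 4.0 * det * det, 0.0))) / 2.0
    s_min2 = det * det / s_max2             # avoids cancellation in (fro2 - disc) / 2
    return math.sqrt(s_max2 / s_min2)


def worst_case_cond(G):
    """Largest condition number over every pair of failed processes."""
    n = len(G[0])
    return max(cond2x2(G[0][p], G[0][q], G[1][p], G[1][q])
               for p, q in itertools.combinations(range(n), 2))


def recover_two(G, survivors, checksums, p, q):
    """Rebuild the values of failed processes p and q (survivors[p], survivors[q] are ignored)."""
    # Remove the surviving processes' contributions from each checksum.
    r = [chk - sum(G[j][i] * x for i, x in enumerate(survivors) if i not in (p, q))
         for j, chk in enumerate(checksums)]
    a, b, c, d = G[0][p], G[0][q], G[1][p], G[1][q]
    det = a * d - b * c
    # Cramer's rule for [[a, b], [c, d]] @ [x_p, x_q] = r.
    return (r[0] * d - b * r[1]) / det, (a * r[1] - c * r[0]) / det


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.uniform(-1.0, 1.0) for _ in range(8)]
    for make in (vandermonde_generator, angle_generator):
        G = make(len(data))
        survivors = list(data)
        survivors[2] = survivors[5] = None  # processes 2 and 5 fail
        xp, xq = recover_two(G, survivors, encode(G, data), 2, 5)
        print(make.__name__, abs(xp - data[2]), abs(xq - data[5]))
    for n in (8, 64, 256):
        print(n, worst_case_cond(vandermonde_generator(n)),
              worst_case_cond(angle_generator(n)))
```

For the Vandermonde generator the worst-case condition number grows roughly like 2n², while spreading unit columns evenly in angle gives exactly cot(π/2n), about 2n/π, so the worst case grows linearly rather than quadratically. This only illustrates why the choice of generator matters for 2 erasures; it does not reproduce the paper's analysis or its approximation method for 3 or more erasures.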
Pages: 10
Related papers
50 in total
  • [21] Heuristics for Optimizing Matrix-Based Erasure Codes for Fault-Tolerant Storage Systems
    Plank, James S.
    Schuman, Catherine D.
    Robison, B. Devin
    2012 42ND ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN), 2012,
  • [22] WEAVER codes: Highly fault tolerant erasure codes for storage systems
    Hafner, JL
    USENIX ASSOCIATION PROCEEDINGS OF THE 4TH USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES, 2005 : 211 - 224
  • [23] ON FAULT-TOLERANT MATRIX DECOMPOSITION
    FITZPATRICK, P
    JOURNAL OF VLSI SIGNAL PROCESSING, 1994, 8 (03) : 293 - 303
  • [24] New fault tolerant matrix converter
    Ibarra, Edorta
    Andreu, Jon
    Kortabarria, Inigo
    Ormaetxea, Enekoitz
    Martinez de Alegria, Inigo
    Luis Martin, Jose
    Ibanez, Pedro
    ELECTRIC POWER SYSTEMS RESEARCH, 2011, 81 (02) : 538 - 552
  • [25] Asymptotically optimal lower bounds for the condition number of a real Vandermonde matrix
    Li, Ren-Cang
    SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS, 2006, 28 (03) : 829 - 844
  • [26] Optimal Synthesis of Fault-Tolerant IDK Cascades for Real-Time Classification
    Baruah, Sanjoy
    Bate, Iain
    Burns, Alan
    Davis, Robert I.
    2024 IEEE 30TH REAL-TIME AND EMBEDDED TECHNOLOGY AND APPLICATIONS SYMPOSIUM, RTAS 2024, 2023 : 29 - 41
  • [27] OPTIMAL RECONFIGURATION ALGORITHMS FOR REAL-TIME FAULT-TOLERANT PROCESSOR ARRAYS
    LIBESKINDHADAS, R
    SHRIVASTAVA, N
    MELHEM, RG
    LIU, CL
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1995, 6 (05) : 498 - 510
  • [28] Fault-Tolerant Gates on Hypergraph Product Codes
    Krishna, Anirudh
    Poulin, David
    PHYSICAL REVIEW X, 2021, 11 (01)
  • [29] Power consumption of fault tolerant codes: The active elements
    Rossi, D
    van Dijk, VES
    Kleihorst, RP
    Nieuwland, AK
    Metra, C
    9TH IEEE INTERNATIONAL ON-LINE TESTING SYMPOSIUM, PROCEEDINGS, 2003 : 61 - 67
  • [30] Reliability levels for fault-tolerant linear processing using real number error correction
    Redinbo, GR
    IEE PROCEEDINGS-COMPUTERS AND DIGITAL TECHNIQUES, 1996, 143 (06) : 355 - 363