Error Resilient Transformers: A Novel Soft Error Vulnerability Guided Approach to Error Checking and Suppression

Cited by: 0
Authors
Ma, Kwondo [1]
Amarnath, Chandramouli [1]
Chatterjee, Abhijit [1]
Affiliations
[1] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Funding
National Science Foundation (USA)
Keywords
Transformer; error resilience
DOI
10.1109/ETS56758.2023.10174239
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Transformer networks have achieved remarkable success in Natural Language Processing (NLP) and Computer Vision applications. However, the large volume of computation underlying Transformers demands high reliability and resilience to soft errors in processor hardware. The objective of this research is to develop efficient techniques for the design of error resilient Transformer architectures. To enable this, we first perform a soft error vulnerability analysis of every fully connected layer in Transformer computations. Based on this study, error detection and suppression modules are selectively introduced into datapaths to restore Transformer performance under anticipated error rate conditions. Memory access errors and neuron output errors are detected using checksums of linear Transformer computations. Correction consists of identifying output neurons with out-of-range values and suppressing them to zero. For a Transformer with a nominal BLEU score of 52.7, such vulnerability guided selective error suppression can recover language translation performance from a BLEU score of 0 to 50.774 at an activation error probability as high as 0.001, incurring negligible memory and computation overheads.
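The two mechanisms named in the abstract, checksum-based detection over a linear computation and zeroing of out-of-range outputs, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the tolerance, the fault-injection argument, and the range bounds `lo`/`hi` (assumed here to come from profiling fault-free runs) are all hypothetical. The checksum relies only on the identity sum(Wx + b) = (column sums of W) · x + sum(b).

```python
import numpy as np

def make_checksum(W, b):
    """Precompute column sums of W and the sum of b (done once, offline)."""
    return W.sum(axis=0), b.sum()

def checked_linear(W, b, x, col_sum, bias_sum, fault=None, tol=1e-6):
    """Compute y = W @ x + b and compare sum(y) against the checksum.

    `fault` is a hypothetical (index, value) pair used only to emulate
    a soft error in one output neuron.
    """
    y = W @ x + b
    if fault is not None:
        idx, value = fault
        y = y.copy()
        y[idx] = value  # emulate a corrupted neuron output
    expected = col_sum @ x + bias_sum
    detected = abs(y.sum() - expected) > tol * max(1.0, abs(expected))
    return y, bool(detected)

def suppress_out_of_range(y, lo, hi):
    """Set outputs outside the assumed valid range [lo, hi] to zero."""
    return np.where((y < lo) | (y > hi), 0.0, y)

# Small deterministic example (values chosen for illustration only).
W = np.arange(12, dtype=float).reshape(4, 3) / 10.0
b = np.zeros(4)
x = np.array([1.0, -1.0, 0.5])
col_sum, bias_sum = make_checksum(W, b)

y_faulty, detected = checked_linear(W, b, x, col_sum, bias_sum, fault=(2, 1e6))
if detected:
    y_faulty = suppress_out_of_range(y_faulty, lo=-10.0, hi=10.0)
print(detected, y_faulty)
```

The checksum costs one extra dot product per layer, which is consistent with the low overhead the abstract reports, though the overhead of this sketch itself has not been measured.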
Pages: 6
Related papers
50 in total
  • [1] A novel unequal error protection approach for error resilient video transmission
    Fang, T
    Chau, LP
    [J]. 2005 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), VOLS 1-6, CONFERENCE PROCEEDINGS, 2005, : 4022 - 4025
  • [2] Soft error resilient system design through error correction
    Mitra, Subhasish
    Zhang, Ming
    Seifert, Norbert
    Mak, T. M.
    Kim, Kee Sup
    [J]. VLSI-SOC: RESEARCH TRENDS IN VLSI AND SYSTEMS ON CHIP, 2008, : 143 - +
  • [3] Soft error resilient system design through error correction
    Mitra, Subhasish
    Zhang, Ming
    Seifert, Norbert
    Mak, T. M.
    Kim, Kee Sup
    [J]. IFIP VLSI-SOC 2006: IFIP WG 10.5 INTERNATIONAL CONFERENCE ON VERY LARGE SCALE INTEGRATION & SYSTEM-ON-CHIP, 2006, : 332 - +
  • [4] Formal equivalence checking guided soft error vulnerable spots selection
    Zhu, Dan
    Li, Tun
    Li, Sikun
    [J]. Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2011, 23 (03): : 465 - 470
  • [5] Regional soft error vulnerability and error propagation analysis for GPGPU applications
    Oz, Isil
    Karadas, Omer Faruk
    [J]. JOURNAL OF SUPERCOMPUTING, 2022, 78 (03): : 4095 - 4130
  • [6] Regional soft error vulnerability and error propagation analysis for GPGPU applications
    Işıl Öz
    Ömer Faruk Karadaş
    [J]. The Journal of Supercomputing, 2022, 78 : 4095 - 4130
  • [7] Soft Error Reliability Analysis of Vision Transformers
    Xue, Xinghua
    Liu, Cheng
    Wang, Ying
    Yang, Bing
    Luo, Tao
    Zhang, Lei
    Li, Huawei
    Li, Xiaowei
    [J]. IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2023, 31 (12) : 2126 - 2136
  • [8] Error checking
    Nisley, Ed
    [J]. DR DOBBS JOURNAL, 2006, 31 (11): : 72 - +
  • [9] Universal Rules Guided Design Parameter Selection for Soft Error Resilient Processors
    Duan, Lide
    Zhang, Ying
    Li, Bin
    Peng, Lu
    [J]. IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS 2011), 2011, : 247 - 256
  • [10] SERL: Soft Error Resilient Latch Design
    Chang, Chun-Wei
    Huang, Hsuan-Ming
    Lin, Yuwen
    Wen, Charles H. -P.
    [J]. 2016 INTERNATIONAL SYMPOSIUM ON VLSI DESIGN, AUTOMATION AND TEST (VLSI-DAT), 2016,