FARe: Fault-Aware GNN Training on ReRAM-based PIM Accelerators

Cited by: 0

Authors
Dhingra, Pratyush [1 ]
Ogbogu, Chukwufumnanya [1 ]
Joardar, Biresh Kumar [2 ]
Doppa, Janardhan Rao [1 ]
Kalyanaraman, Ananth [1 ]
Pande, Partha Pratim [1 ]
Affiliations
[1] Washington State University, Pullman, WA 99164, USA
[2] University of Houston, Houston, TX, USA
Funding
U.S. National Science Foundation;
Keywords
ReRAM; PIM; Fault-Tolerant Training; GNNs;
DOI
10.23919/DATE58400.2024.10546762
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Resistive random-access memory (ReRAM)-based processing-in-memory (PIM) architecture is an attractive solution for training Graph Neural Networks (GNNs) on edge platforms. However, the immature fabrication process and limited write endurance of ReRAMs make them prone to hardware faults, thereby limiting their widespread adoption for GNN training. Furthermore, existing fault-tolerant solutions are inadequate for effectively training GNNs in the presence of faults. In this paper, we propose a fault-aware framework, referred to as FARe, that mitigates the effect of faults during GNN training. FARe outperforms existing approaches in terms of both accuracy and timing overhead. Experimental results demonstrate that the FARe framework can restore GNN test accuracy by 47.6% on faulty ReRAM hardware with only ~1% timing overhead compared to its fault-free counterpart.
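The abstract does not spell out how FARe handles faults, so the sketch below only illustrates the generic fault-aware training idea it refers to: weights mapped to (simulated) stuck-at ReRAM cells are frozen at their stuck values, and only the healthy weights receive gradients so they can learn to compensate. The two-layer GCN, the fault rates, the make_fault_masks helper, and the random toy graph are all assumptions for illustration; they are not the paper's model, fault model, or benchmarks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stuck-at fault injection: each crossbar cell is stuck at zero
# conductance ("stuck-at-0") with probability p_sa0, or at a maximal value
# ("stuck-at-1") with probability p_sa1. Rates are illustrative only.
def make_fault_masks(shape, p_sa0=0.05, p_sa1=0.02, stuck_value=1.0):
    r = torch.rand(shape)
    sa0 = r < p_sa0                                  # cells stuck at zero
    sa1 = (r >= p_sa0) & (r < p_sa0 + p_sa1)         # cells stuck at max value
    faulty = sa0 | sa1
    stuck_vals = torch.where(sa1, torch.full(shape, stuck_value), torch.zeros(shape))
    return faulty, stuck_vals

class FaultAwareLinear(nn.Module):
    """Linear layer whose weights mapped to faulty cells are pinned to the
    stuck value; gradients only reach the healthy weights."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_dim, in_dim))
        nn.init.xavier_uniform_(self.weight)
        faulty, stuck = make_fault_masks(self.weight.shape)
        self.register_buffer("faulty", faulty)
        self.register_buffer("stuck", stuck)

    def forward(self, x):
        # Faulty cells contribute their stuck value; healthy cells stay trainable.
        w = torch.where(self.faulty, self.stuck, self.weight)
        return x @ w.t()

class FaultAwareGCN(nn.Module):
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.l1 = FaultAwareLinear(in_dim, hid_dim)
        self.l2 = FaultAwareLinear(hid_dim, n_classes)

    def forward(self, x, adj_norm):
        h = F.relu(adj_norm @ self.l1(x))   # one round of neighbour aggregation
        return adj_norm @ self.l2(h)

# Toy usage on a random graph (illustration only).
n, d, c = 100, 16, 4
adj = (torch.rand(n, n) < 0.05).float()
adj = ((adj + adj.t() + torch.eye(n)) > 0).float()          # symmetrise, add self-loops
deg_inv_sqrt = adj.sum(1).clamp(min=1).pow(-0.5)
adj_norm = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]

x, y = torch.randn(n, d), torch.randint(0, c, (n,))
model = FaultAwareGCN(d, 32, c)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(50):
    opt.zero_grad()
    loss = F.cross_entropy(model(x, adj_norm), y)
    loss.backward()
    opt.step()
```

Because the stuck values enter the forward pass through torch.where, the surviving weights are optimized against the faulty hardware behaviour rather than an idealized crossbar, which is the essence of fault-aware (as opposed to fault-oblivious) training.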
Pages: 6