FARe: Fault-Aware GNN Training on ReRAM-based PIM Accelerators

Authors
Dhingra, Pratyush [1]
Ogbogu, Chukwufumnanya [1]
Joardar, Biresh Kumar [2]
Doppa, Janardhan Rao [1]
Kalyanaraman, Ananth [1]
Pande, Partha Pratim [1]
Affiliations
[1] Washington State Univ, Pullman, WA 99164 USA
[2] Univ Houston, Houston, TX USA
Funding
U.S. National Science Foundation
Keywords
ReRAM; PIM; Fault-Tolerant Training; GNNs
DOI
10.23919/DATE58400.2024.10546762
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Resistive random-access memory (ReRAM)-based processing-in-memory (PIM) architecture is an attractive solution for training Graph Neural Networks (GNNs) on edge platforms. However, the immature fabrication process and limited write endurance of ReRAMs make them prone to hardware faults, which limits their widespread adoption for GNN training. Further, existing fault-tolerant solutions prove inadequate for effectively training GNNs in the presence of faults. In this paper, we propose a fault-aware framework, referred to as FARe, that mitigates the effect of faults during GNN training. FARe outperforms existing approaches in terms of both accuracy and timing overhead. Experimental results demonstrate that the FARe framework can restore GNN test accuracy by 47.6% on faulty ReRAM hardware with approximately 1% timing overhead compared to the fault-free counterpart.
Pages: 6
Related papers
50 in total
  • [1] PRAP-PIM: A weight pattern reusing aware pruning method for ReRAM-based PIM DNN accelerators
    Shen, Zhaoyan
    Wu, Jinhao
    Jiang, Xikun
    Zhang, Yuhao
    Ju, Lei
    Jia, Zhiping
    HIGH-CONFIDENCE COMPUTING, 2023, 3 (02):
  • [2] Training-Free Stuck-At Fault Mitigation for ReRAM-Based Deep Learning Accelerators
    Quan, Chenghao
    Fouda, Mohammed E.
    Lee, Sugil
    Jung, Giju
    Lee, Jongeun
    Eltawil, Ahmed E.
    Kurdahi, Fadi
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2023, 42 (07) : 2174 - 2186
  • [3] Hardware attacks on ReRAM-based AI accelerators
    Heidary, Masoud
    Joardar, Biresh Kumar
    17TH IEEE DALLAS CIRCUITS AND SYSTEMS CONFERENCE, DCAS 2024, 2024,
  • [4] A Quantized Training Framework for Robust and Accurate ReRAM-based Neural Network Accelerators
    Zhang, Chenguang
    Zhou, Pingqiang
    2021 26TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2021, : 43 - 48
  • [5] Optimizing Motion Estimation with an ReRAM-Based PIM Architecture
    Liu, Bing
    Shen, Zhaoyan
    Jia, Zhiping
    Cai, Xiaojun
    WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS, PT I, 2020, 12384 : 285 - 297
  • [6] An Empirical Fault Vulnerability Exploration of ReRAM-Based Process-in-Memory CNN Accelerators
    Dorostkar, Aniseh
    Farbeh, Hamed
    Zarandi, Hamid R.
    IEEE TRANSACTIONS ON RELIABILITY, 2024, : 1 - 15
  • [7] ReHy: A ReRAM-Based Digital/Analog Hybrid PIM Architecture for Accelerating CNN Training
    Jin, Hai
    Liu, Cong
    Liu, Haikun
    Luo, Ruikun
    Xu, Jiahong
    Mao, Fubing
    Liao, Xiaofei
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (11) : 2872 - 2884
  • [8] APQ: Automated DNN Pruning and Quantization for ReRAM-Based Accelerators
    Yang, Siling
    He, Shuibing
    Duan, Hexiao
    Chen, Weijian
    Zhang, Xuechen
    Wu, Tong
    Yin, Yanlong
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2023, 34 (09) : 2498 - 2511
  • [9] Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators
    Huang, Sitao
    Ankit, Aayush
    Silveira, Plinio
    Antunes, Rodrigo
    Chalamalasetti, Sai Rahul
    El Hajj, Izzat
    Kim, Dong Eun
    Aguiar, Glaucimar
    Bruel, Pedro
    Serebryakov, Sergey
    Xu, Cong
    Li, Can
    Faraboschi, Paolo
    Strachan, John Paul
    Chen, Deming
    Roy, Kaushik
    Hwu, Wen-mei
    Milojicic, Dejan
    2021 26TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2021, : 372 - 377
  • [10] Fault-Free: A Framework for Analysis and Mitigation of Stuck-at-Fault on Realistic ReRAM-Based DNN Accelerators
    Shin, Hyein
    Kang, Myeonggu
    Kim, Lee-Sup
    IEEE TRANSACTIONS ON COMPUTERS, 2023, 72 (07) : 2011 - 2024