Making the Fault-Tolerance of Emerging Neural Network Accelerators Scalable

Cited by: 1
Authors
Liu, Tao [1]
Wen, Wujie [2]
Affiliations
[1] Florida Int Univ, Miami, FL 33199 USA
[2] Lehigh Univ, Bethlehem, PA 18015 USA
Keywords
DESIGN;
DOI
10.1109/iccad45719.2019.8942073
Chinese Library Classification (CLC) number
TP301 [Theory and Methods];
Discipline classification code
081202;
Abstract
Deep neural network (DNN) accelerators built on emerging technologies such as the memristor are attracting growing research attention because processing-in-memory gives them impressive computing efficiency. One critical challenge these promising accelerators face, however, is poor reliability: each weight, stored as the memristance (resistance) value of a device, suffers large uncertainty caused by unique device physical limitations such as stochastic programming and resistance drift, which translates into significant degradation of test accuracy. Non-trivial retraining, weight remapping, and redundant-cell fixing are popular approaches to this issue. However, these solutions scale poorly, since they amount to tedious patching or bug fixing after each accelerator-specific defect map has been identified. Scalable solutions, on the other hand, are highly desirable in the envisioned scenario where a neural network is trained once in the cloud and deployed to many edge devices, each equipped with an emerging accelerator. In this paper, we discuss the challenges and requirements of fault tolerance in these new accelerators. We then show how to address this problem through a scalable algorithm-hardware co-design method that focuses on unleashing the algorithmic error resilience of DNN classifiers, so as to eliminate any expensive defect-map-specific calibration or training from scratch.
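The failure mechanism described in the abstract (a weight stored as a device resistance is perturbed during programming, which shifts the classifier's outputs) can be illustrated with a small simulation. This is a minimal sketch, not the authors' co-design method. It assumes a lognormal multiplicative variation model, w' = w·e^θ with θ ~ N(0, σ²), which is common in the memristor literature but is not stated here to be this paper's model. The function names (`program_weights`, `classify`, `agreement`), the layer sizes, and the random data are all hypothetical.

```python
import math
import random


def program_weights(weights, sigma, rng):
    """Return a noisy copy of `weights`, modeling each programmed cell as
    w * exp(theta) with theta ~ N(0, sigma^2) (assumed lognormal variation)."""
    return [[w * math.exp(rng.gauss(0.0, sigma)) for w in row] for row in weights]


def classify(weights, x):
    """Return the index of the largest output of weights @ x, i.e. the
    dot products an analog crossbar would compute for one input vector."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def agreement(weights, inputs, sigma, seed=0):
    """Fraction of inputs whose predicted class is unchanged after one
    simulated noisy programming of the (hypothetical) accelerator."""
    rng = random.Random(seed)
    noisy = program_weights(weights, sigma, rng)
    same = sum(classify(weights, x) == classify(noisy, x) for x in inputs)
    return same / len(inputs)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical 10-class, 64-input linear layer with random weights and inputs.
    W = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(10)]
    X = [[rng.random() for _ in range(64)] for _ in range(500)]
    for sigma in (0.0, 0.1, 0.3, 0.6, 1.0):
        print(f"sigma={sigma:.1f}  agreement={agreement(W, X, sigma):.3f}")
```

With `sigma=0` the programmed weights equal the trained ones and agreement is exactly 1.0; raising `sigma` typically lowers agreement, mirroring the accuracy degradation the abstract describes. Because each seed stands in for a different chip's variation, the sketch also hints at why per-device calibration scales poorly across many edge devices.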
Pages: 5