Enabling Training of Neural Networks on Noisy Hardware

Cited: 18
Author: Gokmen, Tayfun [1]
Affiliation: [1] IBM Research AI, Yorktown Heights, NY 10598, USA
Source: Frontiers in Artificial Intelligence, 2021
Keywords: learning algorithms; training algorithms; neural network acceleration; Bayesian neural network; in-memory computing; on-chip learning; crossbar arrays; memristor
DOI: 10.3389/frai.2021.699148
CLC number: TP18 [Theory of Artificial Intelligence]
Discipline codes: 081104; 0812; 0835; 1405
Pages: 14
Abstract
Deep neural networks (DNNs) are typically trained using the conventional stochastic gradient descent (SGD) algorithm. However, SGD performs poorly when used to train networks on non-ideal analog hardware composed of resistive device arrays with non-symmetric conductance modulation characteristics. Recently, we proposed a new algorithm, the Tiki-Taka algorithm, that overcomes this stringent symmetry requirement. Here we build on Tiki-Taka and describe a more robust algorithm that further relaxes other stringent hardware requirements. This more robust second version of the Tiki-Taka algorithm (referred to as TTv2) (1) reduces the required number of device conductance states from thousands to only tens, (2) increases the tolerance to noise in the device conductance modulations by about 100x, and (3) increases the tolerance to noise in the matrix-vector multiplications performed by the analog arrays by about 10x. Empirical simulation results show that TTv2 can train various neural networks close to their ideal accuracy even under extremely noisy hardware settings. TTv2 achieves these capabilities by complementing the original Tiki-Taka algorithm with lightweight, low-computational-complexity digital filtering operations performed outside the analog arrays. The implementation cost of TTv2 relative to SGD and Tiki-Taka is therefore minimal, and it retains the usual power and speed benefits of using analog hardware for training workloads. We also show how to extract the neural network from the analog hardware once training is complete for further model deployment. Similar to Bayesian model averaging, we form analog-hardware-compatible averages over the neural network weights derived from TTv2 iterates. This model average can then be transferred to other analog or digital hardware with notable improvements in test accuracy, surpassing the trained model itself. In short, we describe an end-to-end training and model extraction technique for extremely noisy crossbar-based analog hardware that can be used to accelerate DNN training workloads and match the performance of full-precision SGD.
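The digital filtering idea is the heart of TTv2: gradients are accumulated on one analog array (A), a hidden matrix held in digital memory low-pass filters the noisy readouts of A, and the weight array (W) is updated only in coarse steps once the filtered signal is unambiguous. The NumPy sketch below illustrates that mechanism under simplifying assumptions; the function name ttv2_step, the exponential-moving-average filter, the Gaussian noise model, and all constants are illustrative choices for this sketch, not the paper's exact procedure (device-level details such as pulse statistics and the symmetry point are omitted).

```python
import numpy as np

rng = np.random.default_rng(0)

def ttv2_step(A, W, H, x, delta, lr=0.1, beta=0.9, threshold=1.0,
              row=0, noise_std=0.05):
    """One simplified TTv2-style iteration for a single weight layer.

    A : analog array accumulating gradient information (noisy, few states)
    W : analog weight array used for the forward/backward passes
    H : digital hidden matrix acting as a low-pass filter between A and W
    x : input activations; delta : backpropagated errors
    """
    # 1) Rank-one gradient accumulation on A, as in the original Tiki-Taka
    #    algorithm; the noise term stands in for imperfect conductance updates.
    A += lr * np.outer(delta, x) + noise_std * rng.standard_normal(A.shape)

    # 2) Read one row of A via an analog matrix-vector product; the noise
    #    term stands in for imperfect analog readout.
    readout = A[row] + noise_std * rng.standard_normal(A.shape[1])

    # 3) Digitally low-pass filter the readout into H (here: an exponential
    #    moving average). This cheap digital step, performed outside the
    #    analog arrays, is what buys TTv2 its extra noise tolerance.
    H[row] = beta * H[row] + (1.0 - beta) * readout

    # 4) Update W only where the filtered signal clears a threshold, in
    #    coarse increments compatible with devices that have only tens of
    #    states; subtract the transferred amount so H tracks the residual.
    mask = np.abs(H[row]) > threshold
    step = np.sign(H[row, mask]) * threshold
    W[row, mask] += step
    H[row, mask] -= step
    return A, W, H

# Toy usage: a 4x3 layer driven by a fixed activation/error pair.
n_out, n_in = 4, 3
A, W, H = (np.zeros((n_out, n_in)) for _ in range(3))
x = np.ones(n_in)
delta = np.array([1.0, -1.0, 0.5, 0.0])
for t in range(200):
    A, W, H = ttv2_step(A, W, H, x, delta, row=t % n_out)
print(W)  # rows roughly follow the sign of delta; the filter absorbs most noise
```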
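For model extraction, the abstract describes forming averages over the weights derived from TTv2 iterates, analogous to Bayesian model averaging. A running (incremental) mean achieves this with O(1) extra memory per weight. The sketch below assumes a hypothetical read_weights(t) function that returns a noisy snapshot of the analog weights at iterate t; in the paper's setting the snapshots would come from different training iterates, so the average smooths training fluctuations and analog readout noise at once. The demo averages noisy reads of a fixed matrix to show why averaging also suppresses readout noise when transferring the model to other hardware.

```python
import numpy as np

rng = np.random.default_rng(1)

def running_weight_average(read_weights, steps):
    """Incrementally average weight snapshots read back from hardware.

    read_weights(t) is assumed to return a (noisy) weight matrix read
    from the analog array at iterate t.
    """
    avg = None
    for k, t in enumerate(steps, start=1):
        snapshot = read_weights(t)  # one noisy analog readout
        # incremental mean: avg <- avg + (snapshot - avg) / k
        avg = snapshot.copy() if avg is None else avg + (snapshot - avg) / k
    return avg

# Demo: averaging 50 noisy reads of a fixed matrix suppresses the readout
# noise by roughly sqrt(50) relative to any single read.
true_W = np.arange(6.0).reshape(2, 3)
read = lambda t: true_W + 0.2 * rng.standard_normal(true_W.shape)
W_avg = running_weight_average(read, range(50))
print(np.abs(W_avg - true_W).max())  # far below the 0.2 single-read noise level
```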