STADIA: Photonic Stochastic Gradient Descent for Neural Network Accelerators

Cited by: 0
Authors
Xia, Chengpeng [1]
Chen, Yawen [1]
Zhang, Haibo [1]
Wu, Jigang [2]
Affiliations
[1] Univ Otago, Otago 9016, New Zealand
[2] Guangdong Univ Technol, Guangzhou 510006, Guangdong, Peoples R China
Keywords
Stochastic gradient descent; neural network accelerator; optical computing; backpropagation
DOI
10.1145/3607920
CLC number
TP3 [computing technology, computer technology];
Subject classification code
0812
Abstract
Deep Neural Networks (DNNs) have demonstrated great success in many fields such as image recognition and text analysis. However, the ever-increasing sizes of both DNN models and training datasets make deep learning extremely computation- and memory-intensive. Recently, photonic computing has emerged as a promising technology for accelerating DNNs. While the design of photonic accelerators for DNN inference and for the forward propagation of DNN training has been widely investigated, architectural acceleration for the equally important backpropagation phase of DNN training has not been well studied. In this paper, we propose a novel silicon photonic backpropagation accelerator for high-performance DNN training. Specifically, we design a general-purpose photonic gradient descent unit named STADIA that implements the multiplication, accumulation, and subtraction operations required for computing gradients using mature optical devices, including the Mach-Zehnder Interferometer (MZI) and the Microring Resonator (MRR), which can significantly reduce training latency and improve the energy efficiency of backpropagation. To demonstrate efficient parallel computing, we propose a STADIA-based backpropagation acceleration architecture and design a dataflow based on wavelength-division multiplexing (WDM). We analyze the precision of STADIA by quantifying the precision limitations imposed by optical losses and noise. Furthermore, we evaluate STADIA with different element sizes by analyzing the power, area, and time delay of photonic accelerators for DNN models such as AlexNet, VGG19, and ResNet. Simulation results show that the proposed STADIA architecture achieves improvements of 9.7x in time efficiency and 147.2x in energy efficiency compared with the most advanced optical-memristor-based backpropagation accelerator.
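To make the abstract's description concrete, the following is a minimal NumPy sketch (not taken from the paper) of the gradient arithmetic for one fully connected layer that a photonic gradient descent unit of this kind would need to carry out: multiplications and accumulations to form the gradient, followed by a subtraction for the SGD weight update. The layer shape, variable names, and squared-error loss are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 64))         # mini-batch of 8 inputs, 64 features
y = rng.standard_normal((8, 10))         # target outputs
W = rng.standard_normal((64, 10)) * 0.1  # layer weights
lr = 0.01                                # learning rate

# Forward propagation: matrix multiplications and accumulations,
# the part already covered by existing photonic inference accelerators.
y_hat = x @ W

# Backpropagation: the error term and the weight gradient are again built
# from multiplications (per-sample outer products) and accumulations over
# the mini-batch.
err = y_hat - y                          # dL/dy_hat for a squared-error loss
grad_W = x.T @ err / x.shape[0]          # accumulate outer products over batch

# Gradient descent step: an element-wise multiplication by the learning rate
# and a subtraction; these multiply, accumulate, and subtract operations are
# what the abstract describes STADIA mapping onto MZI and MRR devices.
W = W - lr * grad_W
```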
Pages: 23