STADIA: Photonic Stochastic Gradient Descent for Neural Network Accelerators

Cited by: 0
Authors
Xia, Chengpeng [1]
Chen, Yawen [1]
Zhang, Haibo [1]
Wu, Jigang [2]
Affiliations
[1] Univ Otago, Otago 9016, New Zealand
[2] Guangdong Univ Technol, Guangzhou 510006, Guangdong, Peoples R China
Keywords
Stochastic gradient descent; neural network accelerator; optical computing; backpropagation
DOI
10.1145/3607920
CLC number
TP3 [computing technology, computer technology];
Subject classification code
0812
Abstract
Deep Neural Networks (DNNs) have demonstrated great success in many fields such as image recognition and text analysis. However, the ever-increasing sizes of both DNN models and training datasets make deep learning extremely computation- and memory-intensive. Recently, photonic computing has emerged as a promising technology for accelerating DNNs. While the design of photonic accelerators for DNN inference and for the forward propagation of DNN training has been widely investigated, architectural acceleration for the equally important backpropagation phase of DNN training has not been well studied. In this paper, we propose a novel silicon photonic backpropagation accelerator for high-performance DNN training. Specifically, we design a general-purpose photonic gradient descent unit named STADIA that implements the multiplication, accumulation, and subtraction operations required for computing gradients using mature optical devices, including the Mach-Zehnder Interferometer (MZI) and the Microring Resonator (MRR), which can significantly reduce training latency and improve the energy efficiency of backpropagation. To demonstrate efficient parallel computing, we propose a STADIA-based backpropagation acceleration architecture and design a dataflow based on wavelength-division multiplexing (WDM). We analyze the precision of STADIA by quantifying the precision limitations imposed by optical losses and noise. Furthermore, we evaluate STADIA with different element sizes by analyzing the power, area, and time delay of photonic accelerators for DNN models such as AlexNet, VGG19, and ResNet. Simulation results show that the proposed STADIA architecture achieves improvements of 9.7x in time efficiency and 147.2x in energy efficiency compared with the most advanced optical-memristor-based backpropagation accelerator.
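To make the abstract's description concrete, the following is a minimal NumPy sketch (not taken from the paper) of the gradient arithmetic for one fully connected layer that a photonic gradient descent unit of this kind would need to carry out: multiplications and accumulations to form the gradient, followed by a subtraction for the SGD weight update. The layer shape, variable names, and squared-error loss are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 64))         # mini-batch of 8 inputs, 64 features
y = rng.standard_normal((8, 10))         # target outputs
W = rng.standard_normal((64, 10)) * 0.1  # layer weights
lr = 0.01                                # learning rate

# Forward propagation: matrix multiplications and accumulations,
# the part already covered by existing photonic inference accelerators.
y_hat = x @ W

# Backpropagation: the error term and the weight gradient are again built
# from multiplications (per-sample outer products) and accumulations over
# the mini-batch.
err = y_hat - y                          # dL/dy_hat for a squared-error loss
grad_W = x.T @ err / x.shape[0]          # accumulate outer products over batch

# Gradient descent step: an element-wise multiplication by the learning rate
# and a subtraction; these multiply, accumulate, and subtract operations are
# what the abstract describes STADIA mapping onto MZI and MRR devices.
W = W - lr * grad_W
```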
Pages: 23