High Performance Training of Deep Neural Networks Using Pipelined Hardware Acceleration and Distributed Memory

Cited by: 0
Authors
Mehta, Ragav [1 ]
Huang, Yuyang [2 ]
Cheng, Mingxi [3 ]
Bagga, Shrey [4 ]
Mathur, Nishant [4 ]
Li, Ji [4 ]
Draper, Jeffrey [4 ]
Nazarian, Shahin [4 ]
Affiliations
[1] Mentor, Wilsonville, OR USA
[2] Nvidia, Shanghai, Peoples R China
[3] Duke Univ, Durham, NC USA
[4] Univ Southern Calif, Los Angeles, CA 90007 USA
Keywords
Deep learning; neural network; hardware design; machine
DOI
Not available
CLC number
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline codes
0808; 0809
Abstract
Recently, Deep Neural Networks (DNNs) have made unprecedented progress in various tasks. However, there is a timely need to accelerate the training process of DNNs, specifically for real-time applications that demand high performance, energy efficiency, and compactness. Numerous algorithms have been proposed to improve accuracy; however, the network training process remains computationally slow. In this paper, we present a scalable pipelined hardware architecture with distributed memories for a digital neuron to implement deep neural networks. We also explore various functions and algorithms, as well as different memory topologies, to optimize the performance of our training architecture. The power, area, and delay of our proposed model are evaluated against a software implementation. Experimental results on the MNIST dataset demonstrate that, compared with software training, our proposed hardware-based training approach achieves a 33X runtime reduction, a 5X power reduction, and a nearly 168X energy reduction.
Pages: 383-388
Number of pages: 6
Related papers
50 records in total
  • [1] Pipelined Training with Stale Weights in Deep Convolutional Neural Networks
    Zhang, Lifu
    Abdelrahman, Tarek S.
    [J]. APPLIED COMPUTATIONAL INTELLIGENCE AND SOFT COMPUTING, 2021, 2021
  • [2] Accelerating distributed deep neural network training with pipelined MPI allreduce
    Castello, Adrian
    Quintana-Orti, Enrique S.
    Duato, Jose
    [J]. CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2021, 24 (04): : 3797 - 3813
  • [4] FloatPIM: In-Memory Acceleration of Deep Neural Network Training with High Precision
    Imani, Mohsen
    Gupta, Saransh
    Kim, Yeseong
    Rosing, Tajana
    [J]. PROCEEDINGS OF THE 2019 46TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '19), 2019, : 802 - 815
  • [5] A novel framework to enhance the performance of training distributed deep neural networks
    Phan, Trung
    Do, Phuc
    [J]. INTELLIGENT DATA ANALYSIS, 2023, 27 (03) : 753 - 768
  • [6] A High-Performance Pixel-Level Fully Pipelined Hardware Accelerator for Neural Networks
    Li, Zhan
    Zhang, Zhihan
    Hu, Jie
    Meng, Qunkang
    Shi, Xingyu
    Luo, Jun
    Wang, Hao
    Huang, Qijun
    Chang, Sheng
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [7] A Pipelined Energy-efficient Hardware Accelaration for Deep Convolutional Neural Networks
    Alaeddine, Hmidi
    Jihene, Malek
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON DESIGN & TEST OF INTEGRATED MICRO & NANO-SYSTEMS (DTS), 2019,
  • [8] Efficient Hardware Optimization Strategies for Deep Neural Networks Acceleration Chip
    Zhang Meng
    Zhang Jingwei
    Li Guoqing
    Wu Ruixia
    Zeng Xiaoyang
    [J]. JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2021, 43 (06) : 1510 - 1517
  • [9] Efficient Hardware Acceleration for Approximate Inference of Bitwise Deep Neural Networks
    Vogel, Sebastian
    Guntoro, Andre
    Ascheid, Gerd
    [J]. 2017 CONFERENCE ON DESIGN AND ARCHITECTURES FOR SIGNAL AND IMAGE PROCESSING (DASIP), 2017,
  • [10] Soft Memory Box: A Virtual Shared Memory Framework for Fast Deep Neural Network Training in Distributed High Performance Computing
    Ahn, Shinyoung
    Kim, Joongheon
    Lim, Eunji
    Kang, Sungwon
    [J]. IEEE ACCESS, 2018, 6 : 26493 - 26504