High Performance Training of Deep Neural Networks Using Pipelined Hardware Acceleration and Distributed Memory

Cited by: 0
Authors
Mehta, Ragav [1 ]
Huang, Yuyang [2 ]
Cheng, Mingxi [3 ]
Bagga, Shrey [4 ]
Mathur, Nishant [4 ]
Li, Ji [4 ]
Draper, Jeffrey [4 ]
Nazarian, Shahin [4 ]
Affiliations
[1] Mentor, Wilsonville, OR USA
[2] Nvidia, Shanghai, Peoples R China
[3] Duke Univ, Durham, NC USA
[4] Univ Southern Calif, Los Angeles, CA 90007 USA
Keywords
Deep learning; neural network; hardware design; machine
DOI
Not available
CLC number
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline codes
0808; 0809
Abstract
Recently, Deep Neural Networks (DNNs) have made unprecedented progress in various tasks. However, there is a timely need to accelerate the training process of DNNs, specifically for real-time applications that demand high performance, energy efficiency, and compactness. Numerous algorithms have been proposed to improve accuracy; however, the network training process remains computationally slow. In this paper, we present a scalable pipelined hardware architecture with distributed memories for a digital neuron to implement deep neural networks. We also explore various functions and algorithms, as well as different memory topologies, to optimize the performance of our training architecture. The power, area, and delay of our proposed model are evaluated against a software implementation. Experimental results on the MNIST dataset demonstrate that, compared with software training, our proposed hardware-based training approach achieves a 33X runtime reduction, a 5X power reduction, and a nearly 168X energy reduction.
Pages: 383-388
Number of pages: 6
Related papers
50 records in total
  • [1] Pipelined Training with Stale Weights in Deep Convolutional Neural Networks
    Zhang, Lifu
    Abdelrahman, Tarek S.
    [J]. APPLIED COMPUTATIONAL INTELLIGENCE AND SOFT COMPUTING, 2021, 2021
  • [2] Accelerating distributed deep neural network training with pipelined MPI allreduce
    Castello, Adrian
    Quintana-Orti, Enrique S.
    Duato, Jose
    [J]. CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2021, 24 (04): : 3797 - 3813
  • [4] FloatPIM: In-Memory Acceleration of Deep Neural Network Training with High Precision
    Imani, Mohsen
    Gupta, Saransh
    Kim, Yeseong
    Rosing, Tajana
    [J]. PROCEEDINGS OF THE 2019 46TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '19), 2019, : 802 - 815
  • [5] A novel framework to enhance the performance of training distributed deep neural networks
    Phan, Trung
    Do, Phuc
    [J]. INTELLIGENT DATA ANALYSIS, 2023, 27 (03) : 753 - 768
  • [6] A High-Performance Pixel-Level Fully Pipelined Hardware Accelerator for Neural Networks
    Li, Zhan
    Zhang, Zhihan
    Hu, Jie
    Meng, Qunkang
    Shi, Xingyu
    Luo, Jun
    Wang, Hao
    Huang, Qijun
    Chang, Sheng
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [7] A Pipelined Energy-efficient Hardware Accelaration for Deep Convolutional Neural Networks
    Alaeddine, Hmidi
    Jihene, Malek
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON DESIGN & TEST OF INTEGRATED MICRO & NANO-SYSTEMS (DTS), 2019,
  • [8] Efficient Hardware Optimization Strategies for Deep Neural Networks Acceleration Chip
    Zhang Meng
    Zhang Jingwei
    Li Guoqing
    Wu Ruixia
    Zeng Xiaoyang
    [J]. JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2021, 43 (06) : 1510 - 1517
  • [9] Efficient Hardware Acceleration for Approximate Inference of Bitwise Deep Neural Networks
    Vogel, Sebastian
    Guntoro, Andre
    Ascheid, Gerd
    [J]. 2017 CONFERENCE ON DESIGN AND ARCHITECTURES FOR SIGNAL AND IMAGE PROCESSING (DASIP), 2017,
  • [10] Soft Memory Box: A Virtual Shared Memory Framework for Fast Deep Neural Network Training in Distributed High Performance Computing
    Ahn, Shinyoung
    Kim, Joongheon
    Lim, Eunji
    Kang, Sungwon
    [J]. IEEE ACCESS, 2018, 6 : 26493 - 26504