Hardware-Aware Softmax Approximation for Deep Neural Networks

Cited by: 13
Authors
Geng, Xue [1]
Lin, Jie [1]
Zhao, Bin [2]
Kong, Anmin [2]
Aly, Mohamed M. Sabry [3]
Chandrasekhar, Vijay [1]
Affiliations
[1] A*STAR, I2R, Singapore, Singapore
[2] A*STAR, IME, Singapore, Singapore
[3] Nanyang Technol Univ, Sch CSE, Singapore, Singapore
Keywords
Softmax; Nonlinear operation; Power; Area
DOI
10.1007/978-3-030-20870-7_7
CLC Number
TP31 [Computer Software]
Discipline Classification Code
081202; 0835
Abstract
There has been rapid development of custom hardware for accelerating the inference of deep neural networks (DNNs) by explicitly incorporating hardware metrics (e.g., area and energy) as constraints alongside application accuracy. Recent efforts have mainly focused on the linear functions (matrix multiplications) in convolutional (Conv) and fully connected (FC) layers, while there is no publicly available study on optimizing the inference of non-linear functions in DNNs under hardware constraints. In this paper, we address the problem of cost-efficient inference for Softmax, a popular non-linear function in DNNs. We introduce a hardware-aware linear approximation framework based on algorithm-hardware co-optimization, with the goal of minimizing area and energy cost without incurring significant loss in application accuracy. This is achieved by simultaneously reducing the operand bit-width and approximating the cost-intensive operations in Softmax (e.g., exponential and division) with cost-effective ones (e.g., addition and bit shifts). We designed and synthesized a hardware unit for our approximation approach to estimate its area and energy consumption. In addition, we introduce a training method that further saves area and energy through reduced precision. Compared to a 19-bit baseline, our approach reduces area cost by 13x and energy consumption by 2x at an 11-bit operand width, for Faster R-CNN on the VOC2007 dataset.
Pages: 107-122
Page count: 16