Optimizing FPGA-Based DNN Accelerator With Shared Exponential Floating-Point Format

Cited by: 6
Authors
Zhao, Wenzhe [1 ,2 ]
Dang, Qiwei [1 ,2 ]
Xia, Tian [1 ,2 ]
Zhang, Jingming [1 ,2 ]
Zheng, Nanning [1 ,2 ]
Ren, Pengju [1 ,2 ]
Affiliations
[1] Xi An Jiao Tong Univ, Natl Engn Res Ctr Visual Informat & Applicat, Natl Key Lab Human Machine Hybrid Augmented Intell, Xian 710049, Shaanxi, Peoples R China
[2] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian 710049, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep neural network; accelerator; low-precision floating point; field-programmable gate array (FPGA); very large scale integration circuit (VLSI); PERFORMANCE;
DOI
10.1109/TCSI.2023.3300657
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification
0808 ; 0809 ;
Abstract
In recent years, low-precision fixed-point computation has become a widely used technique for neural network inference on FPGAs. However, this approach has limitations: some neural networks are difficult to quantize with fixed-point arithmetic, such as those used in super-resolution scaling, image denoising, and other scenarios that lack sufficient conditions for fine-tuning. Furthermore, deploying a floating-point neural network directly on an FPGA leads to significant hardware overhead and low computational efficiency. To address this issue, this paper proposes an FPGA-friendly floating-point data format that achieves the same storage density as int8 without sacrificing inference accuracy or requiring fine-tuning. Additionally, this paper presents an FPGA-based neural network accelerator compatible with the proposed format, which utilizes DSP resources to increase the DSP cascade length from 7 to 16 and solves the back-to-back accumulation issue of floating-point numbers. The design achieves resource consumption and execution efficiency comparable to those of 8-bit fixed-point accelerators. Experimental results demonstrate that the proposed accelerator matches the accuracy of native floating point on multiple neural networks without fine-tuning while maintaining high computing performance. Deployed on the Xilinx ZU9P, it achieves 4.072 TFlops at 250 MHz, outperforming previous works, including the Xilinx official DPU.
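The core idea of a shared-exponent format can be sketched in a few lines: a block of values stores one common exponent plus a small signed mantissa per value, giving int8-like storage density while preserving floating-point dynamic range across blocks. The sketch below is a minimal illustration of generic block floating point, assuming 8-bit mantissas and round-to-nearest; the paper's actual field widths, block size, and rounding scheme may differ.

```python
import math

def bfp_quantize(block, mant_bits=8):
    """Quantize a list of floats to a shared-exponent (block
    floating-point) format: one exponent for the whole block plus
    a signed integer mantissa per value. Illustrative only; the
    paper's exact format may differ."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # Choose the shared exponent so the largest value uses the full
    # signed mantissa range (here 8 bits: -128..127).
    shared_exp = math.floor(math.log2(max_abs)) - (mant_bits - 2)
    scale = 2.0 ** shared_exp
    lim = 2 ** (mant_bits - 1) - 1
    mants = [max(-lim - 1, min(lim, round(x / scale))) for x in block]
    return shared_exp, mants

def bfp_dequantize(shared_exp, mants):
    """Reconstruct approximate floats from the shared format."""
    scale = 2.0 ** shared_exp
    return [m * scale for m in mants]

e, m = bfp_quantize([0.75, -0.031, 0.5, 0.0001])
approx = bfp_dequantize(e, m)
# Values near the block maximum are recovered closely; values much
# smaller than the maximum lose precision -- the shared-exponent
# trade-off that enables int8-density storage.
```

Because all mantissas in a block share one scale, multiply-accumulate over a block reduces to integer arithmetic with a single exponent adjustment at the end, which is what makes such formats map well onto fixed-point DSP slices.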
Pages: 4478-4491 (14 pages)