Optimizing FPGA-Based DNN Accelerator With Shared Exponential Floating-Point Format

Cited by: 6
Authors
Zhao, Wenzhe [1 ,2 ]
Dang, Qiwei [1 ,2 ]
Xia, Tian [1 ,2 ]
Zhang, Jingming [1 ,2 ]
Zheng, Nanning [1 ,2 ]
Ren, Pengju [1 ,2 ]
Affiliations
[1] Xi An Jiao Tong Univ, Natl Engn Res Ctr Visual Informat & Applicat, Natl Key Lab Human Machine Hybrid Augmented Intell, Xian 710049, Shaanxi, Peoples R China
[2] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian 710049, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep neural network; accelerator; low-precision floating point; field-programmable gate array (FPGA); very large scale integration circuit (VLSI); PERFORMANCE;
DOI
10.1109/TCSI.2023.3300657
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification
0808 ; 0809 ;
Abstract
In recent years, low-precision fixed-point computation has become a widely used technique for neural network inference on FPGAs. However, this approach has limitations: some neural networks are difficult to quantize with fixed-point arithmetic, such as those used for super-resolution, image denoising, and other scenarios that lack sufficient conditions for fine-tuning. Furthermore, deploying a floating-point neural network directly on an FPGA incurs significant hardware overhead and low computational efficiency. To address this issue, this paper proposes an FPGA-friendly floating-point data format that achieves the same storage density as int8 without sacrificing inference accuracy or requiring fine-tuning. This paper also presents an FPGA-based neural network accelerator compatible with the proposed format; it exploits DSP resources to extend the DSP cascade length from 7 to 16 and resolves the back-to-back accumulation issue of floating-point numbers, achieving resource consumption and execution efficiency comparable to those of 8-bit fixed-point accelerators. Experimental results demonstrate that the proposed accelerator matches the accuracy of native floating point on multiple neural networks without fine-tuning while maintaining high computing performance. Deployed on the Xilinx ZU9P, it achieves 4.072 TFLOPS at 250 MHz, outperforming previous works, including the Xilinx official DPU.
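The abstract describes a floating-point format whose values share an exponent so that per-value storage matches int8. The sketch below is only a rough illustration of that general shared-exponent idea, not the paper's exact encoding; the function names, the per-block granularity, and the 7-bit mantissa width are assumptions made for illustration.

```python
import numpy as np

def encode_shared_exponent(block, mantissa_bits=7):
    """Encode a block of floats with one shared exponent (illustrative sketch).

    Each value is stored as an 8-bit signed integer (sign + 7-bit mantissa),
    all scaled by a single exponent derived from the block's largest magnitude,
    so per-value storage matches int8 plus one shared exponent per block.
    """
    max_abs = np.max(np.abs(block))
    if max_abs == 0:
        return np.zeros(block.shape, dtype=np.int8), 0
    # Choose the shared exponent so the largest value fills the mantissa range.
    shared_exp = int(np.floor(np.log2(max_abs))) - (mantissa_bits - 1)
    mantissas = np.clip(np.round(block / 2.0 ** shared_exp), -128, 127)
    return mantissas.astype(np.int8), shared_exp

def decode_shared_exponent(mantissas, shared_exp):
    """Reconstruct approximate float values from the shared-exponent block."""
    return mantissas.astype(np.float32) * 2.0 ** shared_exp

# Example: a small weight block round-trips with bounded relative error.
w = np.array([0.031, -0.52, 0.0047, 0.88], dtype=np.float32)
m, e = encode_shared_exponent(w)
print(decode_shared_exponent(m, e))
```

Under this assumed scheme, values much smaller than the block maximum lose relative precision, which is why formats of this kind are typically applied per small block (e.g., per channel or per tile) rather than per tensor.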
Pages: 4478 - 4491
Page count: 14
Related Papers
50 records in total
  • [22] Configurable Floating-Point FFT Accelerator on FPGA Based Multiple-Rotation CORDIC
    Chen, Jiyang
    Lei, Yuanwu
    Peng, Yuanxi
    He, Tingting
    Deng, Ziye
    CHINESE JOURNAL OF ELECTRONICS, 2016, 25 (06) : 1063 - 1070
  • [23] An FPGA-based low-cost VLIW floating-point processor for CNC applications
    Dong, Jingchuan
    Wang, Taiyong
    Li, Bo
    Liu, Zhe
    Yu, Zhiqiang
    MICROPROCESSORS AND MICROSYSTEMS, 2017, 50 : 14 - 25
  • [24] VECTOR FLOATING-POINT DATA FORMAT
    HIGBIE, LC
    IEEE TRANSACTIONS ON COMPUTERS, 1976, 25 (01) : 25 - 32
  • [25] An FPGA-based floating-point processor array supporting a high-precision dot product
    Mayer-Lindenberg, Fritz
    Beller, Valerij
    2006 IEEE INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY, PROCEEDINGS, 2006, : 317 - +
  • [26] A new floating-point adder FPGA-based implementation using RN-coding of numbers
    Araujo, Tulio
    Cardoso, Matheus B. R.
    Nepomuceno, Erivelton G.
    Llanos, Carlos H.
    Arias-Garcia, Janier
    COMPUTERS & ELECTRICAL ENGINEERING, 2021, 90
  • [27] FPGA-based DNN Hardware Accelerator for Sensor Network Aggregation Node
    Mohamed, Nadya A.
    Cavallaro, Joseph R.
    2022 56TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, 2022, : 322 - 327
  • [28] An FPGA-Based Reconfigurable Accelerator for Low-Bit DNN Training
    Shao, Haikuo
    Lu, Jinming
    Lin, Jun
    Wang, Zhongfeng
    2021 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI (ISVLSI 2021), 2021, : 254 - 259
  • [29] Feasibility of floating-point arithmetic in FPGA based ANNs
    Nichols, KR
    Moussa, MA
    Areibi, SM
    COMPUTER APPLICATIONS IN INDUSTRY AND ENGINEERING, 2002, : 8 - 13
  • [30] High-Performance FPGA-Based CNN Accelerator With Block-Floating-Point Arithmetic
    Lian, Xiaocong
    Liu, Zhenyu
    Song, Zhourui
    Dai, Jiwu
    Zhou, Wei
    Ji, Xiangyang
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2019, 27 (08) : 1874 - 1885