Optimizing FPGA-Based DNN Accelerator With Shared Exponential Floating-Point Format

Cited by: 6
Authors
Zhao, Wenzhe [1 ,2 ]
Dang, Qiwei [1 ,2 ]
Xia, Tian [1 ,2 ]
Zhang, Jingming [1 ,2 ]
Zheng, Nanning [1 ,2 ]
Ren, Pengju [1 ,2 ]
Affiliations
[1] Xi An Jiao Tong Univ, Natl Engn Res Ctr Visual Informat & Applicat, Natl Key Lab Human Machine Hybrid Augmented Intell, Xian 710049, Shaanxi, Peoples R China
[2] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian 710049, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep neural network; accelerator; low-precision floating point; field-programmable gate array (FPGA); very large scale integration circuit (VLSI); PERFORMANCE;
DOI
10.1109/TCSI.2023.3300657
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification
0808 ; 0809 ;
Abstract
In recent years, low-precision fixed-point computation has become a widely used technique for neural network inference on FPGAs. However, this approach has limitations: some neural networks are difficult to quantize with fixed-point arithmetic, such as those used in super-resolution scaling, image denoising, and other scenarios that lack sufficient conditions for fine-tuning. Furthermore, deploying a floating-point neural network directly on an FPGA leads to significant hardware overhead and low computational efficiency. To address this issue, this paper proposes an FPGA-friendly floating-point data format that achieves the same storage density as int8 without sacrificing inference accuracy or requiring fine-tuning. Additionally, this paper presents an FPGA-based neural network accelerator compatible with the proposed format, which utilizes DSP resources to increase the DSP cascade length from 7 to 16 and solves the back-to-back accumulation issue of floating-point numbers. The design achieves resource consumption and execution efficiency comparable to those of 8-bit fixed-point accelerators. Experimental results demonstrate that the proposed accelerator matches the accuracy of native floating point on multiple neural networks without fine-tuning while maintaining high computing performance. Deployed on the Xilinx ZU9P, it achieves 4.072 TFlops at 250 MHz, outperforming previous works, including the Xilinx official DPU.
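The core idea of a shared-exponent format can be sketched in a few lines: a block of values stores one common exponent plus a small signed mantissa per value, giving int8-like storage density while preserving floating-point dynamic range across blocks. The sketch below is a minimal illustration of generic block floating point, assuming 8-bit mantissas and round-to-nearest; the paper's actual field widths, block size, and rounding scheme may differ.

```python
import math

def bfp_quantize(block, mant_bits=8):
    """Quantize a list of floats to a shared-exponent (block
    floating-point) format: one exponent for the whole block plus
    a signed integer mantissa per value. Illustrative only; the
    paper's exact format may differ."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # Choose the shared exponent so the largest value uses the full
    # signed mantissa range (here 8 bits: -128..127).
    shared_exp = math.floor(math.log2(max_abs)) - (mant_bits - 2)
    scale = 2.0 ** shared_exp
    lim = 2 ** (mant_bits - 1) - 1
    mants = [max(-lim - 1, min(lim, round(x / scale))) for x in block]
    return shared_exp, mants

def bfp_dequantize(shared_exp, mants):
    """Reconstruct approximate floats from the shared format."""
    scale = 2.0 ** shared_exp
    return [m * scale for m in mants]

e, m = bfp_quantize([0.75, -0.031, 0.5, 0.0001])
approx = bfp_dequantize(e, m)
# Values near the block maximum are recovered closely; values much
# smaller than the maximum lose precision -- the shared-exponent
# trade-off that enables int8-density storage.
```

Because all mantissas in a block share one scale, multiply-accumulate over a block reduces to integer arithmetic with a single exponent adjustment at the end, which is what makes such formats map well onto fixed-point DSP slices.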
Pages: 4478-4491 (14 pages)