Optimizing FPGA-Based DNN Accelerator With Shared Exponential Floating-Point Format

Cited by: 6
Authors
Zhao, Wenzhe [1 ,2 ]
Dang, Qiwei [1 ,2 ]
Xia, Tian [1 ,2 ]
Zhang, Jingming [1 ,2 ]
Zheng, Nanning [1 ,2 ]
Ren, Pengju [1 ,2 ]
Affiliations
[1] Xi An Jiao Tong Univ, Natl Engn Res Ctr Visual Informat & Applicat, Natl Key Lab Human Machine Hybrid Augmented Intell, Xian 710049, Shaanxi, Peoples R China
[2] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian 710049, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep neural network; accelerator; low-precision floating point; field-programmable gate array (FPGA); very large scale integration circuit (VLSI); PERFORMANCE;
DOI
10.1109/TCSI.2023.3300657
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification
0808; 0809;
Abstract
In recent years, low-precision fixed-point computation has become a widely used technique for neural network inference on FPGAs. However, this approach has limitations: some neural networks are difficult to quantize with fixed-point arithmetic, such as those used for super-resolution, image denoising, and other scenarios that lack sufficient conditions for fine-tuning. Meanwhile, deploying a floating-point neural network directly on an FPGA incurs significant hardware overhead and low computational efficiency. To address this issue, this paper proposes an FPGA-friendly floating-point data format that achieves the same storage density as int8 without sacrificing inference accuracy or requiring fine-tuning. Additionally, this paper presents an FPGA-based neural network accelerator compatible with the proposed format, which utilizes DSP resources to increase the DSP cascade length from 7 to 16 and resolves the back-to-back accumulation issue of floating-point numbers. The design achieves resource consumption and execution efficiency comparable to those of 8-bit fixed-point accelerators. Experimental results demonstrate that the proposed accelerator matches the accuracy of native floating point on multiple neural networks without fine-tuning while maintaining high computing performance. When deployed on the Xilinx ZU9P, it achieves 4.072 TFLOPS at 250 MHz, outperforming previous works, including the official Xilinx DPU.
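The abstract describes the shared-exponent format only at a high level. The following is a minimal Python sketch of the general block floating-point idea such formats build on: a group of values shares one exponent while each element keeps a signed 8-bit mantissa, giving int8-like storage density plus one exponent per block. The function names, block granularity, and rounding choices below are illustrative assumptions, not the paper's actual definition.

import numpy as np

def to_shared_exponent(block, mant_bits=8):
    # Quantize a block of float32 values so all elements share one
    # exponent, keeping one signed 8-bit mantissa per element.
    # Illustrative sketch; the paper's exact format and rounding
    # rules may differ.
    block = np.asarray(block, dtype=np.float32)
    max_abs = np.max(np.abs(block))
    if max_abs == 0.0:
        return np.zeros(block.shape, dtype=np.int8), 0
    # Shared exponent chosen from the largest magnitude in the block.
    shared_exp = int(np.floor(np.log2(max_abs)))
    # Scale so the largest magnitude fills the signed mantissa range;
    # clip handles the corner case where rounding reaches 128.
    scale = 2.0 ** (shared_exp - (mant_bits - 2))
    mant = np.clip(np.round(block / scale), -128, 127).astype(np.int8)
    return mant, shared_exp

def from_shared_exponent(mant, shared_exp, mant_bits=8):
    # Reconstruct approximate float32 values from the shared-exponent form.
    scale = 2.0 ** (shared_exp - (mant_bits - 2))
    return mant.astype(np.float32) * scale

# Example: one block of activations quantized without any fine-tuning.
x = np.array([0.72, -0.031, 0.0045, -0.66], dtype=np.float32)
mant, e = to_shared_exponent(x)
x_hat = from_shared_exponent(mant, e)

Because every element in a block is scaled by the same power of two, multiply-accumulate hardware can operate on the 8-bit mantissas much like an int8 datapath, which is consistent with the paper's claim of fixed-point-like resource consumption.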
Pages: 4478-4491
Page count: 14