Optimizing FPGA-Based DNN Accelerator With Shared Exponential Floating-Point Format

Citations: 6
Authors
Zhao, Wenzhe [1 ,2 ]
Dang, Qiwei [1 ,2 ]
Xia, Tian [1 ,2 ]
Zhang, Jingming [1 ,2 ]
Zheng, Nanning [1 ,2 ]
Ren, Pengju [1 ,2 ]
Affiliations
[1] Xi An Jiao Tong Univ, Natl Engn Res Ctr Visual Informat & Applicat, Natl Key Lab Human Machine Hybrid Augmented Intell, Xian 710049, Shaanxi, Peoples R China
[2] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian 710049, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep neural network; accelerator; low-precision floating point; field-programmable gate array (FPGA); very large scale integration circuit (VLSI); PERFORMANCE;
DOI
10.1109/TCSI.2023.3300657
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
In recent years, low-precision fixed-point computation has become a widely used technique for neural network inference on FPGAs. However, this approach has limitations: certain neural networks are difficult to quantize with fixed-point arithmetic, such as those used for super-resolution, image denoising, and other scenarios that lack sufficient conditions for fine-tuning. Furthermore, deploying a floating-point neural network directly on an FPGA incurs significant hardware overhead and low computational efficiency. To address these issues, this paper proposes an FPGA-friendly floating-point data format that achieves the same storage density as int8 without sacrificing inference accuracy or requiring fine-tuning. Additionally, this paper presents an FPGA-based neural network accelerator compatible with the proposed format, which utilizes DSP resources to increase the DSP cascade length from 7 to 16 and solves the back-to-back accumulation issue of floating-point numbers. The design achieves resource consumption and execution efficiency comparable to those of 8-bit fixed-point accelerators. Experimental results demonstrate that the proposed accelerator achieves the same accuracy as native floating point on multiple neural networks without fine-tuning while maintaining high computing performance. When deployed on the Xilinx ZU9P, it achieves 4.072 TFLOPS at 250 MHz, outperforming previous works, including the official Xilinx DPU.
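To make the shared-exponent idea concrete, the sketch below illustrates the general block floating-point scheme the abstract alludes to: each block of values stores signed 8-bit mantissas plus one exponent shared by the whole block, so amortized storage approaches that of int8 while retaining floating-point dynamic range across blocks. The block size, mantissa width, and bit layout here are illustrative assumptions, not the paper's exact format.

    import numpy as np

    def encode_shared_exp(block, mant_bits=8):
        # One exponent is shared by the whole block (an assumption of
        # this sketch, not the paper's exact layout); each value keeps
        # only a signed mant_bits-wide mantissa, so per-value storage
        # matches int8.
        max_mag = np.max(np.abs(block))
        if max_mag == 0.0:
            return 0, np.zeros(len(block), dtype=np.int8)
        shared_exp = int(np.floor(np.log2(max_mag)))
        # Scale so the largest magnitude maps just below 2**(mant_bits-1).
        scale = 2.0 ** (shared_exp - mant_bits + 2)
        mants = np.clip(np.round(block / scale), -128, 127).astype(np.int8)
        return shared_exp, mants

    def decode_shared_exp(shared_exp, mants, mant_bits=8):
        scale = 2.0 ** (shared_exp - mant_bits + 2)
        return mants.astype(np.float32) * scale

    # Example: a 16-value block (one shared exponent amortized over
    # 16 int8 mantissas) round-trips with small relative error.
    w = np.random.randn(16).astype(np.float32)
    e, m = encode_shared_exp(w)
    w_hat = decode_shared_exp(e, m)
    print(np.max(np.abs(w - w_hat)) / np.max(np.abs(w)))

With a 16-value block, the shared exponent adds well under one bit per value, which is how such formats reach int8-like storage density; values much smaller than the block maximum lose precision, which is why accuracy results such as those reported in the paper must be verified per network.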
Pages: 4478-4491 (14 pages)