A high-throughput scalable BNN accelerator with fully pipelined architecture

Cited by: 3
Authors
Han, Zhe [1]
Jiang, Jingfei [1]
Xu, Jinwei [1]
Zhang, Peng [1]
Zhao, Xiaoqiang [1]
Wen, Dong [2]
Dou, Yong [3]
Affiliations
[1] Natl Univ Def Technol, Changsha, Peoples R China
[2] Natl Univ Def Technol, Sch Comp, Comp Sci, Changsha, Peoples R China
[3] Natl Univ Def Technol, Natl Lab Parallel & Distributed Comp, Changsha, Peoples R China
Keywords
CNN; BNN; FPGA; Accelerator;
DOI
10.1007/s42514-020-00059-0
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
By replacing multiplication with XNOR operations, Binarized Neural Networks (BNNs) are hardware-friendly and well suited to FPGA acceleration. Previous research has highlighted the performance potential of BNNs. However, most existing work targets minimizing chip area: it achieves excellent energy and resource efficiency on small FPGAs, while results on larger FPGAs are unsatisfying. We therefore propose a scalable, fully pipelined BNN architecture that targets maximizing throughput while maintaining energy and resource efficiency on large FPGAs. By exploiting multiple levels of parallelism and balancing pipeline stages, it achieves excellent performance. Moreover, we share on-chip memory and balance the computation resources to further improve resource utilization. We then propose a methodology that explores the design space for the optimal configuration. This work is evaluated on a Xilinx UltraScale XCKU115. The results show that the proposed architecture achieves 2.24x-11.24x performance and 2.43x-11.79x resource efficiency improvement compared with other BNN accelerators.
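The abstract's opening claim, that XNOR can stand in for multiplication, can be illustrated with a minimal software sketch. It assumes the common BNN encoding in which +1 is stored as bit 1 and -1 as bit 0; the function names (`pack_bits`, `xnor_popcount_dot`, `reference_dot`) are hypothetical and are not taken from the paper, and the sketch says nothing about the authors' hardware design.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int; bit i is 1 when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two n-element +1/-1 vectors stored as packed bits.

    XNOR marks the positions where the two signs agree (product +1).
    With m agreeing positions out of n, the dot product is m - (n - m) = 2m - n.
    """
    mask = (1 << n) - 1                  # keep only the n valid bit positions
    agree = ~(a_bits ^ b_bits) & mask    # XNOR restricted to n bits
    matches = bin(agree).count("1")      # popcount
    return 2 * matches - n


def reference_dot(a, b):
    """Ordinary multiply-accumulate dot product, used only as a check."""
    return sum(x * y for x, y in zip(a, b))


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
result = xnor_popcount_dot(pack_bits(a), pack_bits(b), len(a))
print(result, reference_dot(a, b))
```

Both functions return the same value for any pair of equal-length +1/-1 vectors, which is why a bitwise XNOR followed by a population count can replace multipliers and adders in this setting.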
Pages: 17-30
Page count: 14
Related papers
50 records in total
  • [1] A high-throughput scalable BNN accelerator with fully pipelined architecture
    Zhe Han
    Jingfei Jiang
    Jinwei Xu
    Peng Zhang
    Xiaoqiang Zhao
    Dong Wen
    Yong Dou
    [J]. CCF Transactions on High Performance Computing, 2021, 3 : 17 - 30
  • [2] A High-Throughput Pipelined Architecture for JPEG XR Encoding
    Hattori, Koichi
    Tsutsui, Hiroshi
    Ochi, Hiroyuki
    Nakamura, Yukihiro
    [J]. 2009 IEEE/ACM/IFIP 7TH WORKSHOP ON EMBEDDED SYSTEMS FOR REAL-TIME MULTIMEDIA, 2009, : 9 - +
  • [3] A High-Throughput Pipelined Parallel Architecture for JPEG XR Encoding
    Tsutsui, Hiroshi
    Hattori, Koichi
    Ochi, Hiroyuki
    Nakamura, Yukihiro
    [J]. ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2012, 11 (04)
  • [4] High-throughput and fully-pipelined ciphertext multiplier for homomorphic encryption
    Wang, Zeyu
    Ikeda, Makoto
    [J]. IEICE ELECTRONICS EXPRESS, 2024,
  • [5] High-throughput and fully-pipelined ciphertext multiplier for homomorphic encryption
    Wang, Zeyu
    Ikeda, Makoto
    [J]. IEICE ELECTRONICS EXPRESS, 2024, 21 (06): 1 - 6
  • [6] High-throughput pipelined mergesort
    Fleming, Kermin
    King, Myron
    Ng, Man Cheuk
    Khan, Asif
    Vijayaraghavan, Muralidaran
    [J]. MEMOCODE'08: SIXTH ACM & IEEE INTERNATIONAL CONFERENCE ON FORMAL METHODS AND MODELS FOR CO-DESIGN, PROCEEDINGS, 2008, : 155 - 158
  • [7] A high-throughput pipelined architecture for blind adaptive equalization with minimum latency
    Mizuno, M
    Ueda, K
    Okello, J
    Ochi, H
    [J]. THIRTY-SIXTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS - CONFERENCE RECORD, VOLS 1 AND 2, CONFERENCE RECORD, 2002, : 980 - 984
  • [8] A high-throughput pipelined architecture for blind adaptive equalizer with minimum latency
    Mizuno, M
    Ueda, K
    Okello, J
    Ochi, H
    [J]. 2002 45TH MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL II, CONFERENCE PROCEEDINGS, 2002, : 29 - 32
  • [9] Pipelined High-throughput NTT Architecture for Lattice-Based Cryptography
    Tan, Weihang
    Wang, Antian
    Lao, Yingjie
    Zhang, Xinmiao
    Parhi, Keshab K.
    [J]. PROCEEDINGS OF THE 2021 ASIAN HARDWARE ORIENTED SECURITY AND TRUST SYMPOSIUM (ASIANHOST), 2021,
  • [10] A Scalable FPGA-based Accelerator for High-Throughput MCMC Algorithms
    Hosseini, Morteza
    Islam, Rashidul
    Kulkarni, Amey
    Mohsenin, Tinoosh
    [J]. 2017 IEEE 25TH ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2017), 2017, : 201 - 201