A high-throughput scalable BNN accelerator with fully pipelined architecture

Cited by: 3

Authors:

Han, Zhe [1]

Jiang, Jingfei [1]

Xu, Jinwei [1]

Zhang, Peng [1]

Zhao, Xiaoqiang [1]

Wen, Dong [2]

Dou, Yong [3]

Affiliations:

[1] Natl Univ Def Technol, Changsha, Peoples R China

[2] Natl Univ Def Technol, Sch Comp, Comp Sci, Changsha, Peoples R China

[3] Natl Univ Def Technol, Natl Lab Parallel & Distributed Comp, Changsha, Peoples R China

Source:

CCF TRANSACTIONS ON HIGH PERFORMANCE COMPUTING | 2021 / Vol. 3 / Issue 01

Keywords:

CNN; BNN; FPGA; Accelerator;

DOI:

10.1007/s42514-020-00059-0

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Subject classification code:

0812 ;

Abstract:

By replacing multiplication with XNOR operations, Binarized Neural Networks (BNNs) are hardware-friendly and well suited to FPGA acceleration. Previous research has highlighted the performance potential of BNNs. However, most existing work has targeted minimizing chip area: it achieves excellent energy and resource efficiency on small FPGAs, while results on larger FPGAs are unsatisfactory. We therefore propose a scalable, fully pipelined BNN architecture that targets maximizing throughput while maintaining energy and resource efficiency on large FPGAs. By exploiting multi-level parallelism and balancing pipeline stages, it achieves excellent performance. Moreover, we share on-chip memory and balance the computation resources to further improve resource utilization. A methodology is then proposed that explores the design space for the optimal configuration. This work is evaluated on a Xilinx UltraScale XCKU115. The results show that the proposed architecture achieves 2.24x-11.24x higher performance and 2.43x-11.79x better resource efficiency than other BNN accelerators.
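
The abstract's opening point, that a BNN can replace multiplication with XNOR, can be illustrated with a minimal software sketch. It is not the paper's hardware design. It assumes weights and activations are restricted to +1/-1 and encoded as bits (1 for +1, 0 for -1); the function names `pack_bits`, `xnor_popcount_dot`, and `reference_dot` are hypothetical and chosen only for this illustration.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int (assumed encoding: bit i is 1 for +1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two n-element +1/-1 vectors using XNOR and popcount."""
    mask = (1 << n) - 1
    # XNOR sets a bit wherever the two operands agree, i.e. where the product is +1.
    agree = ~(a_bits ^ b_bits) & mask
    matches = bin(agree).count("1")
    # Each agreement contributes +1 and each disagreement -1.
    return 2 * matches - n


def reference_dot(a, b):
    """Ordinary multiply-accumulate dot product, used only for comparison."""
    return sum(x * y for x, y in zip(a, b))


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
result = xnor_popcount_dot(pack_bits(a), pack_bits(b), len(a))
print(result, reference_dot(a, b))
```

Both calls give the same value for these inputs, which is why a multiplier can be swapped for an XNOR gate followed by a bit count when operands are binarized.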

Pages: 17-30

Page count: 14
