A high-throughput scalable BNN accelerator with fully pipelined architecture

Cited by: 4
Authors
Han, Zhe [1 ]
Jiang, Jingfei [1 ]
Xu, Jinwei [1 ]
Zhang, Peng [1 ]
Zhao, Xiaoqiang [1 ]
Wen, Dong [2 ]
Dou, Yong [3 ]
Institutions
[1] Natl Univ Def Technol, Changsha, Peoples R China
[2] Natl Univ Def Technol, Sch Comp, Comp Sci, Changsha, Peoples R China
[3] Natl Univ Def Technol, Natl Lab Parallel & Distributed Comp, Changsha, Peoples R China
Keywords
CNN; BNN; FPGA; Accelerator;
DOI
10.1007/s42514-020-00059-0
CLC Number
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
By replacing multiplication with the XNOR operation, Binarized Neural Networks (BNNs) are hardware-friendly and extremely well suited to FPGA acceleration. Previous research has highlighted the performance potential of BNNs. However, most existing work targets minimizing chip area: it achieves excellent energy and resource efficiency on small FPGAs, but the results on larger FPGAs are unsatisfying. We therefore propose a scalable, fully pipelined BNN architecture that targets maximum throughput while maintaining energy and resource efficiency on large FPGAs. By exploiting multiple levels of parallelism and balancing pipeline stages, it achieves excellent performance. Moreover, we share on-chip memory and balance the computation resources to further improve resource utilization. We then propose a methodology that explores the design space for the optimal configuration. This work is evaluated on a Xilinx UltraScale XCKU115. The results show that the proposed architecture achieves 2.24x-11.24x performance and 2.43x-11.79x resource-efficiency improvements compared with other BNN accelerators.
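The "multiplication replaced by XNOR" idea the abstract opens with is the standard XNOR-popcount trick: when weights and activations are binarized to {-1, +1} and packed as bits, a dot product reduces to a bitwise XNOR followed by a population count. The sketch below illustrates that identity only; the function and variable names are ours, not from the paper, and the paper's hardware pipeline is not modeled here.

```python
def binarize(values):
    """Pack a sequence of {-1, +1} values into an integer bit mask.

    Convention (illustrative): bit 1 encodes +1, bit 0 encodes -1.
    """
    mask = 0
    for i, v in enumerate(values):
        if v > 0:
            mask |= 1 << i
    return mask


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two n-element {-1, +1} vectors via XNOR + popcount.

    Each bit position where the operands agree contributes +1 and each
    disagreement contributes -1, so with p = popcount(XNOR(a, b)):
        dot = p - (n - p) = 2*p - n
    """
    agree = bin(~(a_bits ^ b_bits) & ((1 << n) - 1)).count("1")
    return 2 * agree - n


a = [+1, -1, +1, +1]
b = [+1, +1, -1, +1]
# Same result as sum(x * y for x, y in zip(a, b)), with no multiplier needed.
print(xnor_popcount_dot(binarize(a), binarize(b), len(a)))  # -> 0
```

On an FPGA this maps to a wide XNOR gate array plus a popcount adder tree, which is why binarization makes convolutions so cheap in LUTs compared with fixed-point MACs.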
Pages: 17-30 (14 pages)
Related Papers
50 items in total
  • [21] Chen, Tianshi; Du, Zidong; Sun, Ninghui; Wang, Jia; Wu, Chengyong; Chen, Yunji; Temam, Olivier. A High-Throughput Neural Network Accelerator. IEEE Micro, 2015, 35(3): 24-32.
  • [22] Jiang, Jiang; Mirian, Vincent; Tang, Kam Pui; Chow, Paul; Xing, Zuocheng. Matrix Multiplication Based on Scalable Macro-Pipelined FPGA Accelerator Architecture. 2009 International Conference on Reconfigurable Computing and FPGAs, 2009: 48+.
  • [23] Livesay, EA; Liu, YH; Luebke, KJ; Irick, J; Belosludtsev, Y; Rayner, S; Balog, R; Johnston, SA. A Scalable High-Throughput Chemical Synthesizer. Genome Research, 2002, 12(12): 1950-1960.
  • [24] Amini, Esmaeil; Jeddi, Zahra; Bayoumi, Magdy. A High-Throughput ECC Architecture. 2012 19th IEEE International Conference on Electronics, Circuits and Systems (ICECS), 2012: 901-904.
  • [25] Karim, S. M.; Chakrabarti, I. High-Throughput Turbo Decoder Using Pipelined Parallel Architecture and Collision-Free Interleaver. IET Communications, 2012, 6(11): 1416-1424.
  • [26] Mizuno, W; Ueda, K; Okello, J; Ochi, H. A High-Throughput Pipelined CMA Equalizer with Minimum Latency. 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vols I-IV, Proceedings, 2002: 4184.
  • [27] Li, Lin; Lin, Shaoyu; Shen, Shuli; Wu, Kongcheng; Li, Xiaochao; Chen, Yihui. High-Throughput and Area-Efficient Fully-Pipelined Hashing Cores Using BRAM in FPGA. Microprocessors and Microsystems, 2019, 67: 82-92.
  • [28] Liu, Yao; Zhang, Junyi; Liu, Shuo; Wang, Qiaoling; Dai, Wangchen; Cheung, Ray Chak Chung. Scalable Fully Pipelined Hardware Architecture for In-Network Aggregated AllReduce Communication. IEEE Transactions on Circuits and Systems I: Regular Papers, 2021, 68(10): 4194-4206.
  • [29] Sherwood, T; Varghese, G; Calder, B. A Pipelined Memory Architecture for High Throughput Network Processors. 30th Annual International Symposium on Computer Architecture, Proceedings, 2003: 288-299.
  • [30] Gao, Ruihao; Li, Xueqi; Li, Yewen; Wang, Xun; Tan, Guangming. MetaZip: A High-Throughput and Efficient Accelerator for DEFLATE. Proceedings of the 59th ACM/IEEE Design Automation Conference, DAC 2022, 2022: 319-324.