F-CNN: An FPGA-based Framework for Training Convolutional Neural Networks

Citations: 0

Authors:

Zhao, Wenlai [1,2,3,4]

Fu, Haohuan [1,2,3,4]

Luk, Wayne [5]

Yu, Teng [5]

Wang, Shaojun [6]

Feng, Bo [1]

Ma, Yuchun [1]

Yang, Guangwen [1,2,3,4]

Affiliations:

[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China

[2] Tsinghua Univ, Minist Educ Key Lab Earth Syst Modeling, Beijing, Peoples R China

[3] Tsinghua Univ, Ctr Earth Syst Sci, Beijing, Peoples R China

[4] Tsinghua Natl Lab Informat Sci & Technol, Beijing, Peoples R China

[5] Imperial Coll London, Dept Comp, London, England

[6] Harbin Inst Technol, Dept Automat Test & Control, Harbin, Peoples R China

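Source:

2016 IEEE 27TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP) | 2016

Funding:

Engineering and Physical Sciences Research Council (UK); National Natural Science Foundation of China; EU Horizon 2020;

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract: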

This paper presents a novel reconfigurable framework for training Convolutional Neural Networks (CNNs). The proposed framework is based on reconfiguring a streaming datapath at runtime to cover the training cycle for the various layers in a CNN. The streaming datapath can support various parameterized modules which can be customized to produce implementations with different trade-offs in performance and resource usage. The modules follow the same input and output data layout, simplifying configuration scheduling. For different layers, instances of the modules contain different computation kernels in parallel, which can be customized with different layer configurations and data precision. The associated models of performance, resource usage and bandwidth can be used to derive parameters for the datapath and to guide the analysis of design trade-offs to meet application requirements or platform constraints. They enable estimation of the implementation specifications given different layer configurations, to maximize performance under the constraints on bandwidth and hardware resources. Experimental results indicate that the proposed module design targeting Maxeler technology can achieve a performance of 62.06 GFLOPS for 32-bit floating-point arithmetic, outperforming existing accelerators. Further evaluation based on training LeNet-5 shows that the proposed framework is about 4 times faster than the CPU implementation of Caffe and about 7.5 times more energy efficient than the GPU implementation of Caffe.


Pages: 107-114

Page count: 8
