F-CNN: An FPGA-based Framework for Training Convolutional Neural Networks

Authors
Zhao, Wenlai [1, 2, 3, 4]
Fu, Haohuan [1, 2, 3, 4]
Luk, Wayne [5]
Yu, Teng [5]
Wang, Shaojun [6]
Feng, Bo [1]
Ma, Yuchun [1]
Yang, Guangwen [1, 2, 3, 4]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
[2] Tsinghua Univ, Minist Educ Key Lab Earth Syst Modeling, Beijing, Peoples R China
[3] Tsinghua Univ, Ctr Earth Syst Sci, Beijing, Peoples R China
[4] Tsinghua Natl Lab Informat Sci & Technol, Beijing, Peoples R China
[5] Imperial Coll London, Dept Comp, London, England
[6] Harbin Inst Technol, Dept Automat Test & Control, Harbin, Peoples R China
Funding
Engineering and Physical Sciences Research Council (UK); National Natural Science Foundation of China; European Union Horizon 2020
DOI
Not available
CLC classification number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
This paper presents a novel reconfigurable framework for training Convolutional Neural Networks (CNNs). The framework reconfigures a streaming datapath at runtime to cover the training cycle of the various layers in a CNN. The streaming datapath supports various parameterized modules, which can be customized to produce implementations with different trade-offs between performance and resource usage. All modules share the same input and output data layout, which simplifies configuration scheduling. For different layers, instances of the modules contain different computation kernels operating in parallel, and these kernels can be customized for different layer configurations and data precisions. The associated performance, resource, and bandwidth models can be used to derive datapath parameters and to guide the analysis of design trade-offs so that application requirements or platform constraints are met. The models allow implementation specifications to be estimated for different layer configurations, in order to maximize performance under bandwidth and hardware-resource constraints. Experimental results indicate that the proposed module design targeting Maxeler technology can achieve 62.06 GFLOPS for 32-bit floating-point arithmetic, outperforming existing accelerators. Further evaluation based on training LeNet-5 shows that the proposed framework is about 4 times faster than the CPU implementation of Caffe and about 7.5 times more energy efficient than the GPU implementation of Caffe.
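The abstract says that performance, resource, and bandwidth models are used to choose datapath parameters, but it does not give their form. As a rough illustration only, the sketch below swaps in a generic roofline-style model (not the paper's models) to pick how many parallel 32-bit floating-point multiply-accumulate (MAC) lanes to instantiate for one convolution layer. The classes `ConvLayer` and `Platform`, the functions `estimate` and `choose_lanes`, the per-lane DSP cost, the platform numbers, and the assumption that every word crosses off-chip memory exactly once are all hypothetical. Only the forward convolution is modelled; training would also need terms for the backward passes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical description of one convolution layer (stride 1, no padding)."""
    c_in: int    # input channels
    c_out: int   # output channels
    k: int       # square kernel size
    h_out: int   # output height
    w_out: int   # output width

    def flops(self) -> int:
        # Forward pass only: one multiply and one add per multiply-accumulate.
        return 2 * self.c_in * self.c_out * self.k * self.k * self.h_out * self.w_out

    def bytes_moved(self, word_bytes: int = 4) -> int:
        # Assumption: each input, weight and output word crosses off-chip memory once.
        h_in = self.h_out + self.k - 1
        w_in = self.w_out + self.k - 1
        words = (self.c_in * h_in * w_in
                 + self.c_in * self.c_out * self.k * self.k
                 + self.c_out * self.h_out * self.w_out)
        return words * word_bytes


@dataclass(frozen=True)
class Platform:
    """Hypothetical FPGA budget; every number is a placeholder."""
    dsp_budget: int
    dsp_per_lane: int              # DSPs per 32-bit floating-point MAC lane (assumed)
    clock_hz: float
    bandwidth_bytes_per_s: float


def estimate(layer: ConvLayer, plat: Platform, lanes: int) -> float:
    """Roofline-style bound in FLOP/s for a datapath with `lanes` MAC lanes."""
    compute_bound = 2 * lanes * plat.clock_hz          # 2 FLOPs per MAC per cycle
    intensity = layer.flops() / layer.bytes_moved()    # FLOPs per byte of traffic
    memory_bound = plat.bandwidth_bytes_per_s * intensity
    return min(compute_bound, memory_bound)


def choose_lanes(layer: ConvLayer, plat: Platform) -> tuple[int, float]:
    """Smallest lane count that reaches the best bound within the DSP budget."""
    max_lanes = plat.dsp_budget // plat.dsp_per_lane
    if max_lanes < 1:
        raise ValueError("DSP budget cannot fit a single MAC lane")
    intensity = layer.flops() / layer.bytes_moved()
    memory_bound = plat.bandwidth_bytes_per_s * intensity
    # Lanes beyond this point add DSP usage but no throughput (bandwidth-limited).
    lanes_for_bw = math.ceil(memory_bound / (2 * plat.clock_hz))
    lanes = min(max_lanes, lanes_for_bw)
    return lanes, estimate(layer, plat, lanes)
```

For example, a layer shaped like the first LeNet-5 convolution (1 input channel, 6 output channels, 5x5 kernels, 28x28 output) on a placeholder platform with 2000 DSPs, 4 DSPs per lane, a 100 MHz clock and 5 GB/s of bandwidth is bandwidth-limited, so `choose_lanes` returns 251 of the 500 lanes that would fit; the remaining DSPs could instead serve other modules or a wider precision. With only 400 DSPs the same layer becomes compute-limited at 100 lanes.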
Pages: 107-114
Page count: 8