FastCaps: A Design Methodology for Accelerating Capsule Network on Field Programmable Gate Arrays

Cited by: 1
Authors
Rahoof, Abdul [1]
Chaturvedi, Vivek [1]
Shafique, Muhammad [2]
Affiliations
[1] Indian Institute of Technology Palakkad, Palakkad, India
[2] New York University, Abu Dhabi, United Arab Emirates
Keywords
Capsule Network; Neural Network Pruning; FPGA; Hardware Accelerator; Deep Learning
DOI
10.1109/IJCNN54540.2023.10191653
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Capsule Networks (CapsNets) have shown significant improvement in understanding variation in images, along with better generalization ability, compared to traditional Convolutional Neural Networks (CNNs). CapsNet preserves spatial relationships among extracted features and applies dynamic routing to efficiently learn the internal connections between capsules. However, due to the capsule structure and the complexity of the routing mechanism, it is non-trivial to accelerate CapsNet in its original form on a Field Programmable Gate Array (FPGA). Most existing works on CapsNet have achieved limited acceleration because they implement only the dynamic routing algorithm on FPGA, whereas considering all the processing steps synergistically is important for real-world applications of Capsule Networks. To this end, we propose a novel two-step approach that deploys a full-fledged CapsNet on FPGA. First, we prune the network using a novel Look-Ahead Kernel Pruning (LAKP) methodology that uses the sum of look-ahead scores of the model parameters. Next, we simplify the non-linear operations, reorder loops, and parallelize operations of the routing algorithm to reduce CapsNet hardware complexity. To the best of our knowledge, this is the first work accelerating a full-fledged CapsNet on FPGA. Experimental results on the MNIST and F-MNIST datasets (typical in the Capsule Network community) show that the proposed LAKP approach achieves effective compression rates of 99.26% and 98.84% and throughputs of 82 FPS and 48 FPS on a Xilinx PYNQ-Z1 FPGA, respectively. Furthermore, reducing the hardware complexity of the routing algorithm increases the throughput to 1351 FPS and 934 FPS, respectively. As corroborated by our results, this work enables highly performance-efficient deployment of CapsNets on low-cost FPGAs that are popular in modern edge devices.
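The abstract states only that LAKP ranks kernels by "the sum of look-ahead scores of the model parameters"; it does not give the formula. The sketch below is one plausible reading, not the authors' implementation. It assumes a per-weight look-ahead score equal to the weight magnitude times the Frobenius norms of the connected slices in the previous and next convolutional layers, sums that score over each 2D kernel, and zeroes the lowest-scoring kernels. The function names, the (out_channels, in_channels, kH, kW) weight layout, and the `sparsity` parameter are all hypothetical.

```python
import numpy as np


def kernel_lookahead_scores(w_prev, w_cur, w_next):
    """Sum of assumed per-weight look-ahead scores for each 2D kernel of w_cur.

    All tensors use the layout (out_channels, in_channels, kH, kW).
    Assumed per-weight score: |w| * ||prev slice||_F * ||next slice||_F.
    """
    # Norm of each previous-layer filter; filter c produces input channel c of w_cur.
    prev_norm = np.sqrt((w_prev ** 2).sum(axis=(1, 2, 3)))
    # Norm of the next-layer weights that read output channel o of w_cur.
    next_norm = np.sqrt((w_next ** 2).sum(axis=(0, 2, 3)))
    # Summing |w| over a kernel and scaling by the two norms equals
    # summing the per-weight scores, since the norms are constant per kernel.
    kernel_l1 = np.abs(w_cur).sum(axis=(2, 3))  # shape: (out_ch, in_ch)
    return kernel_l1 * next_norm[:, None] * prev_norm[None, :]


def prune_kernels(w_prev, w_cur, w_next, sparsity):
    """Zero out the fraction `sparsity` of kernels with the lowest scores."""
    scores = kernel_lookahead_scores(w_prev, w_cur, w_next)
    n_prune = int(sparsity * scores.size)
    keep = np.ones(scores.size, dtype=bool)
    if n_prune > 0:
        lowest = np.argsort(scores, axis=None)[:n_prune]
        keep[lowest] = False
    keep = keep.reshape(scores.shape)
    # Broadcast the per-kernel mask over the spatial dimensions.
    return w_cur * keep[:, :, None, None], keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_prev = rng.normal(size=(4, 3, 3, 3))
    w_cur = rng.normal(size=(8, 4, 3, 3))
    w_next = rng.normal(size=(6, 8, 3, 3))
    pruned, keep = prune_kernels(w_prev, w_cur, w_next, sparsity=0.75)
    print("kernels kept:", int(keep.sum()), "of", keep.size)
```

Pruning whole kernels rather than individual weights leaves a regular sparsity pattern, which is generally easier to exploit in hardware than unstructured sparsity; how the paper maps the pruned network onto the PYNQ-Z1 is not described in the abstract.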
Pages: 8