A Cost-Efficient FPGA-Based CNN-Transformer Using Neural ODE

Cited: 0
|
Authors
Okubo, Ikumi [1 ]
Sugiura, Keisuke [1 ]
Matsutani, Hiroki [1 ]
Affiliations
[1] Keio Univ, Grad Sch Sci & Technol, Yokohama 2238522, Japan
Source
IEEE ACCESS | 2024, Vol. 12
Funding
Japan Society for the Promotion of Science;
关键词
Transformers; Field programmable gate arrays; Computational modeling; Accuracy; Attention mechanisms; Quantization (signal); Costs; Training; Load modeling; Mathematical models; Artificial intelligence; machine learning; tiny machine learning;
DOI
10.1109/ACCESS.2024.3480977
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
Transformers have been adopted for image recognition tasks and shown to outperform CNNs and RNNs, but they suffer from high training cost and computational complexity. To address these issues, a recent research trend is a hybrid approach that replaces part of ResNet with an MHSA (Multi-Head Self-Attention). In this paper, we propose a lightweight hybrid model that uses a Neural ODE (Ordinary Differential Equation) as a backbone instead of ResNet, so that the number of iterations of a building block can be increased while the same parameters are reused, mitigating the growth in parameter size per iteration. The proposed model is deployed on a modest-sized FPGA device for edge computing. The model is further quantized with a QAT (Quantization Aware Training) scheme to reduce FPGA resource utilization while suppressing the accuracy loss. The quantized model achieves 79.68% top-1 accuracy on the STL10 dataset, which contains 96x96 pixel images. The weights of the feature extraction network are stored on-chip to eliminate memory transfer overhead, enabling faster inference. The proposed FPGA implementation accelerates the backbone and MHSA parts by 34.01x and achieves an overall 9.85x speedup when software pre- and post-processing are taken into account. The FPGA acceleration also yields 7.10x better energy efficiency compared to the ARM Cortex-A53 CPU. The proposed lightweight Transformer model is demonstrated on the Xilinx ZCU104 board for the recognition of 96x96 pixel images and can be applied to different image sizes by modifying the pre-processing layer.
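The abstract's key idea is that a Neural ODE block reuses one set of weights across all integration steps, so effective depth grows without growing the parameter count. A minimal sketch of that parameter-reuse property (not the authors' implementation; the layer shapes and step counts here are illustrative assumptions) using explicit-Euler integration of dz/dt = f(z):

```python
import numpy as np

rng = np.random.default_rng(0)

class ODEBlock:
    """Hedged sketch of a Neural ODE building block: one shared weight
    matrix is reused at every integration step."""

    def __init__(self, dim):
        # single weight matrix, shared across all steps
        self.W = rng.standard_normal((dim, dim)) * 0.01

    def f(self, z):
        # the derivative network: here just a linear layer + ReLU
        return np.maximum(z @ self.W, 0.0)

    def forward(self, z, steps=4, h=0.25):
        # explicit Euler: z_{t+1} = z_t + h * f(z_t);
        # more steps -> deeper effective network, same parameters
        for _ in range(steps):
            z = z + h * self.f(z)
        return z

block = ODEBlock(dim=8)
x = rng.standard_normal((1, 8))
y4 = block.forward(x, steps=4)
y8 = block.forward(x, steps=8)
# the parameter count is identical regardless of the step count
n_params = block.W.size
```

Doubling `steps` from 4 to 8 doubles the computation (analogous to stacking more ResNet blocks) while `n_params` stays fixed, which is what makes the backbone attractive for a resource-constrained FPGA.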
Pages: 155773 - 155788
Page count: 16
Related Papers
50 records in total
  • [1] A Cost-efficient FPGA-based Embedded System for Biosensor Platform
    Jang, Iksu
    Seo, Jaeyoung
    Moon, Changjae
    Kim, Byungsub
    2022 19TH INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC), 2022, : 67 - 68
  • [2] A Computationally Efficient Neural Video Compression Accelerator Based on a Sparse CNN-Transformer Hybrid Network
    Zhang, Siyu
    Mao, Wendong
    Shi, Huihong
    Wang, Zhongfeng
    2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2024,
  • [3] Automatic Modulation Classification Based on CNN-Transformer Graph Neural Network
    Wang, Dong
    Lin, Meiyan
    Zhang, Xiaoxu
    Huang, Yonghui
    Zhu, Yan
    SENSORS, 2023, 23 (16)
  • [4] Optimizing FPGA-Based CNN Accelerator Using Differentiable Neural Architecture Search
    Fan, Hongxiang
    Ferianc, Martin
    Liu, Shuanglong
    Que, Zhiqiang
    Niu, Xinyu
    Luk, Wayne
    2020 IEEE 38TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD 2020), 2020, : 465 - 468
  • [5] Hybrid CNN-transformer network for efficient CSI feedback
    Zhao, Ruohan
    Liu, Ziang
    Song, Tianyu
    Jin, Jiyu
    Jin, Guiyue
    Fan, Lei
    PHYSICAL COMMUNICATION, 2024, 66
  • [6] Efficient FPGA-Based Transformer Accelerator Using In-Block Balanced Pruning
    Wang, Saiqun
    Zhang, Hao
    2024 13TH INTERNATIONAL CONFERENCE ON COMMUNICATIONS, CIRCUITS AND SYSTEMS, ICCCAS 2024, 2024, : 18 - 23
  • [7] Efficient Modelling of FPGA-based IP Blocks using Neural Networks
    Lorandel, Jordane
    Prevotet, Jean-Christophe
    Helard, Maryline
    2016 13TH INTERNATIONAL SYMPOSIUM ON WIRELESS COMMUNICATION SYSTEMS (ISWCS), 2016, : 571 - 575
  • [8] Optimization of FPGA-based CNN accelerators using metaheuristics
    Sait, Sadiq M.
    El-Maleh, Aiman
    Altakrouri, Mohammad
    Shawahna, Ahmad
    JOURNAL OF SUPERCOMPUTING, 2023, 79 (04): 4493 - 4533
  • [9] Energy Efficient FPGA-Based Accelerator for Dynamic Sparse Transformer
    Li, Zuohao
    Lai, Yiwan
    Zhang, Hao
    2024 13TH INTERNATIONAL CONFERENCE ON COMMUNICATIONS, CIRCUITS AND SYSTEMS, ICCCAS 2024, 2024, : 7 - 12