A Cost-Efficient FPGA-Based CNN-Transformer Using Neural ODE

Cited: 0
|
Authors
Okubo, Ikumi [1 ]
Sugiura, Keisuke [1 ]
Matsutani, Hiroki [1 ]
Affiliations
[1] Keio Univ, Grad Sch Sci & Technol, Yokohama 2238522, Japan
Source
IEEE ACCESS | 2024, Vol. 12
Funding
Japan Society for the Promotion of Science;
关键词
Transformers; Field programmable gate arrays; Computational modeling; Accuracy; Attention mechanisms; Quantization (signal); Costs; Training; Load modeling; Mathematical models; Artificial intelligence; machine learning; tiny machine learning;
DOI
10.1109/ACCESS.2024.3480977
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
Transformers have been adopted for image recognition tasks and shown to outperform CNNs and RNNs, but they suffer from high training cost and computational complexity. To address these issues, a recent research trend is a hybrid approach that replaces part of ResNet with an MHSA (Multi-Head Self-Attention). In this paper, we propose a lightweight hybrid model that uses a Neural ODE (Ordinary Differential Equation) as a backbone instead of ResNet, so that the number of iterations of a building block can be increased while the same parameters are reused, mitigating the growth in parameter size per iteration. The proposed model is deployed on a modest-sized FPGA device for edge computing. The model is further quantized with a QAT (Quantization Aware Training) scheme to reduce FPGA resource utilization while suppressing the accuracy loss. The quantized model achieves 79.68% top-1 accuracy on the STL10 dataset, which contains 96x96 pixel images. The weights of the feature extraction network are stored on-chip to eliminate memory transfer overhead, enabling faster inference. The proposed FPGA implementation accelerates the backbone and MHSA parts by 34.01x and achieves an overall 9.85x speedup when software pre- and post-processing are taken into account. The FPGA acceleration also yields 7.10x better energy efficiency compared to the ARM Cortex-A53 CPU. The proposed lightweight Transformer model is demonstrated on the Xilinx ZCU104 board for the recognition of 96x96 pixel images and can be applied to different image sizes by modifying the pre-processing layer.
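The abstract's key idea is that a Neural ODE block reuses one set of weights across all integration steps, so effective depth grows without growing the parameter count. A minimal sketch of that parameter-reuse property (not the authors' implementation; the layer shapes and step counts here are illustrative assumptions) using explicit-Euler integration of dz/dt = f(z):

```python
import numpy as np

rng = np.random.default_rng(0)

class ODEBlock:
    """Hedged sketch of a Neural ODE building block: one shared weight
    matrix is reused at every integration step."""

    def __init__(self, dim):
        # single weight matrix, shared across all steps
        self.W = rng.standard_normal((dim, dim)) * 0.01

    def f(self, z):
        # the derivative network: here just a linear layer + ReLU
        return np.maximum(z @ self.W, 0.0)

    def forward(self, z, steps=4, h=0.25):
        # explicit Euler: z_{t+1} = z_t + h * f(z_t);
        # more steps -> deeper effective network, same parameters
        for _ in range(steps):
            z = z + h * self.f(z)
        return z

block = ODEBlock(dim=8)
x = rng.standard_normal((1, 8))
y4 = block.forward(x, steps=4)
y8 = block.forward(x, steps=8)
# the parameter count is identical regardless of the step count
n_params = block.W.size
```

Doubling `steps` from 4 to 8 doubles the computation (analogous to stacking more ResNet blocks) while `n_params` stays fixed, which is what makes the backbone attractive for a resource-constrained FPGA.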
Pages: 155773 - 155788
Page count: 16
Related Papers
50 records in total
  • [1] A Cost-efficient FPGA-based Embedded System for Biosensor Platform
    Jang, Iksu
    Seo, Jaeyoung
    Moon, Changjae
    Kim, Byungsub
    2022 19TH INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC), 2022, : 67 - 68
  • [2] A Computationally Efficient Neural Video Compression Accelerator Based on a Sparse CNN-Transformer Hybrid Network
    Zhang, Siyu
    Mao, Wendong
    Shi, Huihong
    Wang, Zhongfeng
    2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2024,
  • [3] Automatic Modulation Classification Based on CNN-Transformer Graph Neural Network
    Wang, Dong
    Lin, Meiyan
    Zhang, Xiaoxu
    Huang, Yonghui
    Zhu, Yan
    SENSORS, 2023, 23 (16)
  • [4] Optimizing FPGA-Based CNN Accelerator Using Differentiable Neural Architecture Search
    Fan, Hongxiang
    Ferianc, Martin
    Liu, Shuanglong
    Que, Zhiqiang
    Niu, Xinyu
    Luk, Wayne
    2020 IEEE 38TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD 2020), 2020, : 465 - 468
  • [5] Hybrid CNN-transformer network for efficient CSI feedback
    Zhao, Ruohan
    Liu, Ziang
    Song, Tianyu
    Jin, Jiyu
    Jin, Guiyue
    Fan, Lei
    PHYSICAL COMMUNICATION, 2024, 66
  • [6] Efficient FPGA-Based Transformer Accelerator Using In-Block Balanced Pruning
    Wang, Saiqun
    Zhang, Hao
    2024 13TH INTERNATIONAL CONFERENCE ON COMMUNICATIONS, CIRCUITS AND SYSTEMS, ICCCAS 2024, 2024, : 18 - 23
  • [7] Efficient Modelling of FPGA-based IP Blocks using Neural Networks
    Lorandel, Jordane
    Prevotet, Jean-Christophe
    Helard, Maryline
    2016 13TH INTERNATIONAL SYMPOSIUM ON WIRELESS COMMUNICATION SYSTEMS (ISWCS), 2016, : 571 - 575
  • [8] Optimization of FPGA-based CNN accelerators using metaheuristics
    Sait, Sadiq M.
    El-Maleh, Aiman
    Altakrouri, Mohammad
    Shawahna, Ahmad
    JOURNAL OF SUPERCOMPUTING, 2023, 79 (04): 4493 - 4533
  • [9] Energy Efficient FPGA-Based Accelerator for Dynamic Sparse Transformer
    Li, Zuohao
    Lai, Yiwan
    Zhang, Hao
    2024 13TH INTERNATIONAL CONFERENCE ON COMMUNICATIONS, CIRCUITS AND SYSTEMS, ICCCAS 2024, 2024, : 7 - 12