A data efficient transformer based on Swin Transformer

Cited by: 0
|
Authors
Dazhi Yao
Yunxue Shao
Affiliations
[1] Nanjing Tech University, School of Computer Science and Technology
Source
The Visual Computer | 2024 / Volume 40
Keywords
Computer vision; Transformer; Classification; Data efficient
DOI
Not available
Abstract
Almost all Vision Transformer-based models require pre-training on massive datasets at considerable computational cost. If researchers lack enough data to train a Vision Transformer-based model, or lack GPUs powerful enough to process millions of labeled images, Vision Transformer-based models hold no advantage over CNNs. Swin Transformer addresses these problems by applying shifted window-based self-attention, which has linear computational complexity. Although Swin Transformer significantly reduces computing costs and works well on mid-size datasets, it still performs poorly when trained on a small dataset. In this paper, we propose a hierarchical and data-efficient Transformer based on Swin Transformer, which we call ESwin Transformer. We mainly redesigned the patch embedding and patch merging modules of Swin Transformer, adding only simple convolutional components to them, which significantly improves performance when the model is trained on a small dataset. Our empirical results show that ESwin Transformer, trained on CIFAR10/CIFAR100 with no extra data for 300 epochs, achieves 97.17%/83.78% accuracy and performs better than Swin Transformer and DeiT in the same training time.
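The abstract only states that the patch embedding and patch merging modules were replaced with simple convolutional components; the exact layer configuration is not given here. The following is a minimal PyTorch sketch of that general idea, assuming a small strided-convolution stem in place of Swin's linear patch projection and a strided convolution in place of its concatenate-then-linear patch merging. The module names, layer sizes, and normalization choices are illustrative assumptions, not the authors' published design.

```python
# Sketch of convolutional patch embedding / patch merging in the spirit of the
# abstract. All layer choices below are assumptions for illustration only.
import torch
import torch.nn as nn


class ConvPatchEmbed(nn.Module):
    """Hypothetical convolutional patch embedding: two strided 3x3 convolutions
    that downsample a 32x32 CIFAR image by 4x, replacing a single stride-4
    linear (patchify) projection."""

    def __init__(self, in_chans: int = 3, embed_dim: int = 96):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_chans, embed_dim // 2, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim // 2),
            nn.GELU(),
            nn.Conv2d(embed_dim // 2, embed_dim, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)                      # (B, C, H/4, W/4)
        return x.flatten(2).transpose(1, 2)   # (B, H/4 * W/4, C) token sequence


class ConvPatchMerging(nn.Module):
    """Hypothetical convolutional patch merging: a stride-2 convolution that
    halves the spatial resolution and doubles the channel dimension, instead
    of concatenating 2x2 neighborhoods and applying a linear reduction."""

    def __init__(self, dim: int):
        super().__init__()
        self.reduction = nn.Conv2d(dim, 2 * dim, kernel_size=3, stride=2, padding=1)
        self.norm = nn.BatchNorm2d(2 * dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        b, n, c = x.shape
        x = x.transpose(1, 2).reshape(b, c, h, w)  # tokens back to a feature map
        x = self.norm(self.reduction(x))           # (B, 2C, H/2, W/2)
        return x.flatten(2).transpose(1, 2)        # (B, H/2 * W/2, 2C)


if __name__ == "__main__":
    imgs = torch.randn(2, 3, 32, 32)               # CIFAR-sized input
    tokens = ConvPatchEmbed()(imgs)                # (2, 64, 96)
    merged = ConvPatchMerging(96)(tokens, 8, 8)    # (2, 16, 192)
    print(tokens.shape, merged.shape)
```

Such convolutional stems inject a locality bias before the window-based attention stages, which is one plausible reason a hierarchical Transformer trained from scratch on a small dataset like CIFAR would benefit from them.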
Pages: 2589–2598
Page count: 9
Related papers
50 in total
  • [2] A Swin-transformer-based model for efficient compression of turbulent flow data
    Zhang, Meng
    Yousif, Mustafa Z.
    Yu, Linqi
    Lim, Hee-Chang
    [J]. PHYSICS OF FLUIDS, 2023, 35 (08)
  • [3] SWAT: An Efficient Swin Transformer Accelerator Based on FPGA
    Dong, Qiwei
    Xie, Xiaoru
    Wang, Zhongfeng
    [J]. 29TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, ASP-DAC 2024, 2024, : 515 - 520
  • [4] An efficient swin transformer-based method for underwater image enhancement
    Wang, Rong
    Zhang, Yonghui
    Zhang, Jian
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (12) : 18691 - 18708
  • [6] SparseSwin: Swin transformer with sparse transformer block
    Pinasthika, Krisna
    Laksono, Blessius Sheldo Putra
    Irsal, Riyandi Banovbi Putera
    Shabiyya, Syifa Hukma
    Yudistira, Novanto
    [J]. NEUROCOMPUTING, 2024, 580
  • [7] Video Swin Transformer
    Liu, Ze
    Ning, Jia
    Cao, Yue
    Wei, Yixuan
    Zhang, Zheng
    Lin, Stephen
    Hu, Han
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 3192 - 3201
  • [8] Random Swin Transformer
    Choi, Keong-Hun
    Ha, Jong-Eun
    [J]. 2022 22ND INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2022), 2022, : 1611 - 1614
  • [9] Swin transformer-based supervised hashing
    Liangkang Peng
    Jiangbo Qian
    Chong Wang
    Baisong Liu
    Yihong Dong
    [J]. Applied Intelligence, 2023, 53 : 17548 - 17560
  • [10] Speech Semantic Communication Based on Swin Transformer
    Zhou, Ziliang
    Zheng, Shilian
    Chen, Jie
    Zhao, Zhijin
    Yang, Xiaoniu
    [J]. IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, 2024, 10 (03) : 756 - 768