Adaptive Hybrid Vision Transformer for Small Datasets

Cited: 0
Authors
Yin, Mingjun [1 ]
Chang, Zhiyong [2 ]
Wang, Yan [3 ]
Affiliations
[1] Univ Melbourne, Melbourne, Vic, Australia
[2] Peking Univ, Beijing, Peoples R China
[3] Xiaochuan Chuhai, Beijing, Peoples R China
Keywords
Vision Transformer; Small Dataset; Self-Attention;
DOI
10.1109/ICTAI59109.2023.00132
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Recently, vision Transformers (ViTs) have achieved competitive performance on many computer vision tasks. However, when trained from scratch on small datasets, vision Transformers underperform Convolutional Neural Networks (CNNs), which is commonly attributed to their lack of a locality inductive bias. This impedes the application of vision Transformers to small-size datasets. In this work, we propose the Adaptive Hybrid Vision Transformer (AHVT) to boost the performance of vision Transformers on small-scale datasets. Specifically, along the spatial dimension, we exploit a Convolutional Overlapping Patch Embedding (COPE) layer to inject the desired inductive bias into the model, forcing it to learn local token features. Along the channel dimension, we insert an adaptive channel feature aggregation block into the vanilla feed-forward network to calibrate channel responses. Meanwhile, we add several extra learnable "cardinality tokens" to the patch token sequence to capture cross-channel interactions. We present extensive experiments validating the effectiveness of our method on five small/medium datasets: CIFAR-10, CIFAR-100, SVHN, Tiny-ImageNet, and ImageNet-1k. Our approach attains state-of-the-art performance on the four small datasets when training from scratch.
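The abstract does not give COPE's exact kernel sizes or strides, but its core idea is standard: extract patches with a stride smaller than the patch size, so adjacent tokens share pixels and the embedding carries a locality bias. The sketch below illustrates this with assumed values (8x8 patches, stride 4) and a fixed random projection standing in for the learned convolution filters; it is not the paper's implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def cope_tokens(img, patch=8, stride=4, dim=64, seed=0):
    """Sketch of a Convolutional Overlapping Patch Embedding.

    img: (C, H, W) array. Because stride < patch, neighbouring
    tokens overlap, which is the locality bias being injected.
    Hyperparameters here are illustrative assumptions.
    """
    C, H, W = img.shape
    # All possible patch windows: shape (C, H-patch+1, W-patch+1, patch, patch)
    win = sliding_window_view(img, (patch, patch), axis=(1, 2))
    # Keep every stride-th window, emulating a strided convolution
    win = win[:, ::stride, ::stride]
    hp, wp = win.shape[1], win.shape[2]
    patches = win.transpose(1, 2, 0, 3, 4).reshape(hp * wp, C * patch * patch)
    # Random projection stands in for the learned conv kernel
    proj = np.random.default_rng(seed).standard_normal(
        (C * patch * patch, dim)) * 0.02
    return patches @ proj  # token sequence: (num_tokens, dim)

tokens = cope_tokens(np.zeros((3, 32, 32)), patch=8, stride=4, dim=64)
# A 32x32 image with 8x8 patches at stride 4 yields a 7x7 grid,
# i.e. 49 overlapping tokens, versus 16 non-overlapping 8x8 patches.
```

Note that overlapping embedding increases the token count for the same image size, so the attention cost grows accordingly; in practice this trade-off is accepted for the extra local context each token receives.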
Pages: 873-880
Page count: 8
Related papers
50 records total
  • [41] A novel hybrid attention gate based on vision transformer for the detection of surface defects
    Uzen, Hueseyin
    Turkoglu, Muammer
    Ozturk, Dursun
    Hanbay, Davut
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (10) : 6835 - 6851
  • [42] Enabling Efficient Hardware Acceleration of Hybrid Vision Transformer (ViT) Networks at the Edge
    Dumoulin, Joren
    Houshmand, Pouya
    Jain, Vikram
    Verhelst, Marian
    2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024,
  • [43] A hybrid CNN-vision transformer structure for remote sensing scene classification
    Li, Nan
    Hao, Siyuan
    Zhao, Kun
    REMOTE SENSING LETTERS, 2024, 15 (01) : 88 - 98
  • [44] Detection of coronary heart disease based on heart sound and hybrid Vision Transformer
    Zhao, Wenhao
    Ma, Hongwen
    Jin, Ni
    Zheng, Yineng
    Guo, Xingming
APPLIED ACOUSTICS, 2025, 230
  • [45] GRAPH ENCODING BASED HYBRID VISION TRANSFORMER FOR AUTOMATIC ROAD NETWORK EXTRACTION
    Yuan, Wei
    Ran, Weihang
    Shi, Xiaodan
    Fan, Zipei
    Cai, Yang
    Shibasaki, Ryosuke
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 3656 - 3658
  • [46] Enhancing Drowning Surveillance with a Hybrid Vision Transformer Model: A Deep Learning Approach
    Zhang, Yingying
    Li, Yancheng
    Qu, Qiang
    Lin, Huai
    Seng, Dewen
    TRAITEMENT DU SIGNAL, 2023, 40 (06) : 2861 - 2867
  • [47] A survey of maritime vision datasets
    Su, Li
    Chen, Yusheng
    Song, Hao
    Li, Wanyi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (19) : 28873 - 28893
  • [49] Vision Transformer for Pansharpening
    Meng, Xiangchao
    Wang, Nan
    Shao, Feng
    Li, Shutao
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [50] Dual Vision Transformer
    Yao, Ting
    Li, Yehao
    Pan, Yingwei
    Wang, Yu
    Zhang, Xiao-Ping
    Mei, Tao
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (09) : 10870 - 10882