Adaptive Hybrid Vision Transformer for Small Datasets

Cited by: 0
Authors
Yin, Mingjun [1 ]
Chang, Zhiyong [2 ]
Wang, Yan [3 ]
Affiliations
[1] Univ Melbourne, Melbourne, Vic, Australia
[2] Peking Univ, Beijing, Peoples R China
[3] Xiaochuan Chuhai, Beijing, Peoples R China
Keywords
Vision Transformer; Small Dataset; Self-Attention
DOI
10.1109/ICTAI59109.2023.00132
CLC Classification Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Recently, vision Transformers (ViTs) have achieved competitive performance on many computer vision tasks. However, when trained from scratch, vision Transformers underperform Convolutional Neural Networks (CNNs) on small datasets, which is commonly attributed to their lack of a locality inductive bias. This impedes the application of vision Transformers to small-scale datasets. In this work, we propose the Adaptive Hybrid Vision Transformer (AHVT) to boost the performance of vision Transformers on small-scale datasets. Specifically, along the spatial dimension, we exploit a Convolutional Overlapping Patch Embedding (COPE) layer to inject the desired inductive bias into the model, forcing it to learn local token features. Along the channel dimension, we insert an adaptive channel feature aggregation block into the vanilla feed-forward network to calibrate channel responses. Meanwhile, we add several extra learnable "cardinality tokens" to the patch token sequence to capture cross-channel interactions. We present extensive experiments validating the effectiveness of our method on five small/medium datasets: CIFAR-10/100, SVHN, Tiny-ImageNet, and ImageNet-1k. Our approach attains state-of-the-art performance on the first four (small) datasets when trained from scratch.
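For intuition, the sketch below gives one plausible PyTorch reading of the two components the abstract names: a Convolutional Overlapping Patch Embedding (a strided convolution whose kernel is larger than its stride, so neighboring patches overlap) and a channel-calibration gate inside the feed-forward network. The class names, layer sizes, and the squeeze-and-excitation-style gate are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn

class ConvOverlappingPatchEmbed(nn.Module):
    # Hypothetical COPE layer: because kernel_size > stride, adjacent
    # patches share pixels, injecting the locality inductive bias that
    # vanilla non-overlapping ViT patchification lacks.
    def __init__(self, in_chans=3, embed_dim=192, kernel_size=7, stride=4, padding=3):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size, stride, padding)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):                    # x: (B, C, H, W)
        x = self.proj(x)                     # (B, embed_dim, H', W')
        x = x.flatten(2).transpose(1, 2)     # (B, H'*W', embed_dim) tokens
        return self.norm(x)

class GatedFeedForward(nn.Module):
    # Vanilla ViT MLP plus a hypothetical per-channel gate that
    # recalibrates channel responses from a pooled token summary.
    def __init__(self, dim=192, hidden=768, reduction=4):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)
        self.act = nn.GELU()
        self.gate = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.GELU(),
            nn.Linear(dim // reduction, dim), nn.Sigmoid())

    def forward(self, x):                    # x: (B, N, dim)
        y = self.fc2(self.act(self.fc1(x)))
        w = self.gate(x.mean(dim=1))         # (B, dim) channel weights
        return y * w.unsqueeze(1)            # calibrated channel responses

tokens = ConvOverlappingPatchEmbed()(torch.randn(2, 3, 32, 32))
print(tokens.shape)                          # torch.Size([2, 64, 192])
print(GatedFeedForward()(tokens).shape)      # torch.Size([2, 64, 192])

The learnable "cardinality tokens" the abstract mentions would, on this reading, be extra rows appended to the token sequence before attention; they are omitted from the sketch.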
Pages: 873-880
Number of pages: 8
Related Papers
50 records in total
  • [1] MDViT: Multi-domain Vision Transformer for Small Medical Image Segmentation Datasets
    Du, Siyi
    Bayasi, Nourhan
    Hamarneh, Ghassan
    Garbi, Rafeef
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT IV, 2023, 14223: 448-458
  • [2] ADAPTIVE ELECTRONIC HYBRID TRANSFORMER
    WHITE, SA
    IEEE TRANSACTIONS ON COMMUNICATIONS, 1972, CO20 (06): 1184-&
  • [3] PATrans: Pixel-Adaptive Transformer for edge segmentation of cervical nuclei on small-scale datasets
    Hu, Hexuan
    Zhang, Jianyu
    Yang, Tianjin
    Hu, Qiang
    Yu, Yufeng
    Huang, Qian
    COMPUTERS IN BIOLOGY AND MEDICINE, 2024, 168
  • [4] Pupil Detection Using Hybrid Vision Transformer
    Wang, Li
    Wang, Changyuan
    Zhang, Yu
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2022, 36 (12)
  • [5] CardSegNet: An adaptive hybrid CNN-vision transformer model for heart region segmentation in cardiac MRI
    Aghapanah, Hamed
    Rasti, Reza
    Kermani, Saeed
    Tabesh, Faezeh
    Banaem, Hossein Yousefi
    Aliakbar, Hamidreza Pour
    Sanei, Hamid
    Segars, William Paul
    COMPUTERIZED MEDICAL IMAGING AND GRAPHICS, 2024, 115
  • [6] A-ViT: Adaptive Tokens for Efficient Vision Transformer
    Yin, Hongxu
    Vahdat, Arash
    Alvarez, Jose M.
    Mallya, Arun
    Kautz, Jan
    Molchanov, Pavlo
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 10799-10808
  • [7] Vision Transformer Based Adaptive Beamforming for GNSS Bands
    Aras, Irem
    Erer, Isin
    Akdemir, Eren
    2024 32nd Telecommunications Forum, TELFOR 2024 - Proceedings of Papers, 2024
  • [8] Automatic Cardiac Pathology Recognition in Echocardiography Images using Higher Order Dynamic Mode Decomposition and a Vision Transformer for Small Datasets
    Bell-Navas, Andrés
    Groun, Nourelhouda
    Villalba-Orero, María
    Lara-Pezzi, Enrique
    Garicano-Mena, Jesús
    Le Clainche, Soledad
    Expert Systems with Applications, 2025, 264
  • [9] GvT: A Graph-based Vision Transformer with Talking-Heads Utilizing Sparsity, Trained from Scratch on Small Datasets
    Shan, Dongjing
    Chen, Guiqiang
    arXiv preprint
  • [10] Polyp Segmentation Using a Hybrid Vision Transformer and a Hybrid Loss Function
    Goceri, Evgin
    JOURNAL OF IMAGING INFORMATICS IN MEDICINE, 2024, 37 (02): 851-863