ReViT: Enhancing vision transformers feature diversity with attention residual connections

Cited by: 3
Authors
Diko, Anxhelo [1 ]
Avola, Danilo [1 ]
Cascio, Marco [1 ,2 ]
Cinque, Luigi [1 ]
Affiliations
[1] Sapienza Univ Rome, Dept Comp Sci, Via Salaria 113, I-00198 Rome, Italy
[2] Univ Rome UnitelmaSapienza, Dept Law & Econ, Piazza Sassari 4, I-00161 Rome, Italy
Keywords
Vision transformer; Feature collapse; Self-attention mechanism; Residual attention learning; Visual recognition;
DOI
10.1016/j.patcog.2024.110853
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
The self-attention mechanism of the Vision Transformer (ViT) suffers from feature collapse in deeper layers, causing low-level visual features to vanish. However, such features can help to accurately represent and identify elements within an image and increase the accuracy and robustness of vision-based recognition systems. Following this rationale, we propose a novel residual attention learning method for improving ViT-based architectures, increasing their visual feature diversity and model robustness. In this way, the proposed network can capture and preserve significant low-level features, providing more details about the elements within the scene being analyzed. The effectiveness and robustness of the presented method are evaluated on five image classification benchmarks, including ImageNet1k, CIFAR10, CIFAR100, Oxford Flowers-102, and Oxford-IIIT Pet, achieving improved performance. Additionally, experiments on the COCO2017 dataset show that the devised approach discovers and incorporates semantic and spatial relationships for object detection and instance segmentation when implemented into spatial-aware transformer models.
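The core idea of the abstract, carrying attention information from earlier layers into deeper ones via a residual connection on the attention scores, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name `residual_attention_layer`, the blending coefficient `alpha`, and the linear mixing rule are all assumptions for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def residual_attention_layer(x, Wq, Wk, Wv, prev_scores=None, alpha=0.5):
    """One self-attention layer with a residual connection on the attention
    scores, so attention patterns from earlier layers (and the low-level
    features they select) can persist into deeper layers.
    `alpha` and the linear blend are hypothetical choices for illustration."""
    d = Wq.shape[1]
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)              # (tokens, tokens)
    if prev_scores is not None:
        # Residual attention: blend current scores with the previous layer's.
        scores = (1 - alpha) * scores + alpha * prev_scores
    attn = softmax(scores)                     # rows sum to 1
    return attn @ v, scores                    # pass scores to the next layer

# Usage: chain two layers, feeding layer 1's scores into layer 2.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                    # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out1, s1 = residual_attention_layer(x, Wq, Wk, Wv)
out2, s2 = residual_attention_layer(out1, Wq, Wk, Wv, prev_scores=s1)
```

Without the `prev_scores` term this reduces to standard scaled dot-product attention; the residual term is what keeps earlier attention patterns from being washed out in deeper layers.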
Pages: 13
Related Papers (50 in total)
  • [31] ENHANCING THE ADVERSARIAL TRANSFERABILITY OF VISION TRANSFORMERS THROUGH PERTURBATION INVARIANCE
    Zeng Boheng
    2022 19TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICCWAMTIP), 2022,
  • [32] CBPT: A NEW BACKBONE FOR ENHANCING INFORMATION TRANSMISSION OF VISION TRANSFORMERS
    Yu, Wenxin
    Zhang, Hongru
    Lan, Tianxiang
    Hu, Yucheng
    Yin, Dong
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 156 - 160
  • [33] Enhancing Skin Cancer Detection with Transfer Learning and Vision Transformers
    Ahmad, Istiak
    Alsulami, Bassma Saleh
    Alqurashi, Fahad
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (10) : 1027 - 1034
  • [34] Efficient feature selection for pre-trained vision transformers
    Huang, Lan
    Zeng, Jia
    Yu, Mengqiang
    Ding, Weiping
    Bai, Xingyu
    Wang, Kangping
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 254
  • [35] Enhancing the expressivity of quantum neural networks with residual connections
    Wen, Jingwei
    Huang, Zhiguo
    Cai, Dunbo
    Qian, Ling
    COMMUNICATIONS PHYSICS, 2024, 7 (01):
  • [36] ResMatch: Residual Attention Learning for Feature Matching
    Deng, Yuxin
    Zhang, Kaining
    Zhang, Shihua
    Li, Yansheng
    Ma, Jiayi
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024, : 1501 - 1509
  • [37] Enhancing Computer Vision Performance: A Hybrid Deep Learning Approach with CNNs and Vision Transformers
    Sardar, Abha Singh
    Ranjan, Vivek
    COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT II, 2024, 2010 : 591 - 602
  • [38] Exemplar-Free Continual Learning of Vision Transformers via Gated Class-Attention and Cascaded Feature Drift Compensation
    Cotogni, Marco
    Yang, Fei
    Cusano, Claudio
    Bagdanov, Andrew D.
    van de Weijer, Joost
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025,
  • [39] Res-ViT: Residual Vision Transformers for Image Recognition Tasks
    Elmi, Sayda
    Morris, Bell
    2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2023, : 309 - 316
  • [40] How Does Attention Work in Vision Transformers? A Visual Analytics Attempt
    Li, Yiran
    Wang, Junpeng
    Dai, Xin
    Wang, Liang
    Yeh, Chin-Chia Michael
    Zheng, Yan
    Zhang, Wei
    Ma, Kwan-Liu
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2023, 29 (06) : 2888 - 2900