Dynamic multi-headed self-attention and multiscale enhancement vision transformer for object detection

Times Cited: 0
|
Authors
Fang, Sikai [1 ]
Lu, Xiaofeng [1 ,2 ]
Huang, Yifan [1 ]
Sun, Guangling [1 ]
Liu, Xuefeng [1 ]
Affiliations
[1] Shanghai Univ, Sch Commun & Informat Engn, 99 Shangda Rd, Shanghai 200444, Peoples R China
[2] Shanghai Univ, Wenzhou Inst, Wenzhou, Peoples R China
Keywords
Dynamic gate; Multiscale; Object detection; Self-attention; Vision transformer;
DOI
10.1007/s11042-024-18234-8
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
The self-attention-based vision transformer has powerful feature extraction capabilities and has demonstrated competitive performance in several tasks. However, the conventional self-attention mechanism exhibits global perceptual properties that favor large-scale objects, so room for improvement remains in detection performance at other scales. To address this issue, the dynamic gate-assisted network (DGANet), a novel yet simple framework, is proposed to enhance the multiscale generalization capability of the vision transformer structure. First, we design the dynamic multi-headed self-attention mechanism (DMH-SAM), which dynamically selects the self-attention components and uses a local-to-global self-attention pattern that enables the model to learn features of objects at different scales autonomously while reducing computational cost. Then, we propose a dynamic multiscale encoder (DMEncoder), which weights and encodes feature maps with different receptive fields to self-adapt the network's performance gap across object scales. Extensive ablation and comparison experiments prove the effectiveness of the proposed method. Its detection accuracy for small, medium and large targets reaches 27.6, 47.4 and 58.5 respectively, surpassing state-of-the-art detection methods, while reducing model complexity by 23%.
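The abstract does not specify the exact form of DMH-SAM, but the core idea it describes, dynamically gating self-attention components per head, can be sketched minimally. The following numpy toy (all weight names such as `Wg` are hypothetical, not from the paper) shows one plausible reading: a gate derived from pooled token features scales each attention head's output, so the model can emphasize or suppress heads for objects of different scales.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_multi_head_attention(x, num_heads=4, seed=0):
    """Toy sketch of gate-weighted multi-head self-attention.

    x: (seq_len, dim) token features; dim must be divisible by num_heads.
    A per-head sigmoid gate computed from mean-pooled features scales
    each head's output, emulating dynamic head selection. Weights are
    random placeholders for illustration only.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    dh = d // num_heads
    Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
    Wg = rng.standard_normal((d, num_heads)) * 0.1  # hypothetical gate weights

    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # per-head gates in (0, 1) from mean-pooled token features
    gates = 1.0 / (1.0 + np.exp(-(x.mean(axis=0) @ Wg)))  # (num_heads,)

    out = np.zeros_like(x)
    for h in range(num_heads):
        s = slice(h * dh, (h + 1) * dh)
        attn = softmax(q[:, s] @ k[:, s].T / np.sqrt(dh))
        out[:, s] = gates[h] * (attn @ v[:, s])
    return out, gates
```

In a trained model the gate would be learned end-to-end (and, per the abstract, combined with a local-to-global attention pattern); here it simply illustrates how head outputs can be reweighted without changing the attention computation itself.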
Pages: 67213-67229
Number of Pages: 17
Related Papers
50 records
  • [1] Scene Text Detection Based on Multi-Headed Self-Attention Using Shifted Windows
    Huang, Baohua
    Feng, Xiaoru
    APPLIED SCIENCES-BASEL, 2023, 13 (06):
  • [2] Multi-modal knowledge graphs representation learning via multi-headed self-attention
    Wang, Enqiang
    Yu, Qing
    Chen, Yelin
    Slamu, Wushouer
    Luo, Xukang
    INFORMATION FUSION, 2022, 88 : 78 - 85
  • [3] Semantic Segmentation Algorithm Based Multi-headed Self-attention for Tea Picking Points
    Song Y.
    Yang S.
    Zheng Z.
    Ning J.
    Nongye Jixie Xuebao/Transactions of the Chinese Society for Agricultural Machinery, 2023, 54 (09): : 297 - 305
  • [4] Lite Vision Transformer with Enhanced Self-Attention
    Yang, Chenglin
    Wang, Yilin
    Zhang, Jianming
    Zhang, He
    Wei, Zijun
    Lin, Zhe
    Yuille, Alan
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 11988 - 11998
  • [5] An Intelligent Athlete Signal Processing Methodology for Balance Control Ability Assessment with Multi-Headed Self-Attention Mechanism
    Xu, Nannan
    Cui, Xinze
    Wang, Xin
    Zhang, Wei
    Zhao, Tianyu
    MATHEMATICS, 2022, 10 (15)
  • [6] Prediction of Large-Scale Regional Evapotranspiration Based on Multi-Scale Feature Extraction and Multi-Headed Self-Attention
    Zheng, Xin
    Zhang, Sha
    Zhang, Jiahua
    Yang, Shanshan
    Huang, Jiaojiao
    Meng, Xianye
    Bai, Yun
    REMOTE SENSING, 2024, 16 (07)
  • [7] Vision Transformer Based on Reconfigurable Gaussian Self-attention
    Zhao L.
    Zhou J.-K.
    Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49 (09): : 1976 - 1988
  • [8] Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention
    Pan, Xuran
    Ye, Tianzhu
    Xia, Zhuofan
    Song, Shiji
    Huang, Gao
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 2082 - 2091
  • [9] SpotNet: Self-Attention Multi-Task Network for Object Detection
    Perreault, Hughes
    Bilodeau, Guillaume-Alexandre
    Saunier, Nicolas
    Heritier, Maguelonne
    2020 17TH CONFERENCE ON COMPUTER AND ROBOT VISION (CRV 2020), 2020, : 230 - 237
  • [10] Rethinking Self-Attention for Multispectral Object Detection
    Hu, Sijie
    Bonardi, Fabien
    Bouchafa, Samia
    Prendinger, Helmut
    Sidibe, Desire
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, : 1 - 12