Dynamic multi-headed self-attention and multiscale enhancement vision transformer for object detection

Cited: 0
Authors
Fang, Sikai [1 ]
Lu, Xiaofeng [1 ,2 ]
Huang, Yifan [1 ]
Sun, Guangling [1 ]
Liu, Xuefeng [1 ]
Affiliations
[1] Shanghai Univ, Sch Commun & Informat Engn, 99 Shangda Rd, Shanghai 200444, Peoples R China
[2] Shanghai Univ, Wenzhou Inst, Wenzhou, Peoples R China
Keywords
Dynamic gate; Multiscale; Object detection; Self-attention; Vision transformer;
DOI
10.1007/s11042-024-18234-8
CLC Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
The self-attention-based vision transformer has powerful feature extraction capabilities and has demonstrated competitive performance in several tasks. However, the conventional self-attention mechanism exhibits global perceptual properties and favors large-scale objects, so its detection performance at other scales still leaves room for improvement. To address this issue, we propose the dynamic gate-assisted network (DGANet), a novel yet simple framework that enhances the multiscale generalization capability of the vision transformer. First, we design a dynamic multi-headed self-attention mechanism (DMH-SAM), which dynamically selects self-attention components and applies a local-to-global self-attention pattern, enabling the model to learn features of objects at different scales autonomously while reducing computation. Then, we propose a dynamic multiscale encoder (DMEncoder), which weights and encodes feature maps with different receptive fields to self-adapt to the network's performance gap across object scales. Extensive ablation and comparison experiments demonstrate the effectiveness of the proposed method: it achieves detection accuracies of 27.6, 47.4, and 58.5 for small, medium, and large objects, respectively, outperforming state-of-the-art detection methods while reducing model complexity by 23%.
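To make the dynamic-selection idea behind DMH-SAM concrete, here is a minimal PyTorch sketch in which each attention head is scaled by an input-dependent gate, so uninformative heads can be suppressed per input. The class name DynamicGatedMHSA, the mean-pooled gate input, and the sigmoid gate are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of dynamically gated multi-head self-attention.
# ASSUMPTION: the gate is a sigmoid MLP over the mean token feature;
# the paper's actual gating mechanism may differ.
import torch
import torch.nn as nn

class DynamicGatedMHSA(nn.Module):
    def __init__(self, dim, num_heads=8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Gate predicts one weight in [0, 1] per head from global context.
        self.gate_net = nn.Sequential(nn.Linear(dim, num_heads), nn.Sigmoid())

    def forward(self, x):                         # x: (B, N, C)
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)      # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        out = attn.softmax(dim=-1) @ v            # (B, heads, N, head_dim)
        gate = self.gate_net(x.mean(dim=1))       # (B, heads)
        out = out * gate[:, :, None, None]        # keep or suppress each head
        return self.proj(out.transpose(1, 2).reshape(B, N, C))
```

For example, DynamicGatedMHSA(dim=256)(torch.randn(2, 196, 256)) returns a (2, 196, 256) tensor, with per-head contributions modulated by the gate.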
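Likewise, a rough sketch of the multiscale weighting idea behind DMEncoder: parallel branches with different receptive fields are fused by weights predicted from global context, so the encoder can emphasize the scale that suits the input. The depthwise-convolution branches and softmax weighting are assumed details for illustration, not the published architecture.

```python
# Illustrative dynamic multiscale encoder: input-dependent fusion of
# branches with growing receptive fields.
# ASSUMPTION: branches are depthwise convs of kernel sizes 3/5/7;
# the paper's DMEncoder may use a different branch design.
import torch
import torch.nn as nn

class DynamicMultiscaleEncoder(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )
        # One scalar weight per branch, predicted from pooled features.
        self.weight_net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, len(kernel_sizes)),
            nn.Softmax(dim=-1),
        )

    def forward(self, x):                                  # x: (B, C, H, W)
        w = self.weight_net(x)                             # (B, num_branches)
        feats = torch.stack([b(x) for b in self.branches], dim=1)  # (B, S, C, H, W)
        return (w[:, :, None, None, None] * feats).sum(dim=1)
```

The softmax keeps the branch weights normalized, so the fused map stays on the same scale as each branch output while the mix adapts per input.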
Pages: 67213 - 67229
Page count: 17
Related Papers
50 items in total
  • [21] Light-Weight Vision Transformer with Parallel Local and Global Self-Attention
    Ebert, Nikolas
    Reichardt, Laurenz
    Stricker, Didier
    Wasenmueller, Oliver
    2023 IEEE 26TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS, ITSC, 2023: 452 - 459
  • [22] Enhancing Skin Lesion Classification: A Self-Attention Fusion Approach with Vision Transformer
    Heroza, Rahmat Izwan
    Gan, John Q.
    Raza, Haider
    MEDICAL IMAGE UNDERSTANDING AND ANALYSIS, PT II, MIUA 2024, 2024, 14860 : 309 - 322
  • [23] Worker behavior recognition based on temporal and spatial self-attention of vision Transformer
    Lu Y.-X.
    Xu G.-H.
    Tang B.
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2023, 57 (03): 446 - 454
  • [24] PLG-ViT: Vision Transformer with Parallel Local and Global Self-Attention
    Ebert, Nikolas
    Stricker, Didier
    Wasenmueller, Oliver
    SENSORS, 2023, 23 (07)
  • [25] RsMmFormer: Multimodal Transformer Using Multiscale Self-attention for Remote Sensing Image Classification
    Zhang, Bo
    Ming, Zuheng
    Liu, Yaqian
    Feng, Wei
    He, Liang
    Zhao, Kaixing
    ARTIFICIAL INTELLIGENCE, CICAI 2023, PT I, 2024, 14473 : 329 - 339
  • [26] MSAU-NET: ROAD EXTRACTION BASED ON MULTI-HEADED SELF-ATTENTION MECHANISM AND U-NET WITH HIGH RESOLUTION REMOTE SENSING IMAGES
    Yu, Hang
    Guo, Yuru
    Liu, Zhiheng
    Zhou, Suiping
    Li, Chenyang
    Zhang, Wenjie
    Qi, Wenjuan
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023: 6898 - 6900
  • [27] Robust Visual Tracking Using Hierarchical Vision Transformer with Shifted Windows Multi-Head Self-Attention
    Gao, Peng
    Zhang, Xin-Yue
    Yang, Xiao-Li
    Ni, Jian-Cheng
    Wang, Fei
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D (01) : 161 - 164
  • [28] MULTI-VIEW SELF-ATTENTION BASED TRANSFORMER FOR SPEAKER RECOGNITION
    Wang, Rui
    Ao, Junyi
    Zhou, Long
    Liu, Shujie
    Wei, Zhihua
    Ko, Tom
    Li, Qing
    Zhang, Yu
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 6732 - 6736
  • [29] A new bearing fault diagnosis method based on improved weighted multi-scale morphological filter and multi-headed self-attention capsule restricted boltzmann network
    Liu, Yiyang
    Li, Changxian
    Cui, Yunxian
    Song, Xudong
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 45 (06) : 9915 - 9928
  • [30] A Multiscale Self-Attention Deep Clustering for Change Detection in SAR Images
    Dong, Huihui
    Ma, Wenping
    Jiao, Licheng
    Liu, Fang
    Li, LingLing
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60