YOLO-SA: An Efficient Object Detection Model Based on Self-attention Mechanism

Cited by: 0
Authors
Li, Ang [1]
Song, Xiangyu [2]
Sun, ShiJie [1]
Zhang, Zhaoyang [1]
Cai, Taotao [3]
Song, Huansheng [1]
Affiliations
[1] Chang'an University, Xi'an, People's Republic of China
[2] Swinburne University of Technology, Melbourne, VIC, Australia
[3] Macquarie University, Sydney, NSW, Australia
Source
Keywords
Object detection; CNN architecture; Attention mechanism; Decoupled detection head;
DOI
10.1007/978-981-97-2421-5_1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
CNN-based object detectors are widely used for object detection, object classification, and related tasks. Conventional CNN modules often adopt a complex multi-branch design, which reduces inference speed and memory utilization. In addition, many works add an attention mechanism to the detector to extract rich spatial features, but these mechanisms usually serve as auxiliary modules alongside convolution and do not fundamentally overcome the limitations of the convolution operation. Finally, traditional object detectors often use coupled detection heads, which can compromise model performance. To address these problems, we propose YOLO-SA, a new object detection model based on the popular YOLOv5 detector. First, we replace the original DarkNet53 structure of YOLOv5 with the reparameterized module RepVGG, which greatly reduces model complexity and improves detection accuracy. Second, we introduce a self-attention module in the feature fusion part of the model; it is independent of the other convolutional layers and outperforms other mainstream attention modules. Third, we replace the coupled detection head of YOLOv5 with an anchor-based decoupled detection head, which greatly improves convergence speed during training. Experiments show that YOLO-SA reaches a detection accuracy of 71.2% on COCO2014 and 75.8% on VOC2012, outperforming the YOLOv5s baseline and other mainstream object detection models.
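The structural reparameterization idea behind RepVGG-style modules can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single channel, no bias, and no batch normalization (real RepVGG blocks also fold batch-norm statistics into the kernel), and the helper names `conv2d_same` and `merge_branches` are hypothetical. The sketch shows that a training-time block with a 3x3 branch, a 1x1 branch, and an identity shortcut produces the same output as one merged 3x3 convolution at inference time.

```python
import random


def conv2d_same(x, k):
    """Single-channel 2D cross-correlation with zero padding (odd square kernel)."""
    h, w = len(x), len(x[0])
    r = len(k) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        s += x[ii][jj] * k[di + r][dj + r]
            out[i][j] = s
    return out


def merge_branches(k3, k1):
    """Fold a 1x1 kernel and an identity shortcut into the centre of a 3x3 kernel."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1[0][0] + 1.0  # 1x1 branch weight plus identity (weight 1)
    return merged


random.seed(0)
n = 5
x = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
k3 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
k1 = [[random.uniform(-1, 1)]]

# Training-time form: three parallel branches summed element-wise.
y3 = conv2d_same(x, k3)
y1 = conv2d_same(x, k1)
multi = [[y3[i][j] + y1[i][j] + x[i][j] for j in range(n)] for i in range(n)]

# Inference-time form: one 3x3 convolution with the merged kernel.
single = conv2d_same(x, merge_branches(k3, k1))

max_diff = max(abs(multi[i][j] - single[i][j]) for i in range(n) for j in range(n))
print(max_diff < 1e-9)
```

Because convolution is linear, the three branches collapse into one kernel, so the multi-branch cost is paid only during training.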
Pages: 1-15
Number of pages: 15