Adaptive feature fusion with attention mechanism for multi-scale target detection

Cited by: 28
Authors
Ju, Moran [1, 2, 3, 4, 5]
Luo, Jiangning [6]
Wang, Zhongbo [1, 2, 3, 4, 5]
Luo, Haibo [1, 2, 4, 5]
Affiliations
[1] Chinese Acad Sci, Shenyang Inst Automat, Shenyang 110016, Liaoning, Peoples R China
[2] Chinese Acad Sci, Inst Robot & Intelligent Mfg, Shenyang 110016, Liaoning, Peoples R China
[3] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[4] Chinese Acad Sci, Key Lab Opt Elect Informat Proc, Shenyang 110016, Liaoning, Peoples R China
[5] Key Lab Image Understanding & Comp Vis, Shenyang 110016, Liaoning, Peoples R China
[6] McGill Univ, Montreal, PQ H3A 0G4, Canada
Source
NEURAL COMPUTING & APPLICATIONS | 2021 / Vol. 33 / Issue 07
Keywords
Deep learning; Target detection; Adaptive feature fusion; Attention mechanism; RECOGNITION;
DOI
10.1007/s00521-020-05150-9
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
To detect targets of different sizes, target detectors such as YOLO V3 and DSSD use multi-scale outputs. To improve detection performance, YOLO V3 and DSSD perform feature fusion by combining two adjacent scales. However, feature fusion between adjacent scales alone is not sufficient, because it does not take advantage of the features at other scales. Moreover, concatenation, a common operation for feature fusion, provides no mechanism for learning the importance and correlation of features at different scales. In this paper, we propose adaptive feature fusion with attention mechanism (AFFAM) for multi-scale target detection. AFFAM uses a pathway layer and a subpixel convolution layer to resize the feature maps, which helps learn better and more complex feature mappings. In addition, AFFAM uses a global attention mechanism and a spatial position attention mechanism to adaptively learn, respectively, the correlation of the channel features and the importance of the spatial features at different scales. Finally, we combine AFFAM with YOLO V3 to build an efficient multi-scale target detector. Comparative experiments are conducted on the PASCAL VOC dataset, the KITTI dataset and the Smart UVM dataset. YOLO V3 with AFFAM achieves 84.34% mean average precision (mAP) at 19.9 FPS on the PASCAL VOC dataset, 87.2% mAP at 21 FPS on the KITTI dataset and 99.22% mAP at 20.6 FPS on the Smart UVM dataset, outperforming other state-of-the-art target detectors.
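The abstract mentions resizing feature maps so that all scales can be fused. As a rough illustration only, the sketch below shows the channel-to-space rearrangement that typically follows the convolution in a subpixel convolution layer, together with its inverse. It is not the authors' implementation: the function names `pixel_shuffle` and `space_to_depth`, the nested-list `[C][H][W]` layout, the channel ordering, and the guess that a pathway layer behaves like a space-to-depth operation are all assumptions made for this example, and the learned convolution itself is omitted.

```python
def pixel_shuffle(x, r):
    """Depth-to-space: rearrange (C*r*r, H, W) into (C, H*r, W*r).

    x is a nested list indexed as x[channel][row][col] (assumed layout).
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    out_c = channels // (r * r)
    out = [[[0] * (width * r) for _ in range(height * r)] for _ in range(out_c)]
    for c in range(out_c):
        for i in range(r):
            for j in range(r):
                # Each group of r*r input channels fills one r x r block
                # of output pixels per input location.
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


def space_to_depth(x, r):
    """Space-to-depth: rearrange (C, H*r, W*r) into (C*r*r, H, W).

    Exact inverse of pixel_shuffle above; used here as a hypothetical
    stand-in for a downsampling pathway layer.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if height % r != 0 or width % r != 0:
        raise ValueError("height and width must be divisible by r")
    out_h, out_w = height // r, width // r
    out = [[[0] * out_w for _ in range(out_h)] for _ in range(channels * r * r)]
    for c in range(channels):
        for i in range(r):
            for j in range(r):
                dst = out[c * r * r + i * r + j]
                for h in range(out_h):
                    for w in range(out_w):
                        dst[h][w] = x[c][h * r + i][w * r + j]
    return out


# Four 1x1 channels become one 2x2 channel, and back again.
coarse = [[[1]], [[2]], [[3]], [[4]]]
fine = pixel_shuffle(coarse, 2)
restored = space_to_depth(fine, 2)
```

Both operations only move values around, so no information is lost when a map is resized; any learning would happen in the convolutions placed around them.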
Pages: 2769 - 2781
Page count: 13