Anchor-Free Object Detection Method in Remote Sensing Image via Adaptive Multi-Scale Feature Fusion

Cited by: 0
Authors
Kun W. [1]
Wu W. [1]
Juhong T. [1,2]
Xi W. [1]
Ying F. [1]
Affiliations
[1] School of Computer Science, Chengdu University of Information Technology, Chengdu
[2] School of Computer Science, Sichuan University, Chengdu
Keywords
anchor-free; attention mechanism; dilated convolution; feature fusion; object detection; remote sensing image
DOI
10.3724/SP.J.1089.2023.19673
Abstract
Objects in remote sensing images belong to many categories, are densely distributed, and vary widely in scale, which makes small objects difficult to detect. This paper therefore proposes an anchor-free object detection method for remote sensing images based on adaptive multi-scale feature fusion (AMFF) and an attention feature enhancement (AFE) mechanism. First, the image features extracted by the backbone network are fed into AMFF. Its adaptive multi-scale feature fusion module strengthens feature reuse, which enriches the feature information and improves the network's ability to represent multi-scale features. Second, the features output by AMFF are fed into a detection head equipped with AFE. AFE combines multi-branch dilated convolution with an attention mechanism, improving the network's multi-scale generalization to objects and strengthening effective feature information. Finally, detection results are obtained through classification and regression. The method was compared with a variety of mainstream object detection algorithms on the public DIOR and NWPU VHR-10 datasets. It reaches average detection accuracies of 72.4% and 87.4%, respectively. These figures are 9.4 and 13.5 percentage points higher than those of the baseline network, and 6.3 and 1.7 percentage points higher than the second-best results. The proposed method therefore outperforms the mainstream detectors it was compared with and substantially improves on the baseline network. It detects small objects more accurately and improves detection accuracy for objects at multiple scales.
© 2023 Institute of Computing Technology. All rights reserved.
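The abstract names the components but does not specify their internals. The sketch below illustrates, in plain Python on single-channel maps, two generic ideas that AMFF and AFE are described as building on. The first is fusing resized feature maps with softmax-normalized weights. The second is running parallel dilated 3x3 convolutions and gating the result. This is a hypothetical illustration and not the authors' implementation. The function names (`adaptive_fuse`, `dilated_conv3x3`, `multi_branch_enhance`), the softmax weighting, the dilation rates (1, 2, 3) and the sigmoid self-gate standing in for the attention mechanism are all assumptions.

```python
# Hypothetical sketch: generic multi-scale fusion and multi-branch dilated
# convolution on single-channel maps. NOT the AMFF/AFE code from the paper.
import math
from typing import List, Sequence

Grid = List[List[float]]  # a single-channel 2D feature map (rows of floats)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: positive weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v: float) -> float:
    """Overflow-safe logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def upsample_nearest(grid: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling so a coarse map matches a finer map's size."""
    return [[v for v in row for _ in range(factor)] for row in grid for _ in range(factor)]


def adaptive_fuse(maps: Sequence[Grid], logits: Sequence[float]) -> Grid:
    """Weighted sum of same-sized maps; in a network the logits would be learned."""
    weights = softmax(logits)
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(wt * m[i][j] for wt, m in zip(weights, maps)) for j in range(w)]
            for i in range(h)]


def dilated_conv3x3(grid: Grid, kernel: Grid, dilation: int) -> Grid:
    """3x3 cross-correlation with zero padding; taps are `dilation` pixels apart."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for ki in range(3):
                for kj in range(3):
                    y, x = i + (ki - 1) * dilation, j + (kj - 1) * dilation
                    if 0 <= y < h and 0 <= x < w:  # zero padding outside the map
                        acc += kernel[ki][kj] * grid[y][x]
            out[i][j] = acc
    return out


def multi_branch_enhance(grid: Grid, kernel: Grid, dilations=(1, 2, 3)) -> Grid:
    """Average dilated branches, then apply a sigmoid self-gate (placeholder attention)."""
    branches = [dilated_conv3x3(grid, kernel, d) for d in dilations]
    h, w = len(grid), len(grid[0])
    fused = [[sum(b[i][j] for b in branches) / len(branches) for j in range(w)]
             for i in range(h)]
    # Each response is scaled by sigmoid of itself: weak responses are damped more.
    return [[v * sigmoid(v) for v in row] for row in fused]


# Usage: fuse a fine 2x2 map with an upsampled coarse 1x1 map using equal weights.
fine = [[1.0, 2.0], [3.0, 4.0]]
coarse = upsample_nearest([[10.0]], 2)
fused = adaptive_fuse([fine, coarse], logits=[0.0, 0.0])  # [[5.5, 6.0], [6.5, 7.0]]
```

A 3x3 kernel with dilation d covers a (2d+1)x(2d+1) window while keeping only nine weights. This is why parallel branches with different rates can respond to objects of different sizes without adding parameters per branch.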
Pages: 1405-1416 (12 pages)
References (35 in total)