Refined multi-scale feature-oriented object detection of the remote sensing images

Cited: 0
Authors
Zhang S. [1 ,2 ]
Li S. [3 ]
Wei G. [4 ]
Zhang X. [1 ]
Gao J. [5 ]
Affiliations
[1] School of Environment Science and Spatial Informatics, China University of Mining and Technology, Xuzhou
[2] Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou
[3] Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing
[4] Jinan Institute of Surveying and Mapping, Jinan
[5] Institute of Spacecraft Application System Engineering, Beijing
Source
National Remote Sensing Bulletin | 2022 / Vol. 26 / No. 12
Funding
National Natural Science Foundation of China
Keywords
deep learning; remote sensing; feature extraction; multi-scale feature pyramid; object detection; oriented bounding box
DOI
10.11834/jrs.20221801
Abstract
Object detection in remote sensing images describes the visual features of objects and expresses prior knowledge about the image, and the information obtained through such interpretation has a wide range of applications in both military and civilian fields. A refined multi-scale feature-oriented object detection method for remote sensing images is proposed to address the insufficient feature-extraction capability for objects in complex scenes, the large variation in object scales, the arbitrary and closely arranged orientations of objects, and the difficulty of accurately localizing oriented objects with the horizontal boxes used in traditional object detection. First, a contextual attention network based on dilated convolution is designed; it captures local and global semantic information with convolution kernels of different dilation rates and integrates this semantic information into the original features through an attention mechanism to enhance feature extraction. Second, a refined feature pyramid network is proposed, which reduces the loss of channel information in the feature pyramid through pixel shuffling and strengthens the network's ability to represent multi-scale object features with large variances. Finally, gliding vertices are used to regress oriented rectangular boxes that represent the locations of oriented objects in remote sensing images. The effectiveness of the algorithm is verified on the public object detection datasets DOTA and HRSC2016, with Fast R-CNN OBB as the baseline. Results show that the proposed algorithm improves the mean average precision (mAP) on the DOTA dataset by 22.65% over the baseline, reaching a final detection accuracy of 76.78% mAP; on the HRSC2016 dataset, the final detection accuracy reaches 89.95% mAP. In addition, the algorithm compares favorably with various state-of-the-art methods. Conclusion: First, the contextual attention network with dilated convolution strengthens object features and enhances the ability of the convolutional neural network to discriminate objects from backgrounds in remote sensing images. Second, the refined feature pyramid addresses the large scale variation of objects in remote sensing images. Finally, the direction factor of the gliding vertices is introduced to represent oriented objects, which alleviates the boundary problem that angle regression can introduce. © The Author(s), 2023.
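As an illustration only (the authors' code is not reproduced here), the PyTorch sketch below shows plausible minimal forms of the three components described in the abstract: a contextual attention block built from parallel dilated convolutions, a pixel-shuffle lateral connection for the refined feature pyramid, and a gliding-vertex decoding of an oriented box from a horizontal box plus four gliding ratios. All module names, dilation rates, and channel sizes are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DilatedContextAttention(nn.Module):
    """Capture local and global context with parallel dilated convolutions and
    fold it back into the input features through a sigmoid attention map."""

    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False)
            for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)
        self.attn = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        context = self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
        return x + torch.sigmoid(self.attn(context)) * context


class PixelShuffleLateral(nn.Module):
    """Upsample a coarser pyramid level with pixel shuffle (rearranging channels
    instead of discarding them) before merging it with the finer lateral level."""

    def __init__(self, top_channels, lateral_channels, out_channels=256):
        super().__init__()
        self.expand = nn.Conv2d(top_channels, out_channels * 4, 1)  # 4x channels for 2x shuffle
        self.shuffle = nn.PixelShuffle(2)
        self.lateral = nn.Conv2d(lateral_channels, out_channels, 1)

    def forward(self, top, lateral):
        up = self.shuffle(self.expand(top))                         # (N, out, 2H, 2W)
        up = F.interpolate(up, size=lateral.shape[-2:], mode="nearest")
        return up + self.lateral(lateral)


def decode_gliding_vertex(hbox, alphas):
    """Gliding-vertex style decoding: a horizontal box (x1, y1, x2, y2) plus four
    ratios in [0, 1] yields the four vertices of an oriented quadrilateral."""
    x1, y1, x2, y2 = hbox
    w, h = x2 - x1, y2 - y1
    a1, a2, a3, a4 = alphas
    return [
        (x1 + a1 * w, y1),   # top vertex glides right along the top edge
        (x2, y1 + a2 * h),   # right vertex glides down along the right edge
        (x2 - a3 * w, y2),   # bottom vertex glides left along the bottom edge
        (x1, y2 - a4 * h),   # left vertex glides up along the left edge
    ]


if __name__ == "__main__":
    fine = torch.randn(1, 256, 64, 64)
    coarse = torch.randn(1, 512, 32, 32)
    print(DilatedContextAttention(256)(fine).shape)            # torch.Size([1, 256, 64, 64])
    print(PixelShuffleLateral(512, 256)(coarse, fine).shape)   # torch.Size([1, 256, 64, 64])
    print(decode_gliding_vertex((10, 10, 50, 30), (0.3, 0.4, 0.3, 0.4)))
```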
Pages: 2616-2628
Page count: 12
Related References
41 references in total
  • [1] Chen L C, Papandreou G, Schroff F, Adam H., Rethinking atrous convolution for semantic image segmentation, (2017)
  • [2] Chen Z M, Chen K A, Lin W Y, See J, Yu H, Ke Y, Yang C., PIoU loss: towards accurate oriented object detection in complex environments, 16th European Conference on Computer Vision, pp. 195-211, (2020)
  • [3] Ding J, Xue N, Long Y, Xia G S, Lu Q K., Learning RoI transformer for oriented object detection in aerial images, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2844-2853, (2019)
  • [4] Feng P M, Lin Y T, Guan J, He G J, Shi H F, Chambers J., TOSO: student's-T distribution aided one-stage orientation target detection in remote sensing images, 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4057-4061, (2020)
  • [5] Girshick R., Fast R-CNN, 2015 IEEE International Conference on Computer Vision, pp. 1440-1448, (2015)
  • [6] Girshick R, Donahue J, Darrell T, Malik J., Rich feature hierarchies for accurate object detection and semantic segmentation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 580-587, (2014)
  • [7] Guo Z H, Liu C, Zhang X S, Jiao J B, Ji X Y, Ye Q X., Beyond bounding-box: convex-hull feature adaptation for oriented and densely packed object detection, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8788-8797, (2021)
  • [8] He K M, Zhang X Y, Ren S Q, Sun J., Deep residual learning for image recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, (2016)
  • [9] Jiang Y Y, Zhu X Y, Wang X B, Yang S L, Li W, Wang H, Fu P, Luo Z B., R2CNN: rotational region CNN for orientation robust scene text detection, 2018 IEEE International Conference on Pattern Recognition (ICPR), pp. 3610-3615, (2018)
  • [10] Lin T Y, Dollar P, Girshick R, He K M, Hariharan B, Belongie S., Feature pyramid networks for object detection, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936-944, (2017)