Semantic Segmentation of Remote Sensing Image via Self-Attention-Based Multi-Scale Feature Fusion

Cited by: 0
Authors
Guo D. [1 ]
Fu Y. [1 ]
Zhu Y. [1 ]
Wen W. [1 ]
Affiliations
[1] School of Computer Science, Chengdu University of Information Technology, Chengdu
Keywords
feature fusion; remote sensing image; self-attention; semantic segmentation; Swin-Transformer
DOI
10.3724/SP.J.1089.2023.19604
Abstract
To address the incomplete and inaccurate semantic segmentation of remote sensing images caused by their complex content, large differences in object scale, and uneven object distribution, we propose a semantic segmentation algorithm for remote sensing images based on self-attention multi-scale feature fusion. The algorithm follows an encoder-decoder structure: the encoder uses the Swin-Transformer model to extract complex multi-scale features, and the decoder consists of a self-attention multi-scale feature fusion module and a feature pyramid network. Firstly, the extracted multi-scale features are adjusted to the same scale. Secondly, they are fed into the self-attention multi-scale feature fusion module, which fuses them so that feature information at different scales is fully utilized in semantic segmentation. Thirdly, the fused features are further superimposed and merged top-down by the feature pyramid. Finally, the segmentation result is predicted. Experimental results show that the proposed algorithm achieves a mean intersection over union (mIoU) of 52.77% under the single-scale strategy, 1.42 percentage points better than the suboptimal result, and an mIoU of 54.19% under the multi-scale strategy, 1.47 percentage points better than the suboptimal result. The experiments demonstrate that the proposed algorithm effectively fuses multi-scale features and improves segmentation accuracy. © 2023 Institute of Computing Technology. All rights reserved.
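The first two decoder steps described in the abstract (aligning multi-scale features to a common resolution, then fusing them with self-attention across scales) can be sketched as follows. This is a minimal NumPy illustration of the general idea only, not the authors' implementation: the channel count, the set of scales, nearest-neighbor resizing, random projection weights, and averaging over the scale axis are all assumptions made for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def resize_nearest(feat, out_hw):
    # feat: (C, H, W) -> (C, out_h, out_w) via nearest-neighbor index selection
    C, H, W = feat.shape
    oh, ow = out_hw
    ys = np.arange(oh) * H // oh
    xs = np.arange(ow) * W // ow
    return feat[:, ys][:, :, xs]

def self_attention_fuse(feats, rng):
    """Fuse S same-shape feature maps with self-attention over the scale axis.
    feats: list of (C, H, W) arrays, one per scale."""
    C, H, W = feats[0].shape
    # Per spatial position, treat the S scale responses as a short sequence
    X = np.stack([f.reshape(C, -1).T for f in feats], axis=1)  # (HW, S, C)
    Wq = rng.standard_normal((C, C)) / np.sqrt(C)
    Wk = rng.standard_normal((C, C)) / np.sqrt(C)
    Wv = rng.standard_normal((C, C)) / np.sqrt(C)
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                    # each (HW, S, C)
    A = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(C))  # (HW, S, S) attention
    fused = (A @ V).mean(axis=1)                        # pool over scales -> (HW, C)
    return fused.T.reshape(C, H, W)

rng = np.random.default_rng(0)
# Stand-ins for multi-scale encoder outputs (channel dim fixed for simplicity)
scales = [rng.standard_normal((16, h, h)) for h in (8, 4, 2)]
aligned = [resize_nearest(f, (8, 8)) for f in scales]  # step 1: same scale
fused = self_attention_fuse(aligned, rng)              # step 2: attention fusion
print(fused.shape)  # (16, 8, 8)
```

In the paper's full decoder, an output like `fused` would then be merged top-down with finer-resolution features through the feature pyramid network before prediction.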
Pages: 1259-1268
Page count: 9