Semantic-Spatial Collaborative Perception Network for Remote Sensing Image Captioning

Cited by: 0
Authors
Wang, Qi [1]
Yang, Zhigang [1]
Ni, Weiping [2]
Wu, Junzheng [2]
Li, Qiang [1]
Affiliations
[1] Northwestern Polytechnical University, School of Artificial Intelligence, Optics and Electronics (IOPEN), Xi'an 710072, China
[2] Northwest Institute of Nuclear Technology, Department of Remote Sensing, Xi'an 710072, China
Funding
National Natural Science Foundation of China
Keywords
Economic and social effects; Image enhancement; Semantic segmentation; Semantics
DOI
10.1109/TGRS.2024.3502805
Abstract
Image captioning is a fundamental vision-language task with wide-ranging applications in daily life. Existing methods often struggle to accurately interpret the semantic information in remote sensing images because of their complex backgrounds. Target region masks can effectively reflect the shape characteristics of targets and their potential interrelationships, so incorporating and fully integrating these features can significantly improve the quality of generated captions. However, research is hindered by the lack of relevant datasets that contain corresponding object masks. This raises a natural question: how can object masks be efficiently introduced and utilized? In this article, we provide potential target masks for the publicly available remote sensing image captioning (RSIC) datasets, enabling models to utilize the regional features of targets for RSIC. Meanwhile, we propose a novel RSIC algorithm, the semantic-spatial collaborative perception network (S²CPNet), which combines regional positional features with fine-grained semantic information. To capture the semantic information from the image and the positional relationships from the mask, semantic and spatial feature enhancement submodules are introduced at the ends of the respective encoder branches. Furthermore, a cross-view feature fusion module is designed to integrate regional features and semantic information efficiently. Then, a target recognition decoder is developed to enhance the model's ability to identify and describe critical targets in images. Finally, we improve the caption generation decoder by adaptively merging textual information with visual features to generate more accurate descriptions. Our model achieves satisfactory results on three RSIC datasets compared with existing methods. The related datasets and code will be open-sourced at https://github.com/CVer-Yang/SSCPNet. © 1980-2012 IEEE.
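The abstract does not describe the internals of the cross-view feature fusion module or of the adaptive merging of textual and visual features. The following is only a minimal, hypothetical sketch of two common ways such steps can be built, not the authors' implementation. It assumes scaled dot-product cross-attention for the fusion (image semantic tokens as queries, mask spatial tokens as both keys and values, with a residual add) and a scalar sigmoid gate for the text-visual merge. The names `cross_view_fusion`, `gated_merge`, `gate_weights`, and `bias` are illustrative and do not come from the S²CPNet code.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_view_fusion(semantic: List[Vector], spatial: List[Vector]) -> List[Vector]:
    """Hypothetical fusion: each semantic (image) token attends over the
    spatial (mask) tokens with scaled dot-product attention, and the attended
    spatial context is added back to the semantic token as a residual."""
    d = len(semantic[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in semantic:
        # Attention weights of this semantic token over all spatial tokens.
        weights = softmax([dot(q, k) * scale for k in spatial])
        # Assumption: spatial tokens serve as both keys and values.
        context = [sum(w * k[i] for w, k in zip(weights, spatial)) for i in range(d)]
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused


def gated_merge(text: Vector, visual: Vector, gate_weights: Vector, bias: float) -> Vector:
    """Hypothetical adaptive merge: a scalar sigmoid gate computed from the
    concatenated [text; visual] vector decides how much of each source to keep."""
    g = 1.0 / (1.0 + math.exp(-(dot(gate_weights, text + visual) + bias)))
    return [g * t + (1.0 - g) * v for t, v in zip(text, visual)]
```

With zero gate weights and zero bias the gate equals 0.5, so `gated_merge([1.0, 2.0], [3.0, 4.0], [0.0] * 4, 0.0)` returns the element-wise average `[2.0, 3.0]`. The residual add in `cross_view_fusion` keeps the original semantic features intact while injecting mask-derived spatial context. A learned model would normally add projection matrices and multiple heads, which this sketch omits.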