Semantic-Spatial Collaborative Perception Network for Remote Sensing Image Captioning

Cited by: 0
Authors
Wang, Qi [1]
Yang, Zhigang [1]
Ni, Weiping [2]
Wu, Junzheng [2]
Li, Qiang [1]
Affiliations
[1] Northwestern Polytechnical University, School of Artificial Intelligence, Optics and Electronics (IOPEN), Xi'an 710072, China
[2] Northwest Institute of Nuclear Technology, Department of Remote Sensing, Xi'an 710072, China
Funding
National Natural Science Foundation of China
Keywords
Economic and social effects; Image enhancement; Semantic segmentation; Semantics
DOI
10.1109/TGRS.2024.3502805
Abstract
Image captioning is a fundamental vision-language task with wide-ranging applications in daily life. Existing methods often struggle to accurately interpret the semantic information in remote sensing images because of their complex backgrounds. Target region masks can effectively reflect the shape characteristics of targets and their potential interrelationships, so incorporating and fully integrating these features can significantly improve the quality of generated captions. However, researchers are hindered by the lack of datasets that contain the corresponding object masks. This raises a natural question: how can object masks be introduced and utilized efficiently? In this article, we provide potential target masks for the publicly available remote sensing image captioning (RSIC) datasets, enabling models to utilize the regional features of targets for RSIC. We also propose a novel RSIC algorithm, abbreviated as S²CPNet, that combines regional positional features with fine-grained semantic information. To capture semantic information from the image and positional relationships from the mask, semantic and spatial feature enhancement submodules are introduced at the ends of the respective encoder branches. Furthermore, a cross-view feature fusion module is designed to integrate regional features and semantic information efficiently. A target recognition decoder is then developed to enhance the model's ability to identify and describe critical targets in images. Finally, we improve the caption generation decoder by adaptively merging textual information with visual features to generate more accurate descriptions. Our model achieves satisfactory results on three RSIC datasets compared with existing methods. The related datasets and code will be open-sourced at https://github.com/CVer-Yang/SSCPNet. © 1980-2012 IEEE.
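The abstract does not say how the cross-view feature fusion module combines the image branch with the mask branch. As a rough illustration only, the sketch below assumes a generic bidirectional cross-attention with residual connections; the function names (`cross_attend`, `fuse_views`), the feature shapes, and the fusion rule itself are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, context):
    """Scaled dot-product attention: each query token gathers from context tokens."""
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)   # (num_queries, num_context)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ context                    # (num_queries, d)

def fuse_views(semantic_tokens, spatial_tokens):
    """Hypothetical fusion (an assumption, not the authors' module):
    residual cross-attention in both directions, then concatenation."""
    sem = semantic_tokens + cross_attend(semantic_tokens, spatial_tokens)
    spa = spatial_tokens + cross_attend(spatial_tokens, semantic_tokens)
    return np.concatenate([sem, spa], axis=0)

rng = np.random.default_rng(0)
image_feats = rng.standard_normal((49, 64))  # assumed 7x7 grid from the image branch
mask_feats = rng.standard_normal((49, 64))   # assumed matching grid from the mask branch
fused = fuse_views(image_feats, mask_feats)
print(fused.shape)  # (98, 64)
```

In this sketch the fused token sequence would be what a caption decoder attends to; the actual S²CPNet design may differ in every one of these choices.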