Semantic-Spatial Collaborative Perception Network for Remote Sensing Image Captioning

Cited by: 0
Authors
Wang, Qi [1]
Yang, Zhigang [1]
Ni, Weiping [2]
Wu, Junzheng [2]
Li, Qiang [1]
Affiliations
[1] Northwestern Polytechnical University, School of Artificial Intelligence, Optics and Electronics (IOPEN), Xi'an 710072, China
[2] Northwest Institute of Nuclear Technology, Department of Remote Sensing, Xi'an 710072, China
Funding
National Natural Science Foundation of China
Keywords
Economic and social effects; Image enhancement; Semantic segmentation; Semantics
DOI
10.1109/TGRS.2024.3502805
Abstract
Image captioning is a fundamental vision-language task with wide-ranging applications in daily life. Existing methods often struggle to interpret the semantic information in remote sensing images accurately because of their complex backgrounds. Target region masks can effectively reflect the shapes of targets and their potential interrelationships, so incorporating and fully integrating these mask features can significantly improve the quality of generated captions. However, progress is hindered by the lack of datasets that contain corresponding object masks. This raises a natural question: how can object masks be introduced and utilized efficiently? In this article, we provide potential target masks for publicly available remote sensing image captioning (RSIC) datasets, enabling models to exploit the regional features of targets for RSIC. We also propose a novel RSIC algorithm, the semantic-spatial collaborative perception network (S²CPNet), which combines regional positional features with fine-grained semantic information. To capture semantic information from the image and positional relationships from the mask, semantic and spatial feature enhancement submodules are introduced at the ends of the respective encoder branches. Furthermore, a cross-view feature fusion module is designed to integrate regional features and semantic information efficiently. A target recognition decoder is then developed to enhance the model's ability to identify and describe critical targets in images. Finally, we improve the caption generation decoder by adaptively merging textual information with visual features to generate more accurate descriptions. Our model achieves satisfactory results on three RSIC datasets compared with existing methods. The related datasets and code will be made publicly available at https://github.com/CVer-Yang/SSCPNet.
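The abstract describes the cross-view fusion and adaptive text-visual merging steps only at a high level. Below is a minimal NumPy sketch, under stated assumptions, of one common way such steps can be built: bidirectional cross-attention between image (semantic) tokens and mask (spatial) tokens, followed by a sigmoid-gated merge of a decoder text state with an attended visual context. The function names (`cross_view_fusion`, `gated_merge`, `random_projections`), the token counts, and the choice of cross-attention and gating are illustrative assumptions, not the authors' S²CPNet implementation. The weights are random and untrained, so only the shapes and the gating behavior are meaningful.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cross_attention(queries, context, w_q, w_k, w_v):
    """Scaled dot-product cross-attention: each query row attends over all context rows."""
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_queries, n_context)
    return softmax(scores, axis=-1) @ v      # (n_queries, d)


def cross_view_fusion(sem_tokens, spa_tokens, params):
    """HYPOTHETICAL fusion: image (semantic) tokens attend to mask (spatial) tokens and
    vice versa, each with a residual connection; both views are then stacked."""
    sem_out = sem_tokens + cross_attention(sem_tokens, spa_tokens, *params["sem_from_spa"])
    spa_out = spa_tokens + cross_attention(spa_tokens, sem_tokens, *params["spa_from_sem"])
    return np.concatenate([sem_out, spa_out], axis=0)


def gated_merge(text_state, visual_context, w_gate, b_gate):
    """HYPOTHETICAL adaptive merge: a per-dimension sigmoid gate, computed from both
    inputs, decides how much of the textual vs. visual feature to keep."""
    gate = sigmoid(np.concatenate([text_state, visual_context], axis=-1) @ w_gate + b_gate)
    return gate * text_state + (1.0 - gate) * visual_context


def random_projections(rng, d, scale=0.1):
    # Random (untrained) query/key/value matrices, for shape checking only.
    return tuple(rng.standard_normal((d, d)) * scale for _ in range(3))


rng = np.random.default_rng(0)
d = 8
sem = rng.standard_normal((49, d))  # e.g. a 7x7 grid of image-patch features
spa = rng.standard_normal((16, d))  # e.g. 16 mask-region features
params = {"sem_from_spa": random_projections(rng, d),
          "spa_from_sem": random_projections(rng, d)}
fused = cross_view_fusion(sem, spa, params)
print(fused.shape)  # (65, 8)

text = rng.standard_normal((1, d))  # decoder hidden state for the current word
ctx = cross_attention(text, fused, *random_projections(rng, d))
merged = gated_merge(text, ctx, rng.standard_normal((2 * d, d)) * 0.1, np.zeros(d))
print(merged.shape)  # (1, 8)
```

In this sketch the residual connections keep each view's original features while adding information borrowed from the other view. The gate lets the decoder lean on the text state (gate near 1) or on the attended visual context (gate near 0) separately for each feature dimension.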