Semantic-CC: Boosting Remote Sensing Image Change Captioning via Foundational Knowledge and Semantic Guidance

Authors
Zhu, Yongshuo [1]
Li, Lu [1]
Chen, Keyan [2, 3]
Liu, Chenyang [2, 3]
Zhou, Fugen [1]
Shi, Zhenwei [2, 3]
Affiliations
[1] Beihang University, Image Processing Center, School of Astronautics, Beijing, 100191, China
[2] Beihang University, Image Processing Center, School of Astronautics, State Key Laboratory of Virtual Reality Technology and Systems, Beijing, 100191, China
[3] Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
Funding
National Natural Science Foundation of China
Keywords
Adaptive boosting - Change detection - Multi-task learning - Optical remote sensing
DOI
10.1109/TGRS.2024.3497338
Abstract
Remote sensing image change captioning (RSICC) aims to articulate the changes in objects of interest within bitemporal remote sensing images using natural language. Given the limitations of current RSICC methods in expressing general features across multitemporal and spatial scenarios, and their deficiency in providing granular, robust, and precise change descriptions, we introduce a novel change captioning (CC) method based on the foundational knowledge and semantic guidance, which we term Semantic-CC. Semantic-CC alleviates the dependency of high-generalization algorithms on extensive annotations by harnessing the latent knowledge of foundation models, and it generates more comprehensive and accurate change descriptions guided by pixel-level semantics from change detection (CD). Specifically, we propose a bitemporal SAM-based encoder for dual-image feature extraction; a multitask semantic aggregation neck for facilitating information interaction between heterogeneous tasks; a straightforward multiscale CD decoder to provide pixel-level semantic guidance; and a change caption decoder based on the large language model (LLM) to generate change description sentences. Moreover, to ensure the stability of the joint training of CD and CC, we propose a three-stage training strategy that supervises different tasks at various stages. We validate the proposed method on the LEVIR-CC and LEVIR-CD datasets. The experimental results corroborate the complementarity of CD and CC, demonstrating that Semantic-CC can generate more accurate change descriptions and achieve optimal performance across both tasks. © 2024 IEEE.