Remote sensing image captioning via Variational Autoencoder and Reinforcement Learning

Cited by: 50
Authors
Shen, Xiangqing [1, 2]
Liu, Bing [1, 2, 3]
Zhou, Yong [1, 2]
Zhao, Jiaqi [1, 2]
Liu, Mingming [1, 4, 5]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
[2] Minist Educ, Mine Digitizat Engn Res Ctr, Beijing, Peoples R China
[3] Chinese Acad Sci, Inst Elect, Beijing 100190, Peoples R China
[4] Jiangsu Vocat Inst Architectural Technol, Sch Intelligent Mfg, Xuzhou 221008, Jiangsu, Peoples R China
[5] Jiangsu Normal Univ, Sch Mechatron Engn, Xuzhou 221008, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Transformer; Variational Autoencoder; Transfer learning; Remote sensing image captioning; Self-attention mechanisms; Convolutional neural network; Reinforcement learning; MODELS;
DOI
10.1016/j.knosys.2020.105920
Chinese Library Classification number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image captioning, i.e., generating natural semantic descriptions of a given image, is an essential task for machines to understand image content. Remote sensing image captioning is a subfield of this task. Most current remote sensing image captioning models suffer from overfitting and fail to utilize the semantic information in images. To this end, we propose a Variational Autoencoder and Reinforcement Learning based Two-stage Multi-task Learning Model (VRTMM) for the remote sensing image captioning task. In the first stage, we fine-tune the CNN jointly with the Variational Autoencoder. In the second stage, the Transformer generates the text description using both spatial and semantic features. Reinforcement Learning is then applied to enhance the quality of the generated sentences. Our model surpasses the previous state-of-the-art results by a large margin on all seven scores on the Remote Sensing Image Caption Dataset. The experimental results indicate that our model is effective for remote sensing image captioning and achieves a new state-of-the-art result. (C) 2020 Elsevier B.V. All rights reserved.
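The abstract does not state the loss functions used in either stage. As a hedged illustration only, the sketch below shows two standard ingredients that a pipeline of this shape plausibly relies on: the Gaussian VAE objective (reparameterization trick plus a closed-form KL term) for the first stage, and a self-critical policy-gradient loss for the reinforcement-learning step. The squared-error reconstruction term, the `beta` weight, the self-critical (greedy-decoding) baseline, and all function names are assumptions for illustration, not details taken from the paper.

```python
import math
import random
from typing import List, Sequence

# Hypothetical helpers for illustration; not the VRTMM authors' code.


def reparameterize(mu: Sequence[float], logvar: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps, eps ~ N(0, 1), with sigma = exp(0.5 * logvar)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


def vae_loss(x: Sequence[float], x_recon: Sequence[float],
             mu: Sequence[float], logvar: Sequence[float],
             beta: float = 1.0) -> float:
    """Assumed objective: squared-error reconstruction + beta * KL (a negative ELBO up to constants)."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * kl_to_standard_normal(mu, logvar)


def self_critical_loss(sample_logprobs: Sequence[float],
                       sample_reward: float, greedy_reward: float) -> float:
    """REINFORCE loss with the greedy caption's reward as baseline (assumed SCST-style).

    sample_logprobs are the per-token log-probabilities of a sampled caption;
    rewards would typically come from a caption metric such as CIDEr (assumption).
    """
    advantage = sample_reward - greedy_reward
    return -advantage * sum(sample_logprobs)


if __name__ == "__main__":
    rng = random.Random(0)
    z = reparameterize([0.0, 0.5], [0.0, -1.0], rng)
    print(len(z), kl_to_standard_normal([1.0], [0.0]))
```

In the self-critical form, a sampled caption that scores better than the greedy caption gets a positive advantage, so minimizing the loss raises its log-probability; one that scores worse is pushed down.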
Pages: 11
Related papers
50 in total
  • [1] Diverse Image Captioning via Conditional Variational Autoencoder and Dual Contrastive Learning
    Xu, Jing
    Liu, Bing
    Zhou, Yong
    Liu, Mingming
    Yao, Rui
    Shao, Zhiwen
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (01)
  • [2] Meta captioning: A meta learning based remote sensing image captioning framework
    Yang, Qiaoqiao
    Ni, Zihao
    Ren, Peng
    [J]. ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2022, 186 : 190 - 200
  • [3] Remote sensing image caption generation via transformer and reinforcement learning
    Shen, Xiangqing
    Liu, Bing
    Zhou, Yong
    Zhao, Jiaqi
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (35-36) : 26661 - 26682
  • [5] A Decoupling Paradigm With Prompt Learning for Remote Sensing Image Change Captioning
    Liu, Chenyang
    Zhao, Rui
    Chen, Jianqi
    Qi, Zipeng
    Zou, Zhengxia
    Shi, Zhenwei
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [6] The Dreaming Variational Autoencoder for Reinforcement Learning Environments
    Andersen, Per-Arne
    Goodwin, Morten
    Granmo, Ole-Christoffer
    [J]. ARTIFICIAL INTELLIGENCE XXXV (AI 2018), 2018, 11311 : 143 - 155
  • [7] Region Driven Remote Sensing Image Captioning
    Kumar, S. Chandeesh
    Hemalatha, M.
    Narayan, S. Badri
    Nandhini, P.
    [J]. 2ND INTERNATIONAL CONFERENCE ON RECENT TRENDS IN ADVANCED COMPUTING ICRTAC-DISRUP-TIV INNOVATION, 2019, 2019, 165 : 32 - 40
  • [8] WordSentence Framework for Remote Sensing Image Captioning
    Wang, Qi
    Huang, Wei
    Zhang, Xueting
    Li, Xuelong
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2021, 59 (12) : 10532 - 10543
  • [9] A Systematic Survey of Remote Sensing Image Captioning
    Zhao, Beigeng
    [J]. IEEE ACCESS, 2021, 9 : 154086 - 154111
  • [10] Learning consensus-aware semantic knowledge for remote sensing image captioning
    Li, Yunpeng
    Zhang, Xiangrong
    Cheng, Xina
    Tang, Xu
    Jiao, Licheng
    [J]. PATTERN RECOGNITION, 2024, 145