Multiscale Methods for Optical Remote-Sensing Image Captioning

Cited by: 20
Authors
Ma, Xiaofeng [1, 2, 3]
Zhao, Rui [1, 2, 3]
Shi, Zhenwei [1, 2, 3]
Affiliations
[1] Beihang Univ, Sch Astronaut, Image Proc Ctr, Beijing 100191, Peoples R China
[2] Beihang Univ, Beijing Key Lab Digital Media, Beijing 100191, Peoples R China
[3] Beihang Univ, Sch Astronaut, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
Feature extraction; Remote sensing; Task analysis; Optical imaging; Semantics; Training; Measurement; Remote-sensing image captioning; multiscale; auxiliary task; attention;
DOI
10.1109/LGRS.2020.3009243
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification code
0708 ; 070902 ;
Abstract
Optical remote-sensing image captioning has recently become an active research topic because of its application prospects in the military and civil fields. Many methods and data sets have been proposed. Among them, models that follow the encoder-decoder framework perform better in several respects, such as generating more accurate and flexible sentences. However, almost all of these methods use a single fixed receptive field and do not pay enough attention to capturing multiscale information, which leads to an incomplete image representation. In this letter, we address the multiscale problem and propose two multiscale methods, the multiscale attention (MSA) method and the multifeat attention (MFA) method, to obtain better representations for the captioning task in the remote-sensing field. The MSA method extracts features from different layers and applies the multihead attention mechanism to each of them to obtain the context feature. The MFA method combines target-level and scene-level features, using the target-detection task as an auxiliary task to enrich the context feature. The experimental results show that both methods outperform the benchmark method on metrics such as BLEU, METEOR, ROUGE_L, and CIDEr.
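The multiscale attention idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, the random projection matrices, the use of a decoder hidden state as the query, and the fusion of per-layer contexts by concatenation are all assumptions made here for illustration, and the function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multihead_attention(query, feats, w_q, w_k, w_v, num_heads):
    """Scaled dot-product multihead attention for a single query.

    query: (d_query,) vector, e.g. a decoder hidden state (assumption).
    feats: (n, d_feat) feature vectors from one encoder layer.
    Returns the context vector (d_model,) and weights (num_heads, n).
    """
    d_model = w_q.shape[1]
    d_head = d_model // num_heads
    q = (query @ w_q).reshape(num_heads, d_head)       # (h, d_head)
    k = (feats @ w_k).reshape(-1, num_heads, d_head)   # (n, h, d_head)
    v = (feats @ w_v).reshape(-1, num_heads, d_head)   # (n, h, d_head)
    scores = np.einsum("hd,nhd->hn", q, k) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)                 # each head sums to 1
    context = np.einsum("hn,nhd->hd", weights, v).reshape(d_model)
    return context, weights

def multiscale_context(query, layer_feats, params, num_heads=4):
    # Attend over each layer's features separately, then concatenate the
    # per-layer contexts (concatenation is an assumed fusion choice).
    contexts = []
    for feats, (w_q, w_k, w_v) in zip(layer_feats, params):
        ctx, _ = multihead_attention(query, feats, w_q, w_k, w_v, num_heads)
        contexts.append(ctx)
    return np.concatenate(contexts)

# Hypothetical sizes: two encoder layers with different spatial resolutions.
rng = np.random.default_rng(0)
d_query, d_model = 16, 32
layer_feats = [rng.normal(size=(196, 512)), rng.normal(size=(49, 1024))]
params = [
    (
        0.1 * rng.normal(size=(d_query, d_model)),
        0.1 * rng.normal(size=(f.shape[1], d_model)),
        0.1 * rng.normal(size=(f.shape[1], d_model)),
    )
    for f in layer_feats
]
query = rng.normal(size=d_query)
context = multiscale_context(query, layer_feats, params)
print(context.shape)  # (64,)
```

In a trained model the projection matrices would be learned and the query would change at every decoding step; the sketch only shows how features of different resolutions can each contribute a context vector.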
Pages: 2001 - 2005
Page count: 5
Related papers
50 in total
  • [1] Multiscale Multiinteraction Network for Remote Sensing Image Captioning
    Wang, Yong
    Zhang, Wenkai
    Zhang, Zhengyuan
    Gao, Xin
    Sun, Xian
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2022, 15 : 2154 - 2165
  • [2] Remote-Sensing Image Captioning Based on Multilayer Aggregated Transformer
    Liu, Chenyang
    Zhao, Rui
    Shi, Zhenwei
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [3] A Multiscale Grouping Transformer With CLIP Latents for Remote Sensing Image Captioning
    Meng, Lingwu
    Wang, Jing
    Meng, Ran
    Yang, Yang
    Xiao, Liang
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 15
  • [4] OPTICAL REMOTE-SENSING AND IMAGE-PROCESSING
    KARIM, MA
    DUNCAN, BD
    [J]. OPTICAL ENGINEERING, 1995, 34 (11) : 3095 - 3096
  • [5] Denoising-Based Multiscale Feature Fusion for Remote Sensing Image Captioning
    Huang, Wei
    Wang, Qi
    Li, Xuelong
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2021, 18 (03) : 436 - 440
  • [6] Enhanced Transformer for Remote-Sensing Image Captioning with Positional-Channel Semantic Fusion
    Zhao, An
    Yang, Wenzhong
    Chen, Danny
    Wei, Fuyuan
    [J]. ELECTRONICS, 2024, 13 (18)
  • [8] DMSHNet: Multiscale and Multisupervised Hierarchical Network for Remote-Sensing Image Change Detection
    Liu, Pengcheng
    Zheng, Panpan
    Wang, Liejun
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [9] Multiscale Salient Alignment Learning for Remote-Sensing Image-Text Retrieval
    Chen, Yaxiong
    Huang, Jinghao
    Li, Xiaoyu
    Xiong, Shengwu
    Lu, Xiaoqiang
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 13
  • [10] Word-Sentence Framework for Remote Sensing Image Captioning
    Wang, Qi
    Huang, Wei
    Zhang, Xueting
    Li, Xuelong
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2021, 59 (12) : 10532 - 10543