A multi-scale contextual attention network for remote sensing visual question answering

Cited by: 1
Authors
Feng, Jiangfan [1 ]
Wang, Hui [1 ]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Sch Comp Sci & Technol, Chongqing 400065, Peoples R China
Keywords
Remote sensing; Visual question answering (VQA); Cross-modal; Attention; Multi-scales;
DOI
10.1016/j.jag.2023.103641
CLC number
TP7 [Remote sensing technology];
Discipline codes
081102 ; 0816 ; 081602 ; 083002 ; 1404 ;
Abstract
Remote sensing visual question answering (RSVQA) is a user-friendly way to analyze remote sensing images (RSIs) across a variety of tasks. However, current methods often overlook geospatial objects, which have multi-scale representations and require contextual information. Furthermore, little research has addressed modeling and reasoning about the long-distance dependencies between entities, resulting in incomplete and inaccurate answer predictions. To overcome these limitations, we propose the Scale-Aware Multi-level Feature Pyramid Network (SAMFPN), which integrates contextual and multi-scale information using a Feature Pyramid Network (FPN) and co-attention mechanisms. The SAMFPN module incorporates a multi-level FPN to capture both global and local contextual information. In addition, it introduces a Visual-Question Collaboration Fusion (VQCF) module that jointly embeds and learns visual and textual information. Our experimental results demonstrate the superior accuracy and robustness of the proposed model compared with existing models. These outcomes indicate that SAMFPN effectively captures multi-scale contextual information, making it a reliable solution for RSVQA tasks.
Pages: 12
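
For orientation only, the sketch below illustrates the kind of pipeline the abstract describes: visual features from several FPN levels are each fused with the question embedding through a bidirectional co-attention block, then aggregated across scales for answer classification. This is a minimal PyTorch approximation, not the authors' implementation; the module names (VQCFBlock, SAMFPNSketch), the feature dimensions, and the answer-vocabulary size are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of an FPN + co-attention RSVQA pipeline.
# All names and sizes here are assumptions made for illustration.
import torch
import torch.nn as nn


class VQCFBlock(nn.Module):
    """Hypothetical visual-question collaboration fusion: cross-attention in both
    directions between one pyramid level's visual tokens and the question tokens."""

    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.q2v = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.v2q = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_v = nn.LayerNorm(d_model)
        self.norm_q = nn.LayerNorm(d_model)

    def forward(self, vis: torch.Tensor, que: torch.Tensor):
        # vis: (B, Nv, D) visual tokens from one pyramid level
        # que: (B, Nq, D) question token embeddings
        vis = self.norm_v(vis + self.q2v(vis, que, que)[0])  # question-guided vision
        que = self.norm_q(que + self.v2q(que, vis, vis)[0])  # vision-guided question
        return vis, que


class SAMFPNSketch(nn.Module):
    """Toy stand-in for the described scale-aware pipeline: per-level projection,
    co-attention fusion, multi-scale pooling, and answer classification."""

    def __init__(self, level_dims=(256, 512, 1024, 2048), d_model=256,
                 vocab=1000, n_answers=100):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, d_model, 1) for c in level_dims)
        self.embed = nn.Embedding(vocab, d_model)
        self.fuse = nn.ModuleList(VQCFBlock(d_model) for _ in level_dims)
        self.cls = nn.Linear(2 * d_model, n_answers)

    def forward(self, feats, question_ids):
        # feats: list of backbone maps [(B, C_l, H_l, W_l)]; question_ids: (B, Nq)
        que = self.embed(question_ids)
        pooled = []
        for f, p, fuse in zip(feats, self.proj, self.fuse):
            v = p(f).flatten(2).transpose(1, 2)   # (B, H_l*W_l, D) visual tokens
            v, que = fuse(v, que)                 # co-attention at this scale
            pooled.append(v.mean(dim=1))          # global summary per level
        multi_scale = torch.stack(pooled, dim=1).mean(dim=1)
        joint = torch.cat([multi_scale, que.mean(dim=1)], dim=-1)
        return self.cls(joint)                    # answer logits


if __name__ == "__main__":
    feats = [torch.randn(2, c, s, s) for c, s in zip((256, 512, 1024, 2048),
                                                     (64, 32, 16, 8))]
    logits = SAMFPNSketch()(feats, torch.randint(0, 1000, (2, 20)))
    print(logits.shape)  # torch.Size([2, 100])
```

The per-level, bidirectional cross-attention is meant to mirror the paper's idea of fusing question and visual cues at every scale before aggregation; the actual SAMFPN/VQCF design may differ in how levels are combined and how answers are decoded.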