Enhancing Remote Sensing Visual Question Answering: A Mask-Based Dual-Stream Feature Mutual Attention Network

Cited by: 1
Authors
Li, Yangyang [1 ]
Ma, Yunfei [1 ]
Liu, Guangyuan [2 ]
Wei, Qiang [1 ]
Chen, Yanqiao [3 ]
Shang, Ronghua [1 ]
Jiao, Licheng [1 ]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Peoples R China
[2] Chinese Acad Sci, Natl Space Sci Ctr, Beijing 100190, Peoples R China
[3] 54th Res Inst China Elect Technol Grp Corp, Shijiazhuang 050081, Peoples R China
Keywords
Feature extraction; Vectors; Task analysis; Question answering (information retrieval); Visualization; Remote sensing; Interference; Attention; dual-stream feature extraction; mask mechanism; visual question answering on remote sensing;
DOI
10.1109/LGRS.2024.3389042
CLC classification
P3 [Geophysics]; P59 [Geochemistry];
Discipline codes
0708 ; 070902 ;
Abstract
Visual question answering (VQA) applied to remote sensing images (RSIs) enables direct interaction between image information and text information, lowering the expertise barrier across different RSI processing fields. Current methods face two challenges: fully exploiting both the global and local information of the image when interacting with the question information, and mitigating interclass interference. To address these challenges, this letter proposes a mask-based dual-stream feature mutual attention network (MADNet) for remote sensing visual question answering (RSVQA). First, a dual-stream feature extraction module obtains image features, and a deep-and-shallow-layer feature encoding module obtains question features. Second, an attention mechanism combined with pointwise multiplication fuses the dual-stream features extracted in the previous step. Finally, an answer relevance modulation module based on a binary mask vector filters out irrelevant answers. In the experiments, the proposed method is evaluated on two datasets collected by aerial and Sentinel-2 sensors. The proposed model outperforms previous approaches, achieving a 6.89% increase in overall accuracy (OA) over the baseline, and this improvement persists even when the training data are halved, as evidenced by experiments on the low-resolution (LR) dataset.
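The answer relevance modulation described above can be illustrated with a minimal sketch. The function names, the candidate-answer count, and the assumption that masking is applied by forcing irrelevant logits to negative infinity before the softmax are all illustrative choices, not details confirmed by the paper:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def masked_answer_scores(logits, mask):
    """Answer relevance modulation (sketch): suppress candidate answers
    whose binary mask bit is 0 by pushing their logits to -inf, so they
    receive exactly zero probability after the softmax."""
    masked = np.where(mask.astype(bool), logits, -np.inf)
    return softmax(masked)

# Five candidate answers; the binary mask marks answers 1 and 4
# as irrelevant to the question type.
logits = np.array([2.0, 3.5, 1.0, 0.5, 4.0])
mask = np.array([1, 0, 1, 1, 0])
probs = masked_answer_scores(logits, mask)
```

After modulation, the probability mass is redistributed only among the answers the mask marks as relevant; the masked candidates contribute nothing, which is the intended filtering effect.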
Pages: 1 - 5
Page count: 5
Related Papers
50 items total
  • [41] Pedestrian Behavior Recognition Based on Improved Dual-stream Network with Differential Feature in Surveillance Video
    Tan, Yonghong
    Zhou, Xuebin
    Chen, Aiwu
    Zhou, Songqing
    SCIENTIFIC PROGRAMMING, 2021, 2021
  • [42] Automatic Evaluation Method for Functional Movement Screening Based on a Dual-Stream Network and Feature Fusion
    Lin, Xiuchun
    Chen, Renguang
    Feng, Chen
    Chen, Zhide
    Yang, Xu
    Cui, Hui
    MATHEMATICS, 2024, 12 (08)
  • [43] OECA-Net: A co-attention network for visual question answering based on OCR scene text feature enhancement
    Yan, Feng
    Silamu, Wushouer
    Chai, Yachuang
    Li, Yanbing
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (03) : 7085 - 7096
  • [45] Boundary-enhanced dual-stream network for semantic segmentation of high-resolution remote sensing images
    Li, Xinghua
    Xie, Linglin
    Wang, Caifeng
    Miao, Jianhao
    Shen, Huanfeng
    Zhang, Liangpei
    GISCIENCE & REMOTE SENSING, 2024, 61 (01)
  • [46] An Enhanced Dual-Stream Network Using Multi-Source Remote Sensing Imagery for Water Body Segmentation
    Zhang, Xiaoyong
    Geng, Miaomiao
    Yang, Xuan
    Li, Cong
    APPLIED SCIENCES-BASEL, 2024, 14 (01):
  • [47] Decoding 3D Representation of Visual Imagery EEG using Attention-based Dual-Stream Convolutional Neural Network
    Ahn, Hyung-Ju
    Lee, Dae-Hyeok
    10TH INTERNATIONAL WINTER CONFERENCE ON BRAIN-COMPUTER INTERFACE (BCI2022), 2022,
  • [48] Research on visual question answering based on dynamic memory network model of multiple attention mechanisms
    Miao, Yalin
    He, Shuyun
    Cheng, WenFang
    Li, Guodong
    Tong, Meng
    SCIENTIFIC REPORTS, 2022, 12 (01)
  • [50] DSAGAN: A generative adversarial network based on dual-stream attention mechanism for anatomical and functional image fusion
    Fu, Jun
    Li, Weisheng
    Du, Jiao
    Xu, Liming
    INFORMATION SCIENCES, 2021, 576 : 484 - 506