Cross-Modal Visual Question Answering for Remote Sensing Data

Cited by: 0
Authors
Felix, Rafael [1]
Repasky, Boris [1,2]
Hodge, Samuel [1]
Zolfaghari, Reza [3]
Abbasnejad, Ehsan [2]
Sherrah, Jamie [2]
Affiliations
[1] Australian Inst Machine Learning, Adelaide, SA, Australia
[2] Lockheed Martin Australia STELaRLab, Mawson Lakes, Australia
[3] Def Sci & Technol Grp, Canberra, ACT, Australia
Keywords
Visual Question Answering; Deep Learning; Natural Language Processing; Convolutional Neural Networks; Recurrent Neural Networks; OpenStreetMap; Classification
DOI
10.1109/DICTA52665.2021.9647287
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
While querying of structured geo-spatial data such as Google Maps has become commonplace, there remains a wealth of unstructured information in overhead imagery that is largely inaccessible to users. This information can be made accessible using machine learning for Visual Question Answering (VQA) about remote sensing imagery. We propose a novel method for Earth observation, based on answering natural language questions about satellite images, that uses cross-modal attention between image objects and text. The image is encoded with an object-centric feature space, with self-attention between objects, and the question is encoded with a language transformer network. The image and question representations are fed to a cross-modal transformer network that uses cross-attention between the image and text modalities to generate the answer. Our method is applied to the RSVQA remote sensing dataset and achieves a significant accuracy increase over the previous benchmark.
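The architecture described in the abstract — self-attention over object-centric image features, a transformer question encoder, and a cross-attention stage where the two modalities are fused before answer prediction — can be sketched in PyTorch. This is a minimal illustrative sketch, not the authors' implementation; all module choices, dimensions, and names (`CrossModalVQA`, `dim=256`, 36 objects, 10 answer classes) are assumptions.

```python
# Hedged sketch of cross-modal VQA: question tokens cross-attend to
# self-attended image object features, then an answer class is predicted.
# Dimensions and layer counts are illustrative, not from the paper.
import torch
import torch.nn as nn

class CrossModalVQA(nn.Module):
    def __init__(self, dim=256, heads=4, num_answers=10):
        super().__init__()
        # Self-attention between object-centric image features
        self.img_self_attn = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                       batch_first=True),
            num_layers=2)
        # Language transformer encoder for the question tokens
        self.txt_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                       batch_first=True),
            num_layers=2)
        # Cross-attention: question tokens (queries) attend to objects
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_answers)

    def forward(self, obj_feats, question_emb):
        # obj_feats: (B, num_objects, dim); question_emb: (B, num_tokens, dim)
        img = self.img_self_attn(obj_feats)
        txt = self.txt_encoder(question_emb)
        fused, _ = self.cross_attn(query=txt, key=img, value=img)
        # Mean-pool the fused tokens and classify the answer
        return self.classifier(fused.mean(dim=1))

model = CrossModalVQA()
logits = model(torch.randn(2, 36, 256), torch.randn(2, 12, 256))
print(logits.shape)  # torch.Size([2, 10])
```

Treating the answer as a classification over a fixed vocabulary matches the closed-set question types (e.g. presence, counting, comparison) used in RSVQA-style benchmarks.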
Pages: 57-65
Page count: 9