Cross-Modal Visual Question Answering for Remote Sensing Data

被引:0
|
作者
Felix, Rafael [1 ]
Repasky, Boris [1 ,2 ]
Hodge, Samuel [1 ]
Zolfaghari, Reza [3 ]
Abbasnejad, Ehsan [2 ]
Sherrah, Jamie [2 ]
机构
[1] Australian Inst Machine Learning, Adelaide, SA, Australia
[2] Lockheed Martin Australia STELaRLab, Mawson Lakes, Australia
[3] Def Sci & Technol Grp, Canberra, ACT, Australia
关键词
Visual Question Answering; Deep learning; Natural Language Processing; Convolution Neural Networks; Recurrent Neural Networks; OpenStreetMap; CLASSIFICATION;
D O I
10.1109/DICTA52665.2021.9647287
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
While querying of structured geo-spatial data such as Google Maps has become commonplace, there remains a wealth of unstructured information in overhead imagery that is largely inaccessible to users. This information can be made accessible using machine learning for Visual Question Answering (VQA) about remote sensing imagery. We propose a novel method for Earth observation based on answering natural language questions about satellite images that uses cross-modal attention between image objects and text. The image is encoded with an object-centric feature space, with self-attention between objects, and the question is encoded with a language transformer network. The image and question representations are fed to a crossmodal transformer network that uses cross-attention between the image and text modalities to generate the answer. Our method is applied to the RSVQA remote sensing dataset and achieves a significant accuracy increase over the previous benchmark.
引用
下载
收藏
页码:57 / 65
页数:9
相关论文
共 50 条
  • [31] Deep Cross-Modal Retrieval for Remote Sensing Image and Audio
    Guo Mao
    Yuan Yuan
    Lu Xiaoqiang
    2018 10TH IAPR WORKSHOP ON PATTERN RECOGNITION IN REMOTE SENSING (PRRS), 2018,
  • [32] Cross-Modal feature description for remote sensing image matching
    Li, Liangzhi
    Liu, Ming
    Ma, Lingfei
    Han, Ling
    INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2022, 112
  • [33] Cross-Modal Contrastive Learning for Remote Sensing Image Classification
    Feng, Zhixi
    Song, Liangliang
    Yang, Shuyuan
    Zhang, Xinyu
    Jiao, Licheng
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [34] Mutual Attention Inception Network for Remote Sensing Visual Question Answering
    Zheng, Xiangtao
    Wang, Binqiang
    Du, Xingqian
    Lu, Xiaoqiang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [35] Cross-Modal Self-Attention with Multi-Task Pre-Training for Medical Visual Question Answering
    Gong, Haifan
    Chen, Guanqi
    Liu, Sishuo
    Yu, Yizhou
    Li, Guanbin
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 456 - 460
  • [36] Open-ended remote sensing visual question answering with transformers
    Al Rahhal, Mohamad M.
    Bazi, Yakoub
    Alsaleh, Sara O.
    Al-Razgan, Muna
    Mekhalfi, Mohamed Lamine
    Al Zuair, Mansour
    Alajlan, Naif
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2022, 43 (18) : 6809 - 6823
  • [37] A Spatial Hierarchical Reasoning Network for Remote Sensing Visual Question Answering
    Zhang, Zixiao
    Jiao, Licheng
    Li, Lingling
    Liu, Xu
    Chen, Puhua
    Liu, Fang
    Li, Yuxuan
    Guo, Zhicheng
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [38] CAPTURING GLOBAL AND LOCAL INFORMATION IN REMOTE SENSING VISUAL QUESTION ANSWERING
    Guo, Yan
    Huang, Yuancheng
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 6340 - 6343
  • [39] Hierarchical Multimodality Graph Reasoning for Remote Sensing Visual Question Answering
    Zhang, Han
    Wang, Keming
    Zhang, Laixian
    Wang, Bingshu
    Li, Xuelong
    IEEE Transactions on Geoscience and Remote Sensing, 2024, 62
  • [40] RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering
    Wang, Yuduo
    Ghamisi, Pedram
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62