Cross-Modal Visual Question Answering for Remote Sensing Data

Cited by: 0
Authors
Felix, Rafael [1]
Repasky, Boris [1,2]
Hodge, Samuel [1]
Zolfaghari, Reza [3]
Abbasnejad, Ehsan [2]
Sherrah, Jamie [2]
Affiliations
[1] Australian Inst Machine Learning, Adelaide, SA, Australia
[2] Lockheed Martin Australia STELaRLab, Mawson Lakes, Australia
[3] Def Sci & Technol Grp, Canberra, ACT, Australia
Keywords
Visual Question Answering; Deep learning; Natural Language Processing; Convolution Neural Networks; Recurrent Neural Networks; OpenStreetMap; CLASSIFICATION;
DOI
10.1109/DICTA52665.2021.9647287
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
While querying structured geo-spatial data such as Google Maps has become commonplace, there remains a wealth of unstructured information in overhead imagery that is largely inaccessible to users. This information can be made accessible using machine learning for Visual Question Answering (VQA) about remote sensing imagery. We propose a novel method for Earth observation based on answering natural language questions about satellite images that uses cross-modal attention between image objects and text. The image is encoded with an object-centric feature space, with self-attention between objects, and the question is encoded with a language transformer network. The image and question representations are fed to a cross-modal transformer network that uses cross-attention between the image and text modalities to generate the answer. Our method is applied to the RSVQA remote sensing dataset and achieves a significant accuracy increase over the previous benchmark.
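As a rough illustration of the cross-attention step the abstract describes, the sketch below lets question-token embeddings attend over image-object embeddings using single-head scaled dot-product attention. This is not the authors' implementation: the function names, the embedding size, the number of tokens and objects, and the random projection weights are all assumptions chosen for the example, and the paper's actual encoders and transformer layers are not reproduced here.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention from `queries` onto `context`.

    queries: (n_q, d) embeddings of one modality (here: question tokens)
    context: (n_c, d) embeddings of the other modality (here: image objects)
    Returns the attended features (n_q, d) and the attention weights (n_q, n_c).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each query row sums to 1
    return weights @ v, weights


# Hypothetical sizes and random stand-ins for learned embeddings and weights.
rng = np.random.default_rng(0)
d_model = 16
text_tokens = rng.normal(size=(7, d_model))    # 7 question tokens (assumed)
image_objects = rng.normal(size=(5, d_model))  # 5 detected objects (assumed)
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))

attended, weights = cross_attention(text_tokens, image_objects, w_q, w_k, w_v)
print(attended.shape, weights.shape)
```

Swapping the first two arguments gives the opposite direction (image objects attending over question tokens); a cross-modal transformer layer would typically apply both directions along with residual connections and feed-forward sublayers, which this sketch omits.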
Pages: 57-65
Page count: 9
Related papers
50 in total
  • [41] Cross-Modal self-supervised vision language pre-training with multiple objectives for medical visual question answering
    Liu, Gang
    He, Jinlong
    Li, Pengfei
    Zhao, Zixu
    Zhong, Shenjun
    Journal of Biomedical Informatics, 2024, 160
  • [42] Cross-Modal Retrieval and Semantic Refinement for Remote Sensing Image Captioning
    Li, Zhengxin
    Zhao, Wenzhe
    Du, Xuanyi
    Zhou, Guangyao
    Zhang, Songlin
    REMOTE SENSING, 2024, 16 (01)
  • [43] HGR MAXIMAL CORRELATION AUGMENTED CROSS-MODAL REMOTE SENSING RETRIEVAL
    Wang, Zhuoyue
    Wang, Xueqian
    Li, Gang
    Li, Chengxi
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 5053 - 5056
  • [44] RSMoDM: Multimodal Momentum Distillation Model for Remote Sensing Visual Question Answering
    Li, Pengfei
    Liu, Gang
    He, Jinlong
    Meng, Xiangxu
    Zhong, Shenjun
    Chen, Xun
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 16799 - 16814
  • [45] OPEN-ENDED VISUAL QUESTION ANSWERING MODEL FOR REMOTE SENSING IMAGES
    Alsaleh, Sara O.
    Bazi, Yakoub
    Al Rahhal, Mohamad M.
    Al Zuair, Mansour
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 2848 - 2851
  • [46] Language Query-Based Transformer With Multiscale Cross-Modal Alignment for Visual Grounding on Remote Sensing Images
    Lan, Meng
    Rong, Fu
    Jiao, Hongzan
    Gao, Zhi
    Zhang, Lefei
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 13
  • [47] Visual Question Generation Under Multi-granularity Cross-Modal Interaction
    Chai, Zi
    Wan, Xiaojun
    Han, Soyeon Caren
    Poon, Josiah
    MULTIMEDIA MODELING, MMM 2023, PT I, 2023, 13833 : 255 - 266
  • [48] From Easy to Hard: Learning Language-Guided Curriculum for Visual Question Answering on Remote Sensing Data
    Yuan, Zhenghang
    Mou, Lichao
    Wang, Qi
    Zhu, Xiao Xiang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [49] Scale-Aware Adaptive Refinement and Cross-Interaction for Remote Sensing Audio-Visual Cross-Modal Retrieval
    Chen, Yaxiong
    Du, Chuang
    Zi, Yunfei
    Xiong, Shengwu
    Lu, Xiaoqiang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [50] An Empirical Study on the Language Modal in Visual Question Answering
    Peng, Daowan
    Wei, Wei
    Mao, Xian-Ling
    Fu, Yuanyuan
    Chen, Dangyang
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 4109 - 4117