HRVQA: A Visual Question Answering benchmark for high-resolution aerial images

Times cited: 0
Authors
Li, Kun [1]
Vosselman, George [1]
Yang, Michael Ying [2]
Affiliations
[1] Univ Twente, Fac Geoinformat Sci & Earth Observat ITC, Enschede, Netherlands
[2] Univ Bath, Dept Comp Sci, Visual Comp Grp, Bath, England
Keywords
Visual question answering; High-resolution aerial images; Transformers; Benchmark dataset; LANGUAGE;
DOI
10.1016/j.isprsjprs.2024.06.002
Chinese Library Classification (CLC) number
P9 [Physical geography]
Discipline classification codes
0705; 070501
Abstract
Visual question answering (VQA) is an important and challenging multimodal task in computer vision and photogrammetry. Recently, efforts have been made to bring the VQA task to aerial images, owing to its potential real-world applications in disaster monitoring, urban planning, and digital earth product generation. However, the development of VQA in this domain is restricted by the large variation in the appearance, scale, and orientation of concepts in aerial images, as well as by the scarcity of well-annotated datasets. In this paper, we introduce a new dataset, HRVQA, which provides 53,512 aerial images of 1024 × 1024 pixels and 1,070,240 semi-automatically generated QA pairs. To benchmark the understanding capability of VQA models for aerial images, we evaluate recent methods on the HRVQA dataset. Moreover, we propose a novel model, GFTransformer, with gated attention modules and a mutual fusion module. The experiments show that the proposed dataset is quite challenging, especially for questions about specific attributes. Our method achieves superior performance compared with previous state-of-the-art approaches. The dataset and the source code are released at https://hrvqa.nl/.
Pages: 65-81
Number of pages: 17