HRVQA: A Visual Question Answering benchmark for high-resolution aerial images

Authors
Li, Kun [1 ]
Vosselman, George [1 ]
Yang, Michael Ying [2 ]
Affiliations
[1] Univ Twente, Fac Geoinformat Sci & Earth Observat ITC, Enschede, Netherlands
[2] Univ Bath, Dept Comp Sci, Visual Comp Grp, Bath, England
Keywords
Visual question answering; High-resolution aerial images; Transformers; Benchmark dataset; LANGUAGE;
DOI
10.1016/j.isprsjprs.2024.06.002
Chinese Library Classification
P9 [Physical Geography]
Subject classification codes
0705; 070501
Abstract
Visual question answering (VQA) is an important and challenging multimodal task in computer vision and photogrammetry. Recently, efforts have been made to bring the VQA task to aerial images because of its potential real-world applications in disaster monitoring, urban planning, and digital earth product generation. However, the development of VQA in this domain is restricted by the huge variation in the appearance, scale, and orientation of the concepts in aerial images, along with the scarcity of well-annotated datasets. In this paper, we introduce a new dataset, HRVQA, which provides a collection of 53,512 aerial images of 1024 x 1024 pixels and 1,070,240 semi-automatically generated QA pairs. To benchmark the understanding capability of VQA models for aerial images, we evaluate recent methods on the HRVQA dataset. Moreover, we propose a novel model, GFTransformer, with gated attention modules and a mutual fusion module. The experiments show that the proposed dataset is quite challenging, especially for the specific attribute-related questions. Our method achieves superior performance in comparison to the previous state-of-the-art approaches. The dataset and the source code are released at https://hrvqa.nl/.
Pages: 65-81
Number of pages: 17