HRVQA: A Visual Question Answering benchmark for high-resolution aerial images

Authors
Li, Kun [1]
Vosselman, George [1]
Yang, Michael Ying [2]
Affiliations
[1] Univ Twente, Fac Geoinformat Sci & Earth Observat ITC, Enschede, Netherlands
[2] Univ Bath, Dept Comp Sci, Visual Comp Grp, Bath, England
Keywords
Visual question answering; High-resolution aerial images; Transformers; Benchmark dataset; LANGUAGE
DOI
10.1016/j.isprsjprs.2024.06.002
Chinese Library Classification (CLC) number
P9 [Physical geography]
Discipline classification code
0705; 070501
Abstract
Visual question answering (VQA) is an important and challenging multimodal task in computer vision and photogrammetry. Recently, efforts have been made to bring the VQA task to aerial images, owing to its potential real-world applications in disaster monitoring, urban planning, and digital earth product generation. However, the development of VQA in this domain is restricted by the huge variation in the appearance, scale, and orientation of the concepts in aerial images, along with the scarcity of well-annotated datasets. In this paper, we introduce a new dataset, HRVQA, which provides a collection of 53,512 aerial images of 1024 × 1024 pixels and 1,070,240 semi-automatically generated QA pairs. To benchmark the understanding capability of VQA models for aerial images, we evaluate recent methods on the HRVQA dataset. Moreover, we propose a novel model, GFTransformer, with gated attention modules and a mutual fusion module. The experiments show that the proposed dataset is quite challenging, especially for specific attribute-related questions. Our method achieves superior performance in comparison to previous state-of-the-art approaches. The dataset and the source code are released at https://hrvqa.nl/.
Pages: 65-81
Number of pages: 17