HRVQA: A Visual Question Answering benchmark for high-resolution aerial images

Cited by: 0

Authors:

Li, Kun [1]

Vosselman, George [1]

Yang, Michael Ying [2]

Affiliations:

[1] Univ Twente, Fac Geoinformat Sci & Earth Observat ITC, Enschede, Netherlands

[2] Univ Bath, Dept Comp Sci, Visual Comp Grp, Bath, England

Source:

ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING | 2024 / Vol. 214

Keywords:

Visual question answering; High-resolution aerial images; Transformers; Benchmark dataset; LANGUAGE;

DOI:

10.1016/j.isprsjprs.2024.06.002

Chinese Library Classification:

P9 [Physical Geography];

Discipline classification code:

0705 ; 070501 ;

Abstract:

Visual question answering (VQA) is an important and challenging multimodal task in computer vision and photogrammetry. Recently, efforts have been made to bring the VQA task to aerial images, owing to its potential real-world applications in disaster monitoring, urban planning, and digital earth product generation. However, the development of VQA in this domain is restricted by the huge variation in the appearance, scale, and orientation of the concepts in aerial images, along with the scarcity of well-annotated datasets. In this paper, we introduce a new dataset, HRVQA, which provides a collection of 53,512 aerial images of 1024 × 1024 pixels and 1,070,240 semi-automatically generated QA pairs. To benchmark the understanding capability of VQA models for aerial images, we evaluate recent methods on the HRVQA dataset. Moreover, we propose a novel model, GFTransformer, with gated attention modules and a mutual fusion module. The experiments show that the proposed dataset is quite challenging, especially for the specific attribute-related questions. Our method achieves superior performance in comparison to the previous state-of-the-art approaches. The dataset and the source code are released at https://hrvqa.nl/.

Pages: 65-81

Number of pages: 17
