HRVQA: A Visual Question Answering benchmark for high-resolution aerial images

Authors
Li, Kun [1]
Vosselman, George [1]
Yang, Michael Ying [2]
Affiliations
[1] Univ Twente, Fac Geoinformat Sci & Earth Observat ITC, Enschede, Netherlands
[2] Univ Bath, Dept Comp Sci, Visual Comp Grp, Bath, England
Keywords
Visual question answering; High-resolution aerial images; Transformers; Benchmark dataset; Language
DOI
10.1016/j.isprsjprs.2024.06.002
Chinese Library Classification (CLC) number
P9 [Physical Geography]
Discipline classification code
0705; 070501
Abstract
Visual question answering (VQA) is an important and challenging multimodal task in computer vision and photogrammetry. Recently, efforts have been made to bring the VQA task to aerial images, owing to its potential real-world applications in disaster monitoring, urban planning, and digital earth product generation. However, progress in this domain is held back by the large variation in the appearance, scale, and orientation of concepts in aerial images, and by the scarcity of well-annotated datasets. In this paper, we introduce a new dataset, HRVQA, which provides 53,512 aerial images of 1024 × 1024 pixels and 1,070,240 semi-automatically generated QA pairs. To benchmark how well VQA models understand aerial images, we evaluate recent methods on the HRVQA dataset. Moreover, we propose a novel model, GFTransformer, with gated attention modules and a mutual fusion module. The experiments show that the proposed dataset is challenging, especially for specific attribute-related questions. Our method outperforms previous state-of-the-art approaches. The dataset and the source code are released at https://hrvqa.nl/.
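The dataset figures in the abstract imply an average of exactly 20 QA pairs per image (53,512 × 20 = 1,070,240). The short Python check below makes that arithmetic explicit and also totals the pixel count of the image collection. The constant and function names are illustrative only and are not taken from the HRVQA release; the figures are the ones stated in the abstract, and the per-image value is an average, since the abstract does not say whether every image has the same number of questions.

```python
# Figures as stated in the HRVQA abstract.
NUM_IMAGES = 53_512
NUM_QA_PAIRS = 1_070_240
IMAGE_SIDE_PX = 1024  # each image is 1024 x 1024 pixels


def qa_pairs_per_image(num_qa: int, num_images: int) -> float:
    """Average number of question-answer pairs per image (hypothetical helper)."""
    return num_qa / num_images


def total_pixels(num_images: int, side: int) -> int:
    """Total pixel count over num_images square images of the given side length."""
    return num_images * side * side


if __name__ == "__main__":
    # divmod shows the QA total divides evenly by the image count.
    quotient, remainder = divmod(NUM_QA_PAIRS, NUM_IMAGES)
    print(quotient, remainder)  # 20 0
    print(qa_pairs_per_image(NUM_QA_PAIRS, NUM_IMAGES))  # 20.0
    print(total_pixels(NUM_IMAGES, IMAGE_SIDE_PX))  # 56111398912
```

At roughly 5.6 × 10^10 pixels in total, the collection is large enough that the per-image resolution, not just the image count, shapes the cost of training and evaluating VQA models on it.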
Pages: 65-81
Number of pages: 17