Regulating Balance Degree for More Reasonable Visual Question Answering Benchmark

Cited by: 1
Authors
Lin, Ken [1 ]
Mao, Aihua [1 ]
Liu, Jiangfeng [1 ]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
Keywords
VQA; VQA-CP; long-tailed recognition
DOI
10.1109/IJCNN55064.2022.9892252
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Superficial linguistic correlation is a critical issue for Visual Question Answering (VQA): models can achieve high performance by exploiting the connection between question and answer, yet fail to generalize to out-of-domain data. To mitigate this issue, VQA-CP v2.0 greedily re-partitions the training and test splits of VQA v2.0, suppressing the performance gains obtained through superficial linguistic correlations. However, some opportunistic methods (such as inverse supervision) can exploit the dataset's distribution characteristics to obtain high performance, which runs counter to academic efforts to improve models' visual reasoning and modal fusion abilities. To address this problem, we propose a more reasonable dataset in which the training split conforms to a long-tailed distribution and the test split is more balanced, so that inverse supervision yields no performance gains and superficial linguistic correlations still cannot help the model achieve high accuracy. In addition, we propose a decoupled training schema that learns better representation and visual reasoning modules, compensating for the shortcoming of ensemble-based methods that selectively learn only some samples. Without any further annotations, this schema achieves state-of-the-art performance: on VQA-CP v2.0 it outperforms the simple baseline model UpDn by 15.54%, and its accuracy on VQA v2.0 shows almost no drop compared to UpDn. Code is available at https://github.com/asklvd/new-benchmark-for-robust-VQA.
Pages: 7