Regulating Balance Degree for More Reasonable Visual Question Answering Benchmark

Cited by: 1
Authors
Lin, Ken [1]
Mao, Aihua [1]
Liu, Jiangfeng [1]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
Keywords
VQA; VQA-CP; long-tailed recognition;
DOI
10.1109/IJCNN55064.2022.9892252
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Superficial linguistic correlations are a critical issue for Visual Question Answering (VQA): models can achieve high performance by exploiting the connection between question and answer, yet fail to generalize to out-of-domain data. To ease this issue, VQA-CP v2.0 greedily re-partitions the distribution of VQA v2.0's training and test splits, which suppresses the performance improvement acquired through superficial linguistic correlations. However, some opportunistic methods (such as inverse supervision) can take advantage of the dataset's distribution characteristics to obtain high performance, which is incompatible with academic efforts to increase the model's visual reasoning and modal fusion abilities. To address this problem, we propose a more reasonable dataset in which we attempt to make the training split conform to a long-tailed distribution and the test split more balanced, so that inverse supervision does not result in performance gains and superficial linguistic correlations still cannot help the model achieve high accuracy. In addition, we propose a decoupled training scheme that obtains better representation and visual reasoning modules, compensating for the shortcomings of ensemble-based methods that selectively learn some samples. Without any further annotations, this scheme achieves state-of-the-art performance. On VQA-CP v2.0, it outperforms the simple baseline model UpDn by 15.54%, and its accuracy on VQA v2.0 shows almost no drop compared to UpDn. Code is available at https://github.com/asklvd/new-benchmark-for-robust-VQA.
Pages: 7