Learning to Ask Informative Sub-Questions for Visual Question Answering

Cited by: 1
Authors
Uehara, Kohei [1 ]
Duan, Nan [2 ]
Harada, Tatsuya [3 ]
Affiliations
[1] Univ Tokyo, Tokyo, Japan
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] Univ Tokyo, RIKEN, Tokyo, Japan
DOI
10.1109/CVPRW56347.2022.00514
Chinese Library Classification
TP301 [Theory and Methods];
Discipline Classification Code
081202;
Abstract
VQA (Visual Question Answering) models tend to make incorrect inferences on questions that require reasoning over world knowledge. A recent study has shown that training VQA models with questions that provide lower-level perceptual information alongside reasoning questions improves performance. Inspired by this, we propose a novel VQA model that generates questions to actively obtain auxiliary perceptual information useful for correct reasoning. Our model consists of a VQA model for answering questions, a Visual Question Generation (VQG) model for generating questions, and an Info-score model that estimates how much information a generated question contributes toward answering the original question. We train the VQG model to maximize the "informativeness" score provided by the Info-score model, so that it generates questions containing as much information as possible about the answer to the original question. Our experiments show that when the generated questions and their answers are fed to the VQA model as additional input, it predicts answers more accurately than the baseline model.
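To make the pipeline described above concrete, here is a minimal sketch of training a question generator to maximize an informativeness score. Everything in it (TinyVQG, info_score, the eight-question toy vocabulary, and the REINFORCE-style update, one common way to maximize a non-differentiable score) is a hypothetical illustration under assumed names, not the authors' implementation; the paper's actual models and optimization may differ.

    # Toy sketch of the abstract's loop: a VQG policy proposes a sub-question,
    # an Info-score model rates how much it helps answer the original question,
    # and that rating is used as a REINFORCE-style reward for the VQG model.
    # All names and the 8-question vocabulary are hypothetical.
    import torch
    import torch.nn as nn

    class TinyVQG(nn.Module):
        """Toy sub-question generator over a tiny fixed vocabulary."""
        def __init__(self, feat_dim=16, n_subq=8):
            super().__init__()
            self.head = nn.Linear(feat_dim, n_subq)

        def forward(self, feats):
            # Distribution over candidate sub-questions given fused features.
            return torch.distributions.Categorical(logits=self.head(feats))

    def info_score(sub_q_id):
        # Stand-in for the Info-score model: a fixed table that pretends
        # sub-question 7 is the most informative one.
        return torch.linspace(0.0, 1.0, 8)[sub_q_id]

    feats = torch.randn(4, 16)            # fake image+question features
    vqg = TinyVQG()
    opt = torch.optim.Adam(vqg.parameters(), lr=1e-2)

    for step in range(200):
        dist = vqg(feats)
        sub_q = dist.sample()             # one sampled sub-question per example
        reward = info_score(sub_q)        # "informativeness" of each choice
        # REINFORCE: raise the log-probability of high-reward sub-questions.
        loss = -(dist.log_prob(sub_q) * reward).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    print("preferred sub-question ids:", vqg(feats).probs.argmax(-1))

In the full system, the answers to the generated sub-questions would also be fed back into the VQA model as additional input, which is the step the paper's experiments evaluate.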
Pages: 4680 - 4689
Number of pages: 10
Related Papers
50 records in total
  • [31] Hybrid deep learning model for answering visual medical questions
    Gasmi, Karim
    JOURNAL OF SUPERCOMPUTING, 2022, 78 (13) : 15042 - 15059
  • [33] Learning Conditioned Graph Structures for Interpretable Visual Question Answering
    Norcliffe-Brown, Will
    Vafeias, Efstathios
    Parisot, Sarah
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [34] Learning a Mixture of Conditional Gating Blocks for Visual Question Answering
    Sun, Qiang
    Fu, Yan-Wei
    Xue, Xiang-Yang
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2024, 39 (04) : 912 - 928
  • [35] A survey of deep learning-based visual question answering
    Huang, Tong-yuan
    Yang, Yu-ling
    Yang, Xue-jiao
    JOURNAL OF CENTRAL SOUTH UNIVERSITY, 2021, 28 (03) : 728 - 746
  • [36] Erasing-based Attention Learning for Visual Question Answering
    Liu, Fei
    Liu, Jing
    Hong, Richang
    Lu, Hanqing
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1175 - 1183
  • [37] ALSA: Adversarial Learning of Supervised Attentions for Visual Question Answering
    Liu, Yun
    Zhang, Xiaoming
    Zhao, Zhiyun
    Zhang, Bo
    Cheng, Lei
    Li, Zhoujun
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (06) : 4520 - 4533
  • [38] Learning visual question answering on controlled semantic noisy labels
    Zhang, Haonan
    Zeng, Pengpeng
    Hu, Yuxuan
    Qian, Jin
    Song, Jingkuan
    Gao, Lianli
    PATTERN RECOGNITION, 2023, 138
  • [39] Dual-Branch Collaborative Learning for Visual Question Answering
    Tian, Weidong
    Zhao, Junxiang
    Xu, Wenzheng
    Zhao, Zhongqiu
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024, 2024, 14864 : 96 - 107
  • [40] VQACL: A Novel Visual Question Answering Continual Learning Setting
    Zhang, Xi
    Zhang, Feifei
    Xu, Changsheng
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 19102 - 19112