Learning to Ask Informative Sub-Questions for Visual Question Answering

Cited by: 1
Authors
Uehara, Kohei [1 ]
Duan, Nan [2 ]
Harada, Tatsuya [3 ]
Affiliations
[1] Univ Tokyo, Tokyo, Japan
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] Univ Tokyo, RIKEN, Tokyo, Japan
Keywords
DOI
10.1109/CVPRW56347.2022.00514
Chinese Library Classification (CLC)
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
Visual Question Answering (VQA) models tend to make incorrect inferences on questions that require reasoning over world knowledge. A recent study has shown that training VQA models with questions that provide lower-level perceptual information alongside reasoning questions improves performance. Inspired by this, we propose a novel VQA model that generates questions to actively obtain auxiliary perceptual information useful for correct reasoning. Our model consists of a VQA model for answering questions, a Visual Question Generation (VQG) model for generating questions, and an Info-score model for estimating how much information a generated question contributes toward answering the original question. We train the VQG model to maximize the "informativeness" provided by the Info-score model, so that it generates questions containing as much information as possible about the answer to the original question. Our experiments show that by inputting the generated questions and their answers as additional information to the VQA model, it predicts the answer more accurately than the baseline model.
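The generate-score-augment loop described in the abstract can be sketched as below. This is a minimal toy illustration of the control flow only: the `toy_*` stand-in functions, the `[context: ...]` prompt format, and the function names are hypothetical placeholders, not the authors' learned VQA, VQG, or Info-score models.

```python
# Toy sketch of the pipeline from the abstract: a VQG model proposes candidate
# sub-questions, an Info-score model rates how informative each would be for
# the original question, and the best sub-question plus its answer is fed back
# to the VQA model as extra context. All components are hypothetical stand-ins.
from typing import Callable, List, Tuple

def answer_with_subquestion(
    image: str,
    question: str,
    vqa: Callable[[str, str], str],        # (image, question) -> answer
    vqg: Callable[[str, str], List[str]],  # (image, question) -> sub-questions
    info_score: Callable[[str, str], float],  # (question, sub-q) -> score
) -> Tuple[str, str]:
    """Pick the most informative sub-question, then answer with its help."""
    candidates = vqg(image, question)
    best = max(candidates, key=lambda q: info_score(question, q))
    sub_answer = vqa(image, best)
    # Augment the original question with the perceptual evidence and re-ask.
    augmented = f"{question} [context: {best} -> {sub_answer}]"
    return best, vqa(image, augmented)

# --- Tiny hand-crafted stand-ins, for demonstration only ---
def toy_vqa(image: str, question: str) -> str:
    if "context" in question and "red" in question:
        return "stop sign"  # extra perceptual info enables the reasoning step
    if "color" in question:
        return "red"
    return "unknown"

def toy_vqg(image: str, question: str) -> List[str]:
    return ["what color is the object?", "how many objects are there?"]

def toy_score(question: str, sub: str) -> float:
    return 1.0 if "color" in sub else 0.1

sub_q, final = answer_with_subquestion(
    "street.jpg", "what does this sign tell drivers to do?",
    toy_vqa, toy_vqg, toy_score,
)
```

Here the baseline call `toy_vqa("street.jpg", "what does this sign tell drivers to do?")` would return `"unknown"`, while routing through the scored sub-question supplies the perceptual fact ("red") that lets the augmented query succeed, mirroring the paper's motivation.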
Pages: 4680 - 4689
Page count: 10
Related Papers
50 results
  • [1] Ask Your Neurons: A Deep Learning Approach to Visual Question Answering
    Malinowski, Mateusz
    Rohrbach, Marcus
    Fritz, Mario
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2017, 125 (1-3) : 110 - 135
  • [2] Do Multi-Hop Question Answering Systems Know How to Answer the Single-Hop Sub-Questions?
    Tang, Yixuan
    Ng, Hwee Tou
    Tung, Anthony K. H.
    [J]. 16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 3244 - 3249
  • [3] Localized Questions in Medical Visual Question Answering
    Tascon-Morales, Sergio
    Marquez-Neila, Pablo
    Sznitman, Raphael
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT II, 2023, 14221 : 361 - 370
  • [4] The Influence of Sub-Questions on Interviewer Performance
    Sheatsley, Paul B.
    [J]. PUBLIC OPINION QUARTERLY, 1949, 13 (02) : 310 - 313
  • [5] Multitask Learning for Visual Question Answering
    Ma, Jie
    Liu, Jun
    Lin, Qika
    Wu, Bei
    Wang, Yaxian
    You, Yang
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (03) : 1380 - 1394
  • [6] Multi-Question Learning for Visual Question Answering
    Lei, Chenyi
    Wu, Lei
    Liu, Dong
    Li, Zhao
    Wang, Guoxin
    Tang, Haihong
    Li, Houqiang
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11328 - 11335
  • [7] CQ-VQA: Visual Question Answering on Categorized Questions
    Mishra, Aakansha
    Anand, Ashish
    Guha, Prithwijit
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [8] Learning Answer Embeddings for Visual Question Answering
    Hu, Hexiang
    Chao, Wei-Lun
    Sha, Fei
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 5428 - 5436
  • [9] A Survey on Representation Learning in Visual Question Answering
    Sahani, Manish
    Singh, Priyadarshan
    Jangpangi, Sachin
    Kumar, Shailender
    [J]. MACHINE LEARNING AND BIG DATA ANALYTICS (PROCEEDINGS OF INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND BIG DATA ANALYTICS (ICMLBDA) 2021), 2022, 256 : 326 - 336