Answer-Type Prediction for Visual Question Answering

Cited by: 65
Authors
Kafle, Kushal [1]
Kanan, Christopher [1]
Affiliations
[1] Rochester Inst Technol, Chester F Carlson Ctr Imaging Sci, Rochester, NY 14623 USA
DOI
10.1109/CVPR.2016.538
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, algorithms for object recognition and related tasks have become sufficiently proficient that new vision tasks can now be pursued. In this paper, we build a system capable of answering open-ended text-based questions about images, which is known as Visual Question Answering (VQA). Our approach's key insight is that we can predict the form of the answer from the question. We formulate our solution in a Bayesian framework. When our approach is combined with a discriminative model, the combined model achieves state-of-the-art results on four benchmark datasets for open-ended VQA: DAQUAR, COCO-QA, The VQA Dataset, and Visual7W.
Pages: 4976-4984
Number of pages: 9
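
The abstract's key idea, predicting the form of the answer from the question and combining it with an answer model in a Bayesian framework, can be illustrated with a toy marginalization over answer types. This is only a sketch under stated assumptions, not the authors' actual model: it assumes the answer type depends on the question alone, and the function name, the type labels, and all probabilities are hypothetical values chosen for illustration.

```python
def marginal_answer_posterior(p_type_given_q, p_answer_given_type):
    """Sum P(T=c | q) * P(A=k | T=c, x, q) over answer types c.

    Both arguments are hypothetical lookup tables, not outputs of a real model.
    """
    posterior = {}
    for answer_type, p_type in p_type_given_q.items():
        for answer, p_answer in p_answer_given_type[answer_type].items():
            posterior[answer] = posterior.get(answer, 0.0) + p_type * p_answer
    return posterior


# Made-up numbers for a question such as "How many dogs are there?"
# The question alone strongly suggests a numeric answer.
p_type_given_q = {"number": 0.9, "object": 0.1}

# Made-up per-type answer distributions (these would depend on the image).
p_answer_given_type = {
    "number": {"2": 0.75, "3": 0.25},
    "object": {"dog": 0.75, "cat": 0.25},
}

posterior = marginal_answer_posterior(p_type_given_q, p_answer_given_type)
best_answer = max(posterior, key=posterior.get)
print(best_answer, posterior)
```

In this toy setting the type prediction down-weights answers of an implausible form (here, object names) before the highest-scoring answer is chosen.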