Multi-Question Learning for Visual Question Answering

Cited by: 0
Authors
Lei, Chenyi [1 ,2 ]
Wu, Lei [3 ]
Liu, Dong [1 ]
Li, Zhao [2 ]
Wang, Guoxin [2 ]
Tang, Haihong [2 ]
Li, Houqiang [1 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Alibaba Grp, Hangzhou, Peoples R China
[3] Zhejiang Univ, Hangzhou, Peoples R China
DOI
N/A
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual Question Answering (VQA) poses a great challenge for the computer vision and natural language processing communities. Most existing approaches consider video-question pairs individually during training. However, we observe that a VQA task usually contains multiple questions (whether sequentially generated or not) for the target video, and the questions themselves have abundant semantic relations. To explore these relations, we propose a new paradigm for VQA termed Multi-Question Learning (MQL). Inspired by multi-task learning, MQL learns jointly from multiple questions and their corresponding answers for a target video sequence. The learned representations of video-question pairs are then more general and transfer better to new questions. We further propose an effective VQA framework and design a training procedure for MQL, in which a specifically designed attention network models the relation between the input video and its corresponding questions, enabling multiple video-question pairs to be co-trained. Experimental results on public datasets show the favorable performance of the proposed MQL-VQA framework compared to state-of-the-art methods.
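The core idea in the abstract (one shared video representation, attended separately per question, so that several video-question pairs share parameters during training) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function `attend` and the bilinear weight `W` are illustrative assumptions standing in for the paper's attention network.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(video_feats, question_vec, W):
    # Bilinear relevance score of each frame to this question.
    scores = video_feats @ W @ question_vec          # (T,)
    alpha = softmax(scores)                          # attention weights over frames
    return alpha @ video_feats                       # question-conditioned video summary (D,)

T, D = 8, 16                                         # frames, feature dimension
video = rng.normal(size=(T, D))                      # shared video features
questions = rng.normal(size=(3, D))                  # multiple questions for the SAME video
W = rng.normal(size=(D, D)) * 0.1                    # shared attention parameters

# Multi-question learning: one shared video encoding and shared attention
# parameters produce one attended summary per question, so gradients from
# all question losses would update the same W during co-training.
summaries = np.stack([attend(video, q, W) for q in questions])  # (3, D)
```

In a full model, each summary would be fed to an answer classifier and the per-question losses summed, so the shared video and attention parameters are updated by every question jointly rather than by one pair at a time.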
Pages: 11328-11335 (8 pages)
Related Papers (50 total)
  • [1] A Multi-Question Sociometric Procedure
    Barr, John A.
    [J]. PERSONNEL AND GUIDANCE JOURNAL, 1955, 33 (09): : 527 - 530
  • [2] Multitask Learning for Visual Question Answering
    Ma, Jie
    Liu, Jun
    Lin, Qika
    Wu, Bei
    Wang, Yaxian
    You, Yang
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (03) : 1380 - 1394
  • [3] Question Modifiers in Visual Question Answering
    Britton, William
    Sarkhel, Somdeb
    Venugopal, Deepak
    [J]. LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 1472 - 1479
  • [4] Adversarial Learning With Multi-Modal Attention for Visual Question Answering
    Liu, Yun
    Zhang, Xiaoming
    Huang, Feiran
    Cheng, Lei
    Li, Zhoujun
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (09) : 3894 - 3908
  • [5] Learning Answer Embeddings for Visual Question Answering
    Hu, Hexiang
    Chao, Wei-Lun
    Sha, Fei
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 5428 - 5436
  • [6] A Survey on Representation Learning in Visual Question Answering
    Sahani, Manish
    Singh, Priyadarshan
    Jangpangi, Sachin
    Kumar, Shailender
    [J]. MACHINE LEARNING AND BIG DATA ANALYTICS (PROCEEDINGS OF INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND BIG DATA ANALYTICS (ICMLBDA) 2021), 2022, 256 : 326 - 336
  • [7] Multimodal Learning and Reasoning for Visual Question Answering
    Ilievski, Ilija
    Feng, Jiashi
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [8] Visual Question Answering as a Meta Learning Task
    Teney, Damien
    van den Hengel, Anton
    [J]. COMPUTER VISION - ECCV 2018, PT 15, 2018, 11219 : 229 - 245
  • [9] Selective residual learning for Visual Question Answering
    Hong, Jongkwang
    Park, Sungho
    Byun, Hyeran
    [J]. NEUROCOMPUTING, 2020, 402 : 366 - 374
  • [10] Learning Visual Knowledge Memory Networks for Visual Question Answering
    Su, Zhou
    Zhu, Chen
    Dong, Yinpeng
    Cai, Dongqi
    Chen, Yurong
    Li, Jianguo
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7736 - 7745