Visual Question Generation as Dual Task of Visual Question Answering

Cited by: 125
Authors
Li, Yikang [1 ]
Duan, Nan [2 ]
Zhou, Bolei [3 ]
Chu, Xiao [1 ]
Ouyang, Wanli [4 ]
Wang, Xiaogang [1 ]
Zhou, Ming [2 ]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] MIT, Cambridge, MA 02139 USA
[4] Univ Sydney, Sydney, NSW, Australia
Funding
China Postdoctoral Science Foundation;
Keywords
DOI
10.1109/CVPR.2018.00640
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Visual question answering (VQA) and visual question generation (VQG) are two trending topics in computer vision, but they are usually explored separately despite their intrinsically complementary relationship. In this paper, we propose an end-to-end unified model, the Invertible Question Answering Network (iQAN), which introduces question generation as a dual task of question answering to improve VQA performance. With our proposed invertible bilinear fusion module and parameter-sharing scheme, iQAN can perform VQA and its dual task VQG simultaneously. By jointly training on the two tasks with our proposed dual regularizers (termed Dual Training), our model gains a better understanding of the interactions among images, questions, and answers. After training, iQAN can take either a question or an answer as input and output the counterpart. Evaluated on the CLEVR and VQA2 datasets, iQAN improves the top-1 accuracy of the prior-art MUTAN VQA method by 1.33% and 0.88% (absolute), respectively. We also show that our proposed Dual Training framework consistently improves the performance of many popular VQA architectures.
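To make the dual-task setup the abstract describes concrete, the following is a minimal PyTorch sketch. It is an illustration under simplifying assumptions, not the paper's implementation: the full MUTAN-style bilinear fusion is replaced by a low-rank projected elementwise product, the question is represented by a single feature vector rather than a decoded word sequence, and all names (DualFusion, IQANSketch, and so on) are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DualFusion(nn.Module):
    # Shared fusion used in both directions. Sharing these projection
    # weights between the VQA path (image + question -> answer) and the
    # VQG path (image + answer -> question) is a simplified stand-in for
    # the parameter-sharing scheme mentioned in the abstract.
    def __init__(self, v_dim, x_dim, h_dim):
        super().__init__()
        self.proj_v = nn.Linear(v_dim, h_dim)
        self.proj_x = nn.Linear(x_dim, h_dim)

    def forward(self, v, x):
        # Low-rank approximation of a bilinear interaction:
        # project both inputs, then take their elementwise product.
        return torch.tanh(self.proj_v(v)) * torch.tanh(self.proj_x(x))

class IQANSketch(nn.Module):
    def __init__(self, v_dim=2048, q_dim=512, h_dim=1024, num_answers=3000):
        super().__init__()
        self.fuse = DualFusion(v_dim, q_dim, h_dim)       # shared by both tasks
        self.answer_head = nn.Linear(h_dim, num_answers)  # VQA: classify the answer
        self.question_head = nn.Linear(h_dim, q_dim)      # VQG: regress a question feature
        self.answer_embed = nn.Embedding(num_answers, q_dim)

    def forward(self, v, q_feat, a_idx):
        # VQA direction: image + question -> answer logits.
        answer_logits = self.answer_head(self.fuse(v, q_feat))
        # VQG direction: image + answer embedding -> question representation
        # (in the paper this feeds a question decoder; here it stays a vector).
        q_pred = self.question_head(self.fuse(v, self.answer_embed(a_idx)))
        # Dual regularizer: the question representation reconstructed from
        # the answer should stay close to the original question feature.
        dual_loss = F.mse_loss(q_pred, q_feat)
        return answer_logits, q_pred, dual_loss

# Toy usage: random tensors stand in for CNN image features and an RNN
# question encoding; the joint loss couples the two tasks.
model = IQANSketch()
v = torch.randn(8, 2048)              # image features
q = torch.randn(8, 512)               # question features
a = torch.randint(0, 3000, (8,))      # ground-truth answer indices
logits, q_pred, dual = model(v, q, a)
loss = F.cross_entropy(logits, a) + 0.1 * dual  # 0.1 is an arbitrary weight

The point the sketch tries to convey is that VQA and VQG exercise the same fusion weights from opposite directions, so the dual regularizer propagates gradients from each task into the other.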
Pages: 6116 - 6124
Page count: 9
Related Papers
50 records in total
  • [1] Visual Question Answering as a Meta Learning Task
    Teney, Damien
    van den Hengel, Anton
    [J]. COMPUTER VISION - ECCV 2018, PT 15, 2018, 11219 : 229 - 245
  • [2] Question Modifiers in Visual Question Answering
    Britton, William
    Sarkhel, Somdeb
    Venugopal, Deepak
    [J]. LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 1472 - 1479
  • [3] Dual Attention and Question Categorization-Based Visual Question Answering
    Mishra, Aakansha
    Anand, Ashish
    Guha, Prithwijit
    [J]. IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, 2023, 4 (01) : 81 - 91
  • [4] Salient region detection in the task of visual question answering
    Favorskaya, Margarita
    Andreev, Vladimir
    Popov, Aleksei
    [J]. IX INTERNATIONAL MULTIDISCIPLINARY SCIENTIFIC AND RESEARCH CONFERENCE MODERN ISSUES IN SCIENCE AND TECHNOLOGY / WORKSHOP ADVANCED TECHNOLOGIES IN AEROSPACE, MECHANICAL AND AUTOMATION ENGINEERING, 2018, 450
  • [5] Visual-Semantic Dual Channel Network for Visual Question Answering
    Wang, Xin
    Chen, Qiaohong
    Hu, Ting
    Sun, Qi
    Jia, Yubo
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [6] DUAL LEARNING FOR VISUAL QUESTION GENERATION
    Xu, Xing
    Song, Jingkuan
    Lu, Huimin
    He, Li
    Yang, Yang
    Shen, Fumin
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2018,
  • [7] Modular Visual Question Answering via Code Generation
    Subramanian, Sanjay
    Narasimhan, Medhini
    Khangaonkar, Kushal
    Yang, Kevin
    Nagrani, Arsha
    Schmid, Cordelia
    Zeng, Andy
    Darrell, Trevor
    Klein, Dan
    [J]. 61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 747 - 761
  • [8] VQA: Visual Question Answering
    Antol, Stanislaw
    Agrawal, Aishwarya
    Lu, Jiasen
    Mitchell, Margaret
    Batra, Dhruv
    Zitnick, C. Lawrence
    Parikh, Devi
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 2425 - 2433
  • [9] VQA: Visual Question Answering
    Agrawal, Aishwarya
    Lu, Jiasen
    Antol, Stanislaw
    Mitchell, Margaret
    Zitnick, C. Lawrence
    Parikh, Devi
    Batra, Dhruv
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2017, 123 (01) : 4 - 31
  • [10] Visual Question Answering: A Tutorial
    Teney, Damien
    Wu, Qi
    van den Hengel, Anton
    [J]. IEEE SIGNAL PROCESSING MAGAZINE, 2017, 34 (06) : 63 - 75