Visual Question Generation as Dual Task of Visual Question Answering

被引:125
|
作者
Li, Yikang [1 ]
Duan, Nan [2 ]
Zhou, Bolei [3 ]
Chu, Xiao [1 ]
Ouyang, Wanli [4 ]
Wang, Xiaogang [1 ]
Zhou, Ming [2 ]
机构
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] MIT, Cambridge, MA 02139 USA
[4] Univ Sydney, Sydney, NSW, Australia
基金
中国博士后科学基金;
关键词
D O I
10.1109/CVPR.2018.00640
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Visual question answering (VQA) and visual question generation (VQG) are two trending topics in the computer vision, but they are usually explored separately despite their intrinsic complementary relationship. In this paper, we propose an end-to-end unified model, the Invertible Question Answering Network (iQAN), to introduce question generation as a dual task of question answering to improve the VQA performance. With our proposed invertible bilinear fusion module and parameter sharing scheme, our iQAN can accomplish VQA and its dual task VQG simultaneously. By jointly trained on two tasks with our proposed dual regularizers (termed as Dual Training), our model has a better understanding of the interactions among images, questions and answers. After training, iQAN can take either question or answer as input, and output the counterpart. Evaluated on the CLEVR and VQA2 datasets, our iQAN improves the top-1 accuracy of the prior art MUTAN VQA method by 1.33% and 0.88% (absolute increase) respectiely. We also show that our proposed dual training framework can consistently improve model performances of many popular VQA architectures.
引用
收藏
页码:6116 / 6124
页数:9
相关论文
共 50 条
  • [41] Answer Distillation for Visual Question Answering
    Fang, Zhiwei
    Liu, Jing
    Tang, Qu
    Li, Yong
    Lu, Hanqing
    [J]. COMPUTER VISION - ACCV 2018, PT I, 2019, 11361 : 72 - 87
  • [42] Visual Question Answering on 360° Images
    Chou, Shih-Han
    Chao, Wei-Lun
    Lai, Wei-Sheng
    Sun, Min
    Yang, Ming-Hsuan
    [J]. 2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1596 - 1605
  • [43] Medical visual question answering: A survey
    Lin, Zhihong
    Zhang, Donghao
    Tao, Qingyi
    Shi, Danli
    Haffari, Gholamreza
    Wu, Qi
    He, Mingguang
    Ge, Zongyuan
    [J]. ARTIFICIAL INTELLIGENCE IN MEDICINE, 2023, 143
  • [44] Chain of Reasoning for Visual Question Answering
    Wu, Chenfei
    Liu, Jinlai
    Wang, Xiaojie
    Dong, Xuan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [45] iVQA: Inverse Visual Question Answering
    Liu, Feng
    Xiang, Tao
    Hospedales, Timothy M.
    Yang, Wankou
    Sun, Changyin
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 8611 - 8619
  • [46] Multitask Learning for Visual Question Answering
    Ma, Jie
    Liu, Jun
    Lin, Qika
    Wu, Bei
    Wang, Yaxian
    You, Yang
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (03) : 1380 - 1394
  • [47] Visual Question Answering for Intelligent Interaction
    Gao, Panpan
    Sun, Hanxu
    Chen, Gang
    Wang, Ruiquan
    Li, Minggang
    [J]. MOBILE INFORMATION SYSTEMS, 2022, 2022
  • [48] Differential Networks for Visual Question Answering
    Wu, Chenfei
    Liu, Jinlai
    Wang, Xiaojie
    Li, Ruifan
    [J]. THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8997 - 9004
  • [49] VAQA: Visual Arabic Question Answering
    Kamel, Sarah M. M.
    Hassan, Shimaa I. I.
    Elrefaei, Lamiaa
    [J]. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2023, 48 (08) : 10803 - 10823
  • [50] Adapted GooLeNet for Visual Question Answering
    Huang, Jie
    Hu, Yue
    Yang, Weilong
    [J]. 2018 3RD INTERNATIONAL CONFERENCE ON MECHANICAL, CONTROL AND COMPUTER ENGINEERING (ICMCCE), 2018, : 603 - 606