Improving visual question answering using dropout and enhanced question encoder

Cited by: 28
Authors
Fang, Zhiwei [1 ,2 ]
Liu, Jing [1 ]
Li, Yong [3 ]
Qiao, Yanyuan [2 ]
Lu, Hanqing [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] JD Com, Business Growth BU, Intelligent Advertising Lab, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visual question answering; Coherent dropout; Siamese dropout; Enhanced question encoder; NETWORKS;
DOI
10.1016/j.patcog.2019.01.038
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Using dropout in Visual Question Answering (VQA) is a common practice to prevent overfitting. However, the current way of using dropout in multi-path networks may cause two problems: co-adaptation of neurons and explosion of output variance. In this paper, we propose coherent dropout and siamese dropout mechanisms to solve these two problems, respectively. Specifically, in coherent dropout, the relevant dropout layers in multiple paths are forced to work coherently to maximize their ability to prevent neuron co-adaptation. We show that coherent dropout is simple to implement but very effective at overcoming overfitting. To address the explosion of output variance, we develop a siamese dropout mechanism that explicitly minimizes the difference between the two output vectors produced from the same input data during the training phase. This mechanism can reduce the gap between the training and inference phases and make the VQA model more robust. With the help of these two techniques, we further design an enhanced question encoder called Multi-path Stacked Residual RNNs, which is deeper, wider, and more powerful than current shallow question encoders. Extensive experiments are conducted to verify the effectiveness of coherent dropout, siamese dropout, and the enhanced question encoder. The results show that our methods bring clear improvements to state-of-the-art VQA models on the VQA-v1 and VQA-v2 datasets. (C) 2019 Elsevier Ltd. All rights reserved.
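The siamese dropout idea in the abstract (two outputs from the same input, with their difference minimized during training) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy linear layer, the squared L2 distance as the difference measure, and the names dropout, linear, and siamese_penalty are all assumptions made for illustration. Coherent dropout is not sketched, since the abstract does not specify its mechanism.

```python
import random


def dropout(vec, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


def linear(vec, weights):
    """Toy output layer (assumption): one dot product per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def siamese_penalty(features, weights, rate, rng):
    """Run two stochastic passes over the same input.

    Each pass draws its own dropout mask. The returned gap is the squared
    L2 distance between the two outputs (an assumed difference measure),
    which would be added to the task loss as a regularizer during training.
    """
    out_a = linear(dropout(features, rate, rng), weights)
    out_b = linear(dropout(features, rate, rng), weights)
    gap = sum((a - b) ** 2 for a, b in zip(out_a, out_b))
    return out_a, out_b, gap


rng = random.Random(0)
features = [0.5, -1.0, 2.0, 0.25]            # hypothetical fused feature vector
weights = [[0.1, 0.2, -0.3, 0.4],
           [-0.5, 0.6, 0.7, -0.8]]           # hypothetical 2-unit output layer

# Training-time passes: different masks generally give different outputs.
out_a, out_b, gap_train = siamese_penalty(features, weights, rate=0.5, rng=rng)

# With dropout disabled (as at inference), both passes coincide and the gap is zero.
_, _, gap_eval = siamese_penalty(features, weights, rate=0.0, rng=rng)
```

Driving the training-time gap toward zero pushes the stochastic outputs toward the deterministic inference output, which is one way to read the abstract's claim about reducing the gap between the training and inference phases.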
Pages: 404-414
Number of pages: 11
Related papers
50 in total
  • [31] Visual Question Answering with Question Representation Update (QRU)
    Li, Ruiyu
    Jia, Jiaya
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [32] Knowledge-Enhanced Medical Visual Question Answering: A Survey
    Wang, Haofen
    Du, Huifang
    [J]. WEB AND BIG DATA. APWEB-WAIM 2022 INTERNATIONAL WORKSHOPS, KGMA 2022, SEMIBDMA 2022, DEEPLUDA 2022, 2023, 1784 : 3 - 9
  • [33] Learning neighbor-enhanced region representations and question-guided visual representations for visual question answering
    Gao, Ling
    Zhang, Hongda
    Sheng, Nan
    Shi, Lida
    Xu, Hao
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
  • [34] Improving passage retrieval in question answering using NLP
    Tiedemann, J
    [J]. PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2005, 3808 : 634 - 646
  • [35] Improving question answering using named entity recognition
    Toral, A
    Noguera, E
    Llopis, F
    Muñoz, R
    [J]. NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS, 2005, 3513 : 181 - 191
  • [36] Lightweight Visual Question Answering using Scene Graphs
    Nuthalapati, Sai Vidyaranya
    Chandradevan, Ramraj
    Giunchiglia, Eleonora
    Li, Bowen
    Kayser, Maxime
    Lukasiewicz, Thomas
    Yang, Carl
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 3353 - 3357
  • [37] Improving Question Retrieval in Community Question Answering with Label Ranking
    Wang, Wei
    Li, Baichuan
    King, Irwin
    [J]. 2011 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2011, : 349 - 356
  • [38] Improving Question Analysis for Arabic Question Answering in the Medical Domain
    Dardour, Sondes
    Fehri, Hela
    Haddar, Kais
    [J]. COMPUTACION Y SISTEMAS, 2022, 26 (03): : 1233 - 1241
  • [39] Multiple answers to a question: a new approach for visual question answering
    Hosseinabad, Sayedshayan Hashemi
    Safayani, Mehran
    Mirzaei, Abdolreza
    [J]. VISUAL COMPUTER, 2021, 37 (01): : 119 - 131
  • [40] Question-Guided Hybrid Convolution for Visual Question Answering
    Gao, Peng
    Li, Hongsheng
    Li, Shuang
    Lu, Pan
    Li, Yikang
    Hoi, Steven C. H.
    Wang, Xiaogang
    [J]. COMPUTER VISION - ECCV 2018, PT I, 2018, 11205 : 485 - 501