Improving visual question answering using dropout and enhanced question encoder

Cited by: 28

Authors:

Fang, Zhiwei [1,2]

Liu, Jing [1]

Li, Yong [3]

Qiao, Yanyuan [2]

Lu, Hanqing [1]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci, Beijing, Peoples R China

[3] JD Com, Business Growth BU, Intelligent Advertising Lab, Beijing, Peoples R China

Source:

PATTERN RECOGNITION | 2019 / Vol. 90

Funding:

National Natural Science Foundation of China;

Keywords:

Visual question answering; Coherent dropout; Siamese dropout; Enhanced question encoder; NETWORKS;

DOI:

10.1016/j.patcog.2019.01.038

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Using dropout in Visual Question Answering (VQA) is a common practice to prevent overfitting. However, the current way of applying dropout in multi-path networks may cause two problems: co-adaptation of neurons and explosion of output variance. In this paper, we propose a coherent dropout mechanism and a siamese dropout mechanism to solve these two problems, respectively. Specifically, in coherent dropout, the relevant dropout layers in multiple paths are forced to work coherently to maximize their ability to prevent neuron co-adaptation. We show that coherent dropout is simple to implement yet very effective at overcoming overfitting. To address the explosion of output variance, we develop a siamese dropout mechanism that explicitly minimizes the difference between the two output vectors produced from the same input data during the training phase. This mechanism reduces the gap between the training and inference phases and makes the VQA model more robust. With the help of these two techniques, we further design an enhanced question encoder, called Multi-path Stacked Residual RNNs, which is deeper, wider, and more powerful than current shallow question encoders. Extensive experiments verify the effectiveness of coherent dropout, siamese dropout, and the enhanced question encoder. The results show that our methods bring clear improvements to state-of-the-art VQA models on the VQA-v1 and VQA-v2 datasets. (C) 2019 Elsevier Ltd. All rights reserved.
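
The two mechanisms named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: "coherent" is read here as sampling one dropout mask and sharing it across all paths, and the siamese term is taken to be a squared L2 distance between two stochastic forward passes. The function names (`dropout_mask`, `coherent_dropout`, `siamese_dropout_penalty`) are hypothetical and do not come from the paper.

```python
import random


def dropout_mask(size, p, rng):
    """Inverted-dropout mask: 0 with probability p, otherwise 1 / (1 - p)."""
    keep = 1.0 - p
    return [(1.0 / keep) if rng.random() < keep else 0.0 for _ in range(size)]


def apply_mask(x, mask):
    """Element-wise product of a feature vector and a mask."""
    return [xi * mi for xi, mi in zip(x, mask)]


def coherent_dropout(paths, p, rng):
    """Assumed reading of coherent dropout: sample ONE mask and reuse it
    for every path, so all paths drop the same units.
    All paths must have the same length."""
    mask = dropout_mask(len(paths[0]), p, rng)
    return [apply_mask(x, mask) for x in paths], mask


def siamese_dropout_penalty(forward, x, p, rng):
    """Assumed reading of siamese dropout: pass the same input through the
    model twice with independent masks and return the squared L2 distance
    between the two outputs, to be added to the training loss."""
    out_a = forward(apply_mask(x, dropout_mask(len(x), p, rng)))
    out_b = forward(apply_mask(x, dropout_mask(len(x), p, rng)))
    return sum((a - b) ** 2 for a, b in zip(out_a, out_b))
```

In a training loop under these assumptions, the penalty would be scaled by a weighting coefficient and added to the task loss; at inference time dropout is disabled, so the term is not used.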


Pages: 404-414

Page count: 11

50 items in total

[31] Visual Question Answering with Question Representation Update (QRU)
Li, Ruiyu
Jia, Jiaya
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
[32] Knowledge-Enhanced Medical Visual Question Answering: A Survey
Wang, Haofen
Du, Huifang
[J]. WEB AND BIG DATA. APWEB-WAIM 2022 INTERNATIONAL WORKSHOPS, KGMA 2022, SEMIBDMA 2022, DEEPLUDA 2022, 2023, 1784 : 3 - 9
[33] Learning neighbor-enhanced region representations and question-guided visual representations for visual question answering
Gao, Ling
Zhang, Hongda
Sheng, Nan
Shi, Lida
Xu, Hao
[J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
[34] Improving passage retrieval in question answering using NLP
Tiedemann, J
[J]. PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2005, 3808 : 634 - 646
[35] Improving question answering using named entity recognition
Toral, A
Noguera, E
Llopis, F
Muñoz, R
[J]. NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS, 2005, 3513 : 181 - 191
[36] Lightweight Visual Question Answering using Scene Graphs
Nuthalapati, Sai Vidyaranya
Chandradevan, Ramraj
Giunchiglia, Eleonora
Li, Bowen
Kayser, Maxime
Lukasiewicz, Thomas
Yang, Carl
[J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 3353 - 3357
[37] Improving Question Retrieval in Community Question Answering with Label Ranking
Wang, Wei
Li, Baichuan
King, Irwin
[J]. 2011 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2011, : 349 - 356
[38] Improving Question Analysis for Arabic Question Answering in the Medical Domain
Dardour, Sondes
Fehri, Hela
Haddar, Kais
[J]. COMPUTACION Y SISTEMAS, 2022, 26 (03): : 1233 - 1241
[39] Multiple answers to a question: a new approach for visual question answering
Hosseinabad, Sayedshayan Hashemi
Safayani, Mehran
Mirzaei, Abdolreza
[J]. VISUAL COMPUTER, 2021, 37 (01): : 119 - 131
[40] Question-Guided Hybrid Convolution for Visual Question Answering
Gao, Peng
Li, Hongsheng
Li, Shuang
Lu, Pan
Li, Yikang
Hoi, Steven C. H.
Wang, Xiaogang
[J]. COMPUTER VISION - ECCV 2018, PT I, 2018, 11205 : 485 - 501
