Sequential Visual Reasoning for Visual Question Answering

Cited by: 0
Authors
Liu, Jinlai [1]
Wu, Chenfei [1]
Wang, Xiaojie [1]
Dong, Xuan [1]
Affiliation
[1] Beijing Univ Posts & Telecommun, Sch Comp Sci, Ctr Intelligence Sci & Technol, Beijing, Peoples R China
Funding
National Social Science Fund of China;
Keywords
Visual Question Answering; Sequential Reasoning; Bilinear Model;
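DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification code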
0808 ; 0809 ;
Abstract
Visual question answering (VQA) is a challenging task that addresses learning and reasoning at the intersection of vision and language. This reasoning requires understanding both the sequential and compositional linguistic structure of questions and the sets of visual objects and their spatial relations in images. Previous research mainly focuses on improving attention mechanisms and optimizing multi-modal bilinear fusion, which support only one-step or static reasoning about visual features. The lack of complex cross-modal reasoning methods limits the expressiveness of previously proposed VQA models. This paper introduces a novel Sequential Visual Reasoning (SVR) model that handles both sequential language understanding and spatial visual reasoning by constructing visual reasoning procedures sequentially. In the SVR module, the squeeze stage generates the most relevant visual object under the guidance of the question, and the expand stage updates the visual objects by having them interact with the most relevant object. Experimental results on four publicly available datasets demonstrate that the proposed model significantly outperforms previously proposed attention-based or bilinear fusion VQA models. Visualization of the sequential visual reasoning illustrates how the SVR model sequentially focuses on different visual objects according to the question and finally infers the answer.
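The squeeze and expand stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no equations, so the dot-product attention used for squeezing, the residual element-wise update used for expanding, and all function names (`squeeze`, `expand`, `sequential_reasoning`) are assumptions chosen only to make the two-stage loop concrete.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def squeeze(objects, question):
    # ASSUMPTION: question-guided dot-product attention pools the object
    # features into a single "most relevant object" vector.
    scores = [sum(o * q for o, q in zip(obj, question)) for obj in objects]
    weights = softmax(scores)
    dim = len(question)
    relevant = [sum(w * obj[k] for w, obj in zip(weights, objects))
                for k in range(dim)]
    return relevant, weights

def expand(objects, relevant):
    # ASSUMPTION: each object interacts with the squeezed vector through an
    # element-wise product plus a residual connection.
    return [[o + o * r for o, r in zip(obj, relevant)] for obj in objects]

def sequential_reasoning(objects, question, steps=2):
    # Repeat squeeze -> expand; keep the attention weights of every step so
    # the shifting focus can be inspected afterwards.
    trace = []
    for _ in range(steps):
        relevant, weights = squeeze(objects, question)
        objects = expand(objects, relevant)
        trace.append(weights)
    return objects, trace

# Hypothetical toy input: three 2-dimensional object features and a question vector.
objects = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
question = [1.0, 0.0]
updated, trace = sequential_reasoning(objects, question, steps=2)
```

Each entry of `trace` holds the attention weights over the objects at one reasoning step, which is the kind of quantity a step-by-step visualization would display.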
Pages: 410-415
Page count: 6