Visual-Semantic Dual Channel Network for Visual Question Answering

Cited by: 0
Authors
Wang, Xin [1]
Chen, Qiaohong [1]
Hu, Ting [1]
Sun, Qi [1]
Jia, Yubo [1]
Affiliation
[1] Zhejiang Sci Tech Univ, Sch Informat & Technol, Hangzhou 310018, Peoples R China
DOI
10.1109/IJCNN52387.2021.9533855
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recently, existing visual question answering (VQA) models based on the attention mechanism have achieved state-of-the-art results. However, attention-based networks rely only on question guidance to capture relevant image features and ignore the high-level semantic information of the image. The key challenge of the VQA task therefore lies in obtaining effective semantic embeddings and fine-grained visual understanding during the reasoning process. In this research, we propose a novel visual-semantic dual channel network that answers questions from both visual and semantic perspectives. Specifically, the visual channel uses a relational reasoning method with an attention mechanism to capture visual objects and their relations. The semantic channel uses a semantic attention module to capture high-level semantic information from both the global image and local image regions. Extensive experiments on two versions of the VQA dataset confirm the effectiveness of the proposed model and of each module. Interpretability analysis shows that the visual-semantic dual channel network can perform dynamic modeling to infer the most relevant answer to the question.
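The following NumPy sketch illustrates the dual channel idea described in the abstract. It is a minimal sketch under stated assumptions, not the authors' implementation.

The abstract does not specify the exact design, so each component below is a generic stand-in that we chose:

- The relational reasoning step is a plain self-attention over object features.
- The semantic channel attends over precomputed "concept" embeddings for the global image and local regions.
- The two channels are fused by an element-wise product with the question vector, followed by concatenation.

All function names, parameter names and shapes are hypothetical. The weights are random and untrained.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def question_guided_attention(feats, q, W_f, W_q):
    # feats: (n, d) features, q: (d,) question vector (hypothetical shapes).
    # Score each feature by a dot product in a shared projected space.
    scores = (feats @ W_f) @ (W_q @ q)      # (n,)
    alpha = softmax(scores)                  # attention weights, sum to 1
    return alpha @ feats, alpha              # attended vector (d,), weights (n,)


def relational_reasoning(objs, W_r):
    # Assumed stand-in for relational reasoning: every object aggregates
    # the other objects weighted by pairwise affinity (self-attention).
    h = objs @ W_r                                        # (n, d)
    A = softmax(h @ h.T / np.sqrt(h.shape[1]), axis=1)    # (n, n) relations
    return objs + A @ objs                                # residual, (n, d)


def dual_channel_answer(regions, global_sem, local_sem, q, params):
    # Visual channel: relation-enhanced regions, then question-guided attention.
    rel = relational_reasoning(regions, params["W_r"])
    v, v_att = question_guided_attention(rel, q, params["W_v"], params["W_qv"])
    # Semantic channel: question-guided attention over global + local concepts.
    concepts = np.vstack([global_sem, local_sem])
    s, s_att = question_guided_attention(concepts, q, params["W_s"], params["W_qs"])
    # Assumed fusion: modulate each channel by the question, concatenate, classify.
    fused = np.concatenate([v * q, s * q])   # (2d,)
    logits = params["W_out"] @ fused         # (num_answers,)
    return softmax(logits), v_att, s_att


# Usage with random, untrained weights (illustration only).
rng = np.random.default_rng(0)
d, h, n, m_g, m_l, k = 8, 6, 5, 3, 4, 10
params = {
    "W_r": 0.1 * rng.normal(size=(d, d)),
    "W_v": 0.1 * rng.normal(size=(d, h)), "W_qv": 0.1 * rng.normal(size=(h, d)),
    "W_s": 0.1 * rng.normal(size=(d, h)), "W_qs": 0.1 * rng.normal(size=(h, d)),
    "W_out": 0.1 * rng.normal(size=(k, 2 * d)),
}
probs, v_att, s_att = dual_channel_answer(
    rng.normal(size=(n, d)), rng.normal(size=(m_g, d)),
    rng.normal(size=(m_l, d)), rng.normal(size=d), params)
print("predicted answer index:", int(probs.argmax()))
```

The returned attention weights (`v_att` over image regions, `s_att` over semantic concepts) are the kind of quantities one could inspect in an interpretability analysis. Which channel dominates for a given question would only become meaningful after training.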
Pages: 10