Visual-Semantic Dual Channel Network for Visual Question Answering

Times Cited: 0
Authors
Wang, Xin [1 ]
Chen, Qiaohong [1 ]
Hu, Ting [1 ]
Sun, Qi [1 ]
Jia, Yubo [1 ]
Affiliations
[1] Zhejiang Sci Tech Univ, Sch Informat & Technol, Hangzhou 310018, Peoples R China
DOI: 10.1109/IJCNN52387.2021.9533855
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Recently, visual question answering (VQA) models based on the attention mechanism have achieved state-of-the-art results. However, attention-based networks rely solely on question guidance to capture relevant image features, ignoring the high-level semantic information of the image. The key challenge of the VQA task therefore lies in obtaining effective semantic embeddings and fine-grained visual understanding during the reasoning process. In this research, we propose a novel visual-semantic dual channel network that answers questions from both visual and semantic perspectives. Specifically, the visual channel applies a relational reasoning method with an attention mechanism to capture visual objects and their relations, while the semantic channel captures high-level semantic information from the global and local image through a semantic attention module. We confirmed the effectiveness of the proposed model and of each module through extensive experiments on two versions of the VQA dataset. Interpretability analysis shows that the visual-semantic dual channel network dynamically infers the answer most relevant to the question.
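The abstract's core idea — two question-guided attention channels, one over object-level visual features and one over semantic-concept embeddings, fused into a joint representation — can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dot-product attention form, and the element-wise product fusion (a common choice in VQA systems) are all assumptions for illustration only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(question, feats):
    """Question-guided attention: weight each feature vector by its
    scaled dot-product similarity to the question embedding and
    return the weighted sum."""
    weights = softmax([dot(f, question) / math.sqrt(len(question))
                       for f in feats])
    dim = len(feats[0])
    return [sum(w * f[i] for w, f in zip(weights, feats))
            for i in range(dim)]

def dual_channel_fuse(question, visual_feats, semantic_feats):
    """Attend separately over visual object features and semantic-concept
    embeddings, then fuse the two summaries by element-wise product
    (an assumed fusion operator, not necessarily the paper's)."""
    v = attend(question, visual_feats)    # visual channel summary
    s = attend(question, semantic_feats)  # semantic channel summary
    return [a * b for a, b in zip(v, s)]  # joint embedding for a classifier
```

In a full model, the fused vector would feed an answer classifier; here the sketch only shows how the two channels produce complementary, question-conditioned summaries before fusion.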
Pages: 10