DSGEM: Dual scene graph enhancement module-based visual question answering

Cited by: 1
Authors
Wang, Boyue [1 ]
Ma, Yujian [1 ]
Li, Xiaoyan [1 ]
Liu, Heng [1 ]
Hu, Yongli [1 ]
Yin, Baocai [1 ]
Affiliations
[1] Beijing Univ Technol, 100 Pingleyuan, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
image representation; question answering (information retrieval); LANGUAGE;
DOI
10.1049/cvi2.12186
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual Question Answering (VQA) aims to correctly answer a text question by understanding the image content. Attention-based VQA models mine the implicit relationships between objects according to feature similarity, but neglect the explicit relationships between objects, for example, their relative positions. Most visual scene graph-based VQA models exploit the relative positions or visual relationships between objects to construct the visual scene graph, yet they suffer from the semantic insufficiency of visual edge relations. Moreover, the scene graph of the text modality is often ignored in these works. In this article, a novel Dual Scene Graph Enhancement Module (DSGEM) is proposed that exploits relevant external knowledge to simultaneously construct two interpretable scene graph structures over the image and text modalities, which makes the reasoning process more logical and precise. Specifically, the authors build the visual and textual scene graphs with the help of commonsense knowledge and syntactic structure, respectively, which explicitly endows each edge relation with specific semantics. Then, two scene graph enhancement modules are proposed to propagate the involved external and structural knowledge to explicitly guide the feature interaction between objects (nodes). Finally, the authors embed these two scene graph enhancement modules into existing VQA models to introduce explicit relation reasoning ability. Experimental results on both the VQA V2 and OK-VQA datasets show that the proposed DSGEM is effective and compatible with various VQA architectures.
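The core idea described in the abstract, propagating messages along scene-graph edges whose relations carry explicit semantics, can be sketched as relation-conditioned message passing. This is a minimal NumPy illustration, not the authors' implementation: node features, edge triples, and the relation-embedding table are all hypothetical stand-ins for DSGEM's learned components.

```python
import numpy as np

def scene_graph_enhance(nodes, edges, rel_emb):
    """One round of relation-aware message passing over a scene graph.

    nodes:   (N, d) array of node features (detected objects or question words)
    edges:   list of (src, dst, rel_id) triples; rel_id labels the explicit
             edge relation (e.g. a commonsense or syntactic relation)
    rel_emb: (R, d) array holding one embedding per relation type
    Returns an updated (N, d) array of node features.
    """
    N, _ = nodes.shape
    agg = np.zeros_like(nodes)     # summed incoming messages per node
    count = np.zeros(N)            # number of incoming edges per node
    for src, dst, rel in edges:
        # Condition each message on the edge's relation semantics,
        # so the same neighbor contributes differently per relation.
        agg[dst] += nodes[src] + rel_emb[rel]
        count[dst] += 1
    out = nodes.copy()
    mask = count > 0
    # Residual update; nodes with no incoming edges stay unchanged.
    out[mask] = nodes[mask] + agg[mask] / count[mask, None]
    return out
```

Applying one such module to the visual graph and one to the textual graph, then feeding the enhanced features into a base VQA model, mirrors the plug-in usage the abstract describes.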
Pages: 638-651
Page count: 14
Related Papers
50 records
  • [41] Graph-based Question Answering System
    Mital, Piyush
    Agrawal, Saurabh
    Neti, Bhargavi
    Haribhakta, Yashodhara
    Kamble, Vibhavari
    Bhattacharjee, Krishnanjan
    Das, Debashri
    Mehta, Swati
    Kumar, Ajai
    2018 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2018, : 1798 - 1802
  • [42] Fusing Multi-graph Structures for Visual Question Answering
    Hu, Yuncong
    Zhang, Ru
    Liu, Jianyi
    Yan, Dong
    ASIA-PACIFIC JOURNAL OF CLINICAL ONCOLOGY, 2023, 19 : 13 - 13
  • [43] Syntax Tree Constrained Graph Network for Visual Question Answering
    Su, Xiangrui
    Zhang, Qi
    Shi, Chongyang
    Liu, Jiachang
    Hu, Liang
    NEURAL INFORMATION PROCESSING, ICONIP 2023, PT V, 2024, 14451 : 122 - 136
  • [44] Co-attention Network for Visual Question Answering Based on Dual Attention
    Dong, Feng
    Wang, Xiaofeng
    Oad, Ammar
    Talpur, Mir Sajjad Hussain
JOURNAL OF ENGINEERING SCIENCE AND TECHNOLOGY REVIEW, 2021, 14 (06) : 116 - 123
  • [45] Improving visual question answering by combining scene-text information
    Sharma, Himanshu
    Jalal, Anand Singh
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (09) : 12177 - 12208
  • [46] Knowledge Graph Embedding Based Question Answering
    Huang, Xiao
    Zhang, Jingyuan
    Li, Dingcheng
    Li, Ping
    PROCEEDINGS OF THE TWELFTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM'19), 2019, : 105 - 113
  • [47] An Empirical Study of Multilingual Scene-Text Visual Question Answering
    Li, Lin
    Zhang, Haohan
    Fang, Zeqing
    PROCEEDINGS OF THE 2ND WORKSHOP ON USER-CENTRIC NARRATIVE SUMMARIZATION OF LONG VIDEOS, NARSUM 2023, 2023, : 3 - 8
  • [48] Visual-Semantic Dual Channel Network for Visual Question Answering
    Wang, Xin
    Chen, Qiaohong
    Hu, Ting
    Sun, Qi
    Jia, Yubo
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [49] Knowledge Graph Based Question Routing for Community Question Answering
    Liu, Zhu
    Li, Kan
    Qu, Dacheng
    NEURAL INFORMATION PROCESSING, ICONIP 2017, PT V, 2017, 10638 : 721 - 730
  • [50] Improving visual question answering by combining scene-text information
    Himanshu Sharma
    Anand Singh Jalal
MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 : 12177 - 12208