DSGEM: Dual scene graph enhancement module-based visual question answering

Cited by: 1
Authors
Wang, Boyue [1 ]
Ma, Yujian [1 ]
Li, Xiaoyan [1 ]
Liu, Heng [1 ]
Hu, Yongli [1 ]
Yin, Baocai [1 ]
Affiliations
[1] Beijing Univ Technol, 100 Pingleyuan, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
image representation; question answering (information retrieval); language;
DOI
10.1049/cvi2.12186
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual Question Answering (VQA) aims to correctly answer a text question by understanding the image content. Attention-based VQA models mine implicit relationships between objects according to feature similarity, neglecting explicit relationships such as relative position. Most visual scene graph-based VQA models exploit the relative positions or visual relationships between objects to construct the visual scene graph, but they suffer from the semantic insufficiency of visual edge relations. Moreover, the scene graph of the text modality is often ignored in these works. In this article, a novel Dual Scene Graph Enhancement Module (DSGEM) is proposed that exploits relevant external knowledge to simultaneously construct two interpretable scene graph structures for the image and text modalities, which makes the reasoning process more logical and precise. Specifically, the authors build the visual and textual scene graphs with the help of commonsense knowledge and syntactic structure, respectively, which explicitly endows each edge relation with specific semantics. Then, two scene graph enhancement modules are proposed to propagate the involved external and structural knowledge and explicitly guide the feature interaction between objects (nodes). Finally, the authors embed these two scene graph enhancement modules into existing VQA models to introduce explicit relation reasoning ability. Experimental results on both the VQA v2 and OK-VQA datasets show that the proposed DSGEM is effective and compatible with various VQA architectures.
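To make the enhancement mechanism described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of the core idea: each explicit edge relation (a commonsense label in the visual graph, a syntactic label in the textual graph) is embedded and used to bias the attention between nodes, so that relation semantics guide the feature interaction. This is one plausible realisation under stated assumptions, not the authors' implementation; the class name SceneGraphEnhancement, the field rel_emb, and the exact scoring form are all invented for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SceneGraphEnhancement(nn.Module):
    """Edge-aware attention over a scene graph (illustrative sketch only)."""

    def __init__(self, dim: int, num_relations: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # One learnable embedding per explicit relation label, e.g. a
        # commonsense relation (visual graph) or a dependency tag (textual graph).
        self.rel_emb = nn.Embedding(num_relations, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, nodes, rel_ids, adj):
        # nodes:   (N, D) object or word features
        # rel_ids: (N, N) integer relation label for each directed edge
        # adj:     (N, N) 1.0 where an edge exists, 0.0 otherwise
        q, k, v = self.q_proj(nodes), self.k_proj(nodes), self.v_proj(nodes)
        r = self.rel_emb(rel_ids)                            # (N, N, D)
        # Relation semantics bias the attention logits, so the knowledge
        # attached to an edge steers how strongly node i attends to node j.
        logits = (q.unsqueeze(1) * (k.unsqueeze(0) + r)).sum(-1)
        logits = logits / nodes.size(-1) ** 0.5
        logits = logits.masked_fill(adj == 0, float("-inf"))
        attn = torch.nan_to_num(F.softmax(logits, dim=-1))   # isolated nodes -> 0
        return nodes + self.out_proj(attn @ v)               # residual update

# Toy usage: 4 nodes, 3 relation types; checks shapes only.
layer = SceneGraphEnhancement(dim=16, num_relations=3)
x = torch.randn(4, 16)
rels = torch.randint(0, 3, (4, 4))
adj = torch.ones(4, 4)
print(layer(x, rels, adj).shape)  # torch.Size([4, 16])

Under this reading, DSGEM would instantiate one such module per modality (visual and textual) and plug both into an existing VQA backbone, which is consistent with the compatibility claim in the abstract.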
Pages: 638 - 651
Number of pages: 14
Related papers
50 records in total
  • [21] Cascading Attention Visual Question Answering Model Based on Graph Structure
    Zhang, Haoyu
    Zhang, De
    Computer Engineering and Applications, 2023, 59 (06) : 155 - 161
  • [22] Feature Enhancement in Attention for Visual Question Answering
    Lin, Yuetan
    Pang, Zhangyang
    Wang, Donghui
    Zhuang, Yueting
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 4216 - 4222
  • [23] Graph-Structured Representations for Visual Question Answering
    Teney, Damien
    Liu, Lingqiao
    van den Hengel, Anton
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 3233 - 3241
  • [24] Scene text visual question answering by using YOLO and STN
    Nourali, K.
    Dolkhani, E.
    International Journal of Speech Technology, 2024, 27 (01) : 69 - 76
  • [25] Towards Reasoning Ability in Scene Text Visual Question Answering
    Wang, Qingqing
    Xiao, Liqiang
    Lu, Yue
    Jin, Yaohui
    He, Hao
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2281 - 2289
  • [26] Scene Understanding for Autonomous Driving Using Visual Question Answering
    Wantiez, Adrien
    Qiu, Tianming
    Matthes, Stefan
    Shen, Hao
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
  • [27] Dynamic dual graph networks for textbook question answering
    Wang, Yaxian
    Liu, Jun
    Ma, Jie
    Zeng, Hongwei
    Zhang, Lingling
    Li, Junjun
    PATTERN RECOGNITION, 2023, 139
  • [28] DynGraph: Visual Question Answering via Dynamic Scene Graphs
    Haurilet, Monica
    Al-Halah, Ziad
    Stiefelhagen, Rainer
    PATTERN RECOGNITION, DAGM GCPR 2019, 2019, 11824 : 428 - 441
  • [29] Visual question answering model based on graph neural network and contextual attention
    Sharma, Himanshu
    Jalal, Anand Singh
    IMAGE AND VISION COMPUTING, 2021, 110
  • [30] Visual question answering based on local-scene-aware referring expression generation
    Kim, Jung-Jun
    Lee, Dong-Gyu
    Wu, Jialin
    Jung, Hong-Gyu
    Lee, Seong-Whan
    NEURAL NETWORKS, 2021, 139 : 158 - 167