DSGEM: Dual scene graph enhancement module-based visual question answering

Cited by: 1
Authors
Wang, Boyue [1]
Ma, Yujian [1]
Li, Xiaoyan [1]
Liu, Heng [1]
Hu, Yongli [1]
Yin, Baocai [1]
Affiliations
[1] Beijing Univ Technol, 100 Pingleyuan, Beijing 100124, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
image representation; question answering (information retrieval); LANGUAGE;
DOI
10.1049/cvi2.12186
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual Question Answering (VQA) aims to answer a text question correctly by understanding the image content. Attention-based VQA models mine implicit relationships between objects from feature similarity, and thereby neglect explicit relationships between objects, for example, their relative positions. Most visual scene graph-based VQA models exploit the relative positions or visual relationships between objects to construct the visual scene graph, but they suffer from the semantic insufficiency of visual edge relations. Moreover, the scene graph of the text modality is often ignored in these works. In this article, a novel Dual Scene Graph Enhancement Module (DSGEM) is proposed that exploits relevant external knowledge to simultaneously construct two interpretable scene graph structures, one for the image modality and one for the text modality, which makes the reasoning process more logical and precise. Specifically, the authors build the visual and textual scene graphs with the help of commonsense knowledge and syntactic structure, respectively, which explicitly endows each edge relation with specific semantics. Then, two scene graph enhancement modules are proposed to propagate the involved external and structural knowledge and explicitly guide the feature interaction between objects (nodes). Finally, the authors embed these two scene graph enhancement modules into existing VQA models to introduce explicit relation reasoning ability. Experimental results on both the VQA V2 and OK-VQA datasets show that the proposed DSGEM is effective and compatible with various VQA architectures.
Pages: 638 - 651
Page count: 14
Related papers
50 in total
  • [1] Scene Graph Refinement Network for Visual Question Answering
    Qian, Tianwen
    Chen, Jingjing
    Chen, Shaoxiang
    Wu, Bo
    Jiang, Yu-Gang
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 3950 - 3961
  • [2] Knowledge enhancement and scene understanding for knowledge-based visual question answering
    Su, Zhenqiang
    Gou, Gang
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (03) : 2193 - 2208
  • [3] Knowledge enhancement and scene understanding for knowledge-based visual question answering
    Zhenqiang Su
    Gang Gou
    [J]. Knowledge and Information Systems, 2024, 66 : 2193 - 2208
  • [4] Scene Text Visual Question Answering
    Biten, Ali Furkan
    Tito, Ruben
    Mafla, Andres
    Gomez, Lluis
    Rusinol, Marcal
    Valveny, Ernest
    Jawahar, C. V.
    Karatzas, Dimosthenis
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4290 - 4300
  • [5] Graphhopper: Multi-hop Scene Graph Reasoning for Visual Question Answering
    Koner, Rajat
    Li, Hang
    Hildebrandt, Marcel
    Das, Deepan
    Tresp, Volker
    Guennemann, Stephan
    [J]. SEMANTIC WEB - ISWC 2021, 2021, 12922 : 111 - 127
  • [6] SceneGATE: Scene-Graph Based Co-Attention Networks for Text Visual Question Answering
    Cao, Feiqi
    Luo, Siwen
    Nunez, Felipe
    Wen, Zean
    Poon, Josiah
    Han, Soyeon Caren
    [J]. ROBOTICS, 2023, 12 (04)
  • [7] Module-based graph pooling for graph classification
    Deng, Sucheng
    Yang, Geping
    Yang, Yiyang
    Gong, Zhiguo
    Chen, Can
    Chen, Xiang
    Hao, Zhifeng
    [J]. PATTERN RECOGNITION, 2024, 154
  • [8] Question-aware dynamic scene graph of local semantic representation learning for visual question answering
    Wu, Jinmeng
    Ge, Fulin
    Hong, Hanyu
    Shi, Yu
    Hao, Yanbin
    Ma, Lei
    [J]. PATTERN RECOGNITION LETTERS, 2023, 170 : 93 - 99
  • [9] Dual Attention and Question Categorization-Based Visual Question Answering
    Mishra A.
    Anand A.
    Guha P.
    [J]. IEEE Transactions on Artificial Intelligence, 2023, 4 (01) : 81 - 91
  • [10] Visual Question Generation as Dual Task of Visual Question Answering
    Li, Yikang
    Duan, Nan
    Zhou, Bolei
    Chu, Xiao
    Ouyang, Wanli
    Wang, Xiaogang
    Zhou, Ming
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 6116 - 6124