Let Me Show You Step by Step: An Interpretable Graph Routing Network for Knowledge-based Visual Question Answering

Authors
Wang, Duokang [1]
Hu, Linmei [2]
Hao, Rui [1]
Shao, Yingxia [1]
Lv, Xin [3]
Nie, Liqiang [4]
Li, Juanzi [3]
Affiliations
[1] Beijing Univ Posts & Telecommun, SCS, Beijing 100876, Peoples R China
[2] Beijing Inst Technol, SCST, Beijing 100081, Peoples R China
[3] Tsinghua Univ, DCST, Beijing 100084, Peoples R China
[4] Harbin Inst Technol Shenzhen, SCST, Shenzhen 518055, Peoples R China
Funding
National Key Research and Development Program of China; National Science Foundation (USA)
Keywords
Knowledge-based Visual Question Answering; Scene Knowledge Graph; Graph Routing Network; LANGUAGE;
DOI
10.1145/3626772.3657790
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual Question Answering based on external Knowledge Bases (KB-VQA) requires a model to incorporate knowledge beyond the content of the given image and question for answer prediction. Most existing works use graph neural networks or Multi-modal Large Language Models to incorporate external knowledge for answer generation. Despite promising results, they have limited interpretability and handle questions with unseen answers poorly. In this paper, we propose a novel interpretable graph routing network (GRN) that explicitly conducts entity routing over a constructed scene knowledge graph step by step for KB-VQA. At each step, GRN keeps an entity score vector representing how likely each entity is to be activated as the answer, and a transition matrix representing the transition probability from one entity to another. To answer the given question, GRN focuses on certain keywords of the question at each step and correspondingly conducts entity routing by propagating the entity scores according to the transition matrix, which is computed with reference to the focused question keywords. In this way, it clearly exposes the reasoning process of KB-VQA and handles questions with unseen answers in the same way as questions with seen answers. Experiments on the benchmark dataset KRVQA demonstrate that GRN improves the performance of KB-VQA by a large margin, surpassing existing state-of-the-art KB-VQA methods and Multi-modal Large Language Models, and that it shows competent capability in handling unseen answers and good interpretability in KB-VQA.
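The routing idea in the abstract (an entity score vector moved step by step through a keyword-dependent transition matrix) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the entities, the relations, and the use of hand-set relation weights in place of learned attention over question keywords are hypothetical, and the abstract does not specify how GRN actually computes its transition matrix.

```python
# Illustrative sketch only. Entity names, relations, and the way the
# transition matrix is built are assumptions, not the paper's method.

def transition_matrix(num_entities, edges, relation_weights):
    """Build a row-normalized transition matrix for one routing step.

    edges: list of (head_index, relation, tail_index) triples.
    relation_weights: dict mapping relation -> weight for this step,
    standing in for attention on the focused question keywords.
    """
    matrix = [[0.0] * num_entities for _ in range(num_entities)]
    for head, relation, tail in edges:
        matrix[head][tail] += relation_weights.get(relation, 0.0)
    for row in matrix:
        total = sum(row)
        if total > 0:
            for j in range(num_entities):
                row[j] /= total
    return matrix


def route(scores, matrix):
    """Move entity scores one step: new[j] = sum_i scores[i] * matrix[i][j]."""
    n = len(scores)
    return [sum(scores[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# Hypothetical scene knowledge graph.
entities = ["man", "surfboard", "ocean", "surfing"]
edges = [
    (0, "holds", 1),      # man holds surfboard
    (0, "near", 2),       # man near ocean
    (1, "used_for", 3),   # surfboard used_for surfing
]

# Hypothetical question: "What is the thing the man holds used for?"
# Each step focuses on one keyword, modeled here as a relation weight.
step_weights = [{"holds": 1.0}, {"used_for": 1.0}]

scores = [1.0, 0.0, 0.0, 0.0]  # all score mass starts on "man"
trace = [scores]
for weights in step_weights:
    scores = route(scores, transition_matrix(len(entities), edges, weights))
    trace.append(scores)

best = max(range(len(entities)), key=scores.__getitem__)
answer = entities[best]
```

Because `trace` records the score vector after every step, the path from "man" through "surfboard" to the final entity can be read off directly, which is the kind of step-by-step interpretability the abstract describes. The answer is selected from graph entities rather than a fixed answer vocabulary, so in this sketch an entity never seen as an answer before is scored like any other.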
Pages: 1984-1994
Page count: 11