A Retriever-Reader Framework with Visual Entity Linking for Knowledge-Based Visual Question Answering

Authors
You, Jiuxiang [1]
Yang, Zhenguo [1]
Li, Qing [2]
Liu, Wenyin [1]
Affiliations
[1] Guangdong Univ Technol, Sch Comp Sci & Technol, Guangzhou, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
Keywords
VQA; Knowledge graph; Entity linking
DOI
10.1109/ICME55011.2023.00011
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a Retriever-Reader framework with Visual Entity Linking (RR-VEL) for knowledge-based visual question answering. Given an image and its original question, the visual entity linking (VEL) module extracts key entities from the image and substitutes them for the referents in the question, which resolves semantic ambiguity and yields entity-oriented queries with explicit entities. The Retriever then encodes the queries and knowledge items with BERT followed by a feed-forward layer and retrieves a set of knowledge candidates. The Reader encodes the questions together with image captions in one branch and the knowledge candidates in another, which avoids interference between them during self-attentive encoding. Finally, the Reader's decoder fuses the encoded features to generate answers. Extensive experiments on two public datasets show that our method significantly outperforms the existing baselines.
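The query-rewriting and retrieval steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the linked entity is passed in by hand rather than extracted from an image, and a bag-of-words count vector with cosine similarity stands in for the BERT encoder with a feed-forward layer.

```python
import math
import re
from collections import Counter


def entity_oriented_query(question: str, referent: str, entity: str) -> str:
    """Replace the vague referent in the question with an explicit entity.

    Hypothetical helper: in the paper the entity would come from the VEL
    module; here it is supplied by the caller.
    """
    return re.sub(re.escape(referent), lambda _: entity, question,
                  count=1, flags=re.IGNORECASE)


def encode(text: str) -> Counter:
    """Toy stand-in for a learned encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, knowledge: list[str], k: int = 2) -> list[str]:
    """Return the k knowledge items most similar to the query."""
    q = encode(query)
    ranked = sorted(knowledge, key=lambda item: cosine(q, encode(item)),
                    reverse=True)
    return ranked[:k]


# Made-up knowledge items, for illustration only.
knowledge = [
    "The giraffe is native to Africa.",
    "The polar bear lives in the Arctic.",
    "A bicycle has two wheels.",
]
query = entity_oriented_query(
    "Where is this animal native to?", "this animal", "giraffe")
candidates = retrieve(query, knowledge, k=1)
```

In this toy setup the rewritten query names the entity explicitly, so the retriever can match it against knowledge items that mention that entity. A Reader would then consume the question and the retrieved candidates to generate an answer; that step is not sketched here.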
Pages: 13-18
Number of pages: 6