Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering

Authors
Luo, Man [1]
Zeng, Yankai [1]
Banerjee, Pratyay [1]
Baral, Chitta [1]
Affiliations
[1] Arizona State Univ, Tempe, AZ 85281 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge-based visual question answering (VQA) requires answering questions using external knowledge in addition to the content of images. OK-VQA is a dataset widely used to evaluate knowledge-based VQA, but it lacks a gold-standard knowledge corpus for retrieval. Existing work leverages different knowledge bases (e.g., ConceptNet and Wikipedia) to obtain external knowledge. Because the knowledge bases vary, it is hard to compare models' performance fairly. To address this issue, we collect a natural language knowledge base that can be used with any VQA system. Moreover, we propose a Visual Retriever-Reader pipeline for knowledge-based VQA. The visual retriever aims to retrieve relevant knowledge, and the visual reader seeks to predict answers based on the given knowledge. We introduce various ways to retrieve knowledge using text and images, and two reader styles: classification and extraction. Both the retriever and the reader are trained with weak supervision. Our experimental results show that a good retriever can significantly improve the reader's performance on the OK-VQA challenge. The code and corpus are provided in this link.
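The retrieve-then-read flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the token-overlap retriever, the count-based classification-style reader, the three-passage corpus, the `image_labels` input, and all function names are assumptions introduced only for illustration. The paper's retriever and reader are trained models, which this sketch does not attempt to reproduce.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, corpus, k=1):
    """Toy retriever (assumption): rank passages by shared tokens with the query."""
    query_tokens = set(tokenize(query))
    ranked = sorted(
        corpus,
        key=lambda passage: len(query_tokens & set(tokenize(passage))),
        reverse=True,
    )
    return ranked[:k]


def read_classification(passages, answer_vocab):
    """Toy classification-style reader (assumption): choose the answer from a
    fixed vocabulary that occurs most often in the retrieved passages."""
    counts = Counter(tok for passage in passages for tok in tokenize(passage))
    return max(answer_vocab, key=lambda answer: counts[answer])


# Hypothetical natural-language knowledge corpus.
corpus = [
    "A fire hydrant is connected to a water supply and is used by firefighters.",
    "Bananas are rich in potassium and grow in tropical climates.",
    "A stop sign is a red octagon that tells drivers to halt.",
]

question = "What mineral is this fruit rich in?"
image_labels = ["bananas", "table"]  # hypothetical visual cues from the image

# Combine textual and visual signals into a single retrieval query.
query = question + " " + " ".join(image_labels)
knowledge = retrieve(query, corpus, k=1)
answer = read_classification(knowledge, ["potassium", "water", "red"])
print(knowledge[0])
print(answer)
```

The sketch only shows the division of labor: the retriever narrows the corpus to relevant knowledge, and the reader predicts an answer from that knowledge. An extraction-style reader would instead select an answer span directly from the retrieved passage text.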
Pages: 6417-6431
Page count: 15
Related papers
  • [1] Hypergraph Transformer: Weakly-Supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering
    Heo, Yu-Jung
    Kim, Eun-Sol
    Choi, Woo Suk
    Zhang, Byoung-Tak
    [J]. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 373 - 390
  • [2] A Retriever-Reader Framework with Visual Entity Linking for Knowledge-Based Visual Question Answering
    You, Jiuxiang
    Yang, Zhenguo
    Li, Qing
    Liu, Wenyin
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 13 - 18
  • [3] Effective Search of Logical Forms for Weakly Supervised Knowledge-Based Question Answering
    Shen, Tao
    Geng, Xiubo
    Long, Guodong
    Jiang, Jing
    Zhang, Chengqi
    Jiang, Daxin
    [J]. PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 2227 - 2233
  • [4] Weakly-Supervised 3D Spatial Reasoning for Text-Based Visual Question Answering
    Li, Hao
    Huang, Jinfa
    Jin, Peng
    Song, Guoli
    Wu, Qi
    Chen, Jie
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 3367 - 3382
  • [5] Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules
    Khan, Aisha Urooj
    Kuehne, Hilde
    Duarte, Kevin
    Gan, Chuang
    Lobo, Niels
    Shah, Mubarak
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 8461 - 8470
  • [6] Explicit Knowledge-based Reasoning for Visual Question Answering
    Wang, Peng
    Wu, Qi
    Shen, Chunhua
    Dick, Anthony
    van den Hengel, Anton
    [J]. PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 1290 - 1296
  • [7] Knowledge-based question answering
    Rinaldi, F
    Dowdall, J
    Hess, M
    Mollá, D
    Schwitter, R
    Kaljurand, K
    [J]. KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS, 2003, 2773 : 785 - 792
  • [8] Knowledge-based question answering
    Hermjakob, U
    Hovy, EH
    Lin, CY
    [J]. 6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL XVI, PROCEEDINGS: COMPUTER SCIENCE III, 2002, : 66 - 71
  • [9] Rich Visual Knowledge-Based Augmentation Network for Visual Question Answering
    Zhang, Liyang
    Liu, Shuaicheng
    Liu, Donghao
    Zeng, Pengpeng
    Li, Xiangpeng
    Song, Jingkuan
    Gao, Lianli
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (10) : 4362 - 4373
  • [10] Knowledge enhancement and scene understanding for knowledge-based visual question answering
    Zhenqiang Su
    Gang Gou
    [J]. Knowledge and Information Systems, 2024, 66 : 2193 - 2208