Answering knowledge-based visual questions via the exploration of Question Purpose

被引:7
|
作者
Song, Lingyun [1 ,2 ]
Li, Jianao [1 ,2 ]
Liu, Jun [3 ,4 ]
Yang, Yang [5 ]
Shang, Xuequn [1 ,2 ]
Sun, Mingxuan [6 ]
机构
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710129, Peoples R China
[2] Northwestern Polytech Univ, Key Lab Big Data Storage & Management, Minist Ind & Informat Technol, Xian 710129, Peoples R China
[3] SPKLSTN Lab, Dept Comp Sci & Technol, Xian 710049, Peoples R China
[4] Jiaotong Univ, Xian 710049, Peoples R China
[5] Univ Elect Sci & Technol, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[6] Louisiana State Univ, Sch Elect Engn & Comp Sci, Div Comp Sci & Engn, Baton Rouge, LA 70803 USA
关键词
Visual question answering; DNN; Question Purpose;
D O I
10.1016/j.patcog.2022.109015
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Visual question answering has been greatly advanced by deep learning technologies, but still remains an open problem subjected to two aspects of factors. First, previous works estimate the correctness of each candidate answer mainly by its semantic correlations with visual questions, overlooking the fact that some questions and their answers are semantically inconsistent. Second, previous works that require external knowledge mainly uses the knowledge facts retrieved by key words or visual objects. However, the retrieved knowledge facts may only be related to the semantics of the question, but are useless or even misleading for answer prediction. To address these issues, we investigate how to capture the pur-pose of visual questions and propose a Purpose Guided Visual Question Answering model, called PGVQA. It mainly has two appealing properties: (1) It can estimate the correctness of candidate answers based on the Question Purpose (QP) that reveals which aspects of the concept are examined by visual questions. This is helpful for avoiding the negative effect of the semantic inconsistency between answers and ques-tions. (2) It can incorporate the knowledge facts accordant with the QP into answer prediction, which helps to improve the probability of answering visual questions correctly. Empirical studies on benchmark datasets show that PGVQA achieves state-of-the-art performance.(c) 2022 Elsevier Ltd. All rights reserved.
引用
收藏
页数:12
相关论文
共 50 条
  • [1] Asking Clarification Questions in Knowledge-Based Question Answering
    Xu, Jingjing
    Wang, Yuechen
    Tang, Duyu
    Duan, Nan
    Yang, Pengcheng
    Zeng, Qi
    Zhou, Ming
    Sun, Xu
    [J]. 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 1618 - 1629
  • [2] Explainable Knowledge reasoning via thought chains for knowledge-based visual question answering
    Qiu, Chen
    Xie, Zhiqiang
    Liu, Maofu
    Hu, Huijun
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (04)
  • [3] Explicit Knowledge-based Reasoning for Visual Question Answering
    Wang, Peng
    Wu, Qi
    Shen, Chunhua
    Dick, Anthony
    van den Hengel, Anton
    [J]. PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 1290 - 1296
  • [4] Knowledge-based question answering
    Rinaldi, F
    Dowdall, J
    Hess, M
    Mollá, D
    Schwitter, R
    Kaljurand, K
    [J]. KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS, 2003, 2773 : 785 - 792
  • [5] Knowledge-based question answering
    Hermjakob, U
    Hovy, EH
    Lin, CY
    [J]. 6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL XVI, PROCEEDINGS: COMPUTER SCIENCE III, 2002, : 66 - 71
  • [6] Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection
    Garcia-Olano, Diego
    Onoe, Yasumasa
    Ghosh, Joydeep
    [J]. COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2022, WWW 2022 COMPANION, 2022, : 705 - 715
  • [7] Rich Visual Knowledge-Based Augmentation Network for Visual Question Answering
    Zhang, Liyang
    Liu, Shuaicheng
    Liu, Donghao
    Zeng, Pengpeng
    Li, Xiangpeng
    Song, Jingkuan
    Gao, Lianli
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (10) : 4362 - 4373
  • [8] Knowledge enhancement and scene understanding for knowledge-based visual question answering
    Zhenqiang Su
    Gang Gou
    [J]. Knowledge and Information Systems, 2024, 66 : 2193 - 2208
  • [9] Knowledge enhancement and scene understanding for knowledge-based visual question answering
    Su, Zhenqiang
    Gou, Gang
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (03) : 2193 - 2208
  • [10] Cross-modal knowledge reasoning for knowledge-based visual question answering
    Yu, Jing
    Zhu, Zihao
    Wang, Yujing
    Zhang, Weifeng
    Hu, Yue
    Tan, Jianlong
    [J]. PATTERN RECOGNITION, 2020, 108