Resolving Zero-Shot and Fact-Based Visual Question Answering via Enhanced Fact Retrieval

Times cited: 2
Authors
Wu, Sen [1]
Zhao, Guoshuai [1,2]
Qian, Xueming [2,3,4]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Software Engn, Xian 710049, Peoples R China
[2] Shaanxi Yulan Jiuzhou Intelligent Optoelect Techno, Xian 710049, Peoples R China
[3] Xi An Jiao Tong Univ, Sch Informat & Commun Engn, Key Lab Intelligent Networks & Network Secur, Minist Educ, Xian 710049, Peoples R China
[4] Xi An Jiao Tong Univ, SMILES LAB, Xian 710049, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visualization; Task analysis; Knowledge based systems; Question answering (information retrieval); Predictive models; Knowledge graphs; Feature extraction; Visual question answering; zero-shot; knowledge graph;
DOI
10.1109/TMM.2023.3289729
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Deploying visual question answering (VQA) systems in practical applications is challenging, and recent research has increasingly focused on this field. Real-world VQA applications raise many issues that must be considered. Although existing methods add external knowledge and other descriptive information to assist reasoning, they are limited by information retrieval errors that propagate to downstream tasks and by misalignment of the aggregated information, so their overall performance still needs to be improved. To address these challenges, we propose a novel VQA model that uses a differentiated pretrained model to represent the input information and connects the input data with three external knowledge components through a common feature space. To combine the information in the three feature spaces, we propose an information aggregation strategy that uses a weighted score to aggregate the information in the relation and entity spaces during answer prediction. Experimental results show that our method performs well on fact-based and zero-shot VQA tasks and achieves state-of-the-art performance on the ZS-F-VQA dataset.
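The abstract states only that a weighted score aggregates relation-space and entity-space information during answer prediction; it does not give the formula. The Python below is a minimal sketch under stated assumptions, not the authors' method: it treats each candidate answer as a (relation, entity) pair, assumes the question has already been embedded in both spaces, and scores each candidate with a convex combination of cosine similarities controlled by a weight `alpha`. The helper names `cosine` and `rank_facts`, the weight `alpha`, and the toy embeddings are all hypothetical and do not come from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_facts(
    q_rel: Vector,
    q_ent: Vector,
    facts: List[Tuple[str, str]],
    rel_emb: Dict[str, Vector],
    ent_emb: Dict[str, Vector],
    alpha: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Hypothetical aggregation: score each (relation, entity) candidate as
    alpha * relation-space similarity + (1 - alpha) * entity-space similarity,
    then return candidates sorted from highest to lowest score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    scored = []
    for rel, ent in facts:
        s_rel = cosine(q_rel, rel_emb[rel])  # evidence from the relation space
        s_ent = cosine(q_ent, ent_emb[ent])  # evidence from the entity space
        scored.append((rel, ent, alpha * s_rel + (1.0 - alpha) * s_ent))
    # sorted() is stable, so tied candidates keep their input order.
    return sorted(scored, key=lambda t: t[2], reverse=True)


# Toy 2-d embeddings, invented purely for illustration.
rel_emb = {"UsedFor": [1.0, 0.0], "IsA": [0.0, 1.0]}
ent_emb = {"cutting": [1.0, 0.0], "tool": [0.0, 1.0]}
ranked = rank_facts(
    q_rel=[1.0, 0.0],
    q_ent=[0.6, 0.8],
    facts=[("UsedFor", "cutting"), ("IsA", "tool"), ("UsedFor", "tool")],
    rel_emb=rel_emb,
    ent_emb=ent_emb,
    alpha=0.5,
)
for rel, ent, score in ranked:
    print(f"{rel:8s} {ent:8s} {score:.2f}")
```

In the toy data, the candidate that matches the question in both spaces ("UsedFor", "tool") ranks first. Moving `alpha` toward 1 lets relation evidence dominate and moving it toward 0 favors entity evidence; a learned or per-question weight would be an equally plausible reading of the abstract.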
Pages: 1790-1800
Number of pages: 11