Resolving Zero-Shot and Fact-Based Visual Question Answering via Enhanced Fact Retrieval

Cited by: 2
Authors
Wu, Sen [1]
Zhao, Guoshuai [1, 2]
Qian, Xueming [2, 3, 4]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Software Engn, Xian 710049, Peoples R China
[2] Shaanxi Yulan Jiuzhou Intelligent Optoelect Techno, Xian 710049, Peoples R China
[3] Xi An Jiao Tong Univ, Sch Informat & Commun Engn, Key Lab Intelligent Networks & Network Secur, Minist Educ, Xian 710049, Peoples R China
[4] Xi An Jiao Tong Univ, SMILES LAB, Xian 710049, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visualization; Task analysis; Knowledge based systems; Question answering (information retrieval); Predictive models; Knowledge graphs; Feature extraction; Visual question answering; zero-shot; knowledge graph;
DOI
10.1109/TMM.2023.3289729
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Deploying visual question answering (VQA) systems in practical applications is challenging, and recent research has investigated the many issues that real-world VQA applications raise. Although existing methods have focused on adding external knowledge and other descriptive information to assist in reasoning, they are limited by the impact of information retrieval errors on downstream tasks and by the misalignment of the aggregated information. As a result, the overall performance of these models leaves room for improvement. To address these challenges, we propose a novel VQA model that utilizes a differentiated pretrained model to represent the input information and connects the input data with three external knowledge components through a common feature space. To combine the information in the three feature spaces, we propose an information aggregation strategy that employs a weighted score to aggregate the information in the relation and entity spaces in the answer prediction process. The experimental results show that our method achieves good performance in fact-based and zero-shot VQA tasks and achieves state-of-the-art performance on the ZS-F-VQA dataset.
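The abstract does not give the exact form of the weighted score, so the following is only a minimal sketch of one way such an aggregation could look, not the authors' implementation. It assumes each candidate answer already has a similarity score in an answer space, and that knowledge-graph facts are available as (entity, relation, answer) triples. The function name `aggregate_scores`, the weights `w_rel` and `w_ent`, the use of `max` over supporting facts, and all sample values are hypothetical.

```python
def aggregate_scores(answer_scores, relation_scores, entity_scores,
                     triples, w_rel=0.5, w_ent=0.5):
    """Combine answer-space scores with weighted relation/entity evidence.

    answer_scores:   dict mapping candidate answer -> score in the answer space
    relation_scores: dict mapping relation -> score in the relation space
    entity_scores:   dict mapping entity -> score in the entity space
    triples:         iterable of (entity, relation, answer) facts
    w_rel, w_ent:    hypothetical weights, not values from the paper
    """
    combined = dict(answer_scores)
    for answer in combined:
        # Score every fact that points at this answer.
        support = [
            w_rel * relation_scores.get(rel, 0.0)
            + w_ent * entity_scores.get(ent, 0.0)
            for ent, rel, ans in triples
            if ans == answer
        ]
        # Assumption: only the best-supported fact contributes.
        if support:
            combined[answer] += max(support)
    return combined


# Toy inputs, invented purely for illustration.
answer_scores = {"dog": 0.5, "cat": 0.6}
relation_scores = {"IsA": 0.9, "UsedFor": 0.1}
entity_scores = {"puppy": 0.8, "sofa": 0.2}
triples = [("puppy", "IsA", "dog"), ("sofa", "UsedFor", "cat")]

combined = aggregate_scores(answer_scores, relation_scores,
                            entity_scores, triples)
best_answer = max(combined, key=combined.get)
```

In this toy setting, "cat" has the higher answer-space score, but the fact supporting "dog" scores well in both the relation and entity spaces, so the combined ranking prefers "dog". This illustrates how evidence from the relation and entity spaces could offset an error made in a single space.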
Pages: 1790-1800
Number of pages: 11
Related papers
50 in total
  • [41] Zero-Shot Sketch Based Image Retrieval via Modality Capacity Guidance
    Zhou, Yanghong
    Liu, Dawei
    Mok, P. Y.
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 1780 - 1787
  • [42] A Zero-Shot Framework for Sketch Based Image Retrieval
    Yelamarthi, Sasi Kiran
    Reddy, Shiva Krishna
    Mishra, Ashish
    Mittal, Anurag
    COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 : 316 - 333
  • [43] CLAREL: Classification via retrieval loss for zero-shot learning
    Oreshkin, Boris N.
    Rostamzadeh, Negar
    Pinheiro, Pedro O.
    Pal, Christopher
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, : 3989 - 3993
  • [44] Towards Cognition-Aligned Visual Language Models via Zero-Shot Instance Retrieval
    Ma, Teng
    Organisciak, Daniel
    Ma, Wenbao
    Long, Yang
    ELECTRONICS, 2024, 13 (09)
  • [45] Augmented Multimodality Fusion for Generalized Zero-Shot Sketch-Based Visual Retrieval
    Jing, Taotao
    Xia, Haifeng
    Hamm, Jihun
    Ding, Zhengming
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 3657 - 3668
  • [46] Zero-Shot Rationalization by Multi-Task Transfer Learning from Question Answering
    Kung, Po-Nien
    Yang, Tse-Hsuan
    Chen, Yi-Cheng
    Yin, Sheng-Siang
    Chen, Yun-Nung
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 2187 - 2197
  • [47] PEINet: Joint Prompt and Evidence Inference Network via Language Family Policy for Zero-Shot Multilingual Fact Checking
    Li, Xiaoyu
    Wang, Weihong
    Fang, Jifei
    Jin, Li
    Kang, Hankun
    Liu, Chunbo
    APPLIED SCIENCES-BASEL, 2022, 12 (19)
  • [48] Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering
    Ma, Kaixin
    Ilievski, Filip
    Francis, Jonathan
    Bisk, Yonatan
    Nyberg, Eric
    Oltramari, Alessandro
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 13507 - 13515
  • [49] Diff-ZsVQA: Zero-shot Visual Question Answering with Frozen Large Language Models Using Diffusion Model
    Xu, Quanxing
    Li, Jian
    Tian, Yuhao
    Zhou, Ling
    Zhang, Feifei
    Huang, Rubing
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 275
  • [50] Latent Retrieval for Large-Scale Fact-Checking and Question Answering with NLI training
    Samarinas, Chris
    Hsu, Wynne
    Lee, Mong Li
    2020 IEEE 32ND INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2020, : 941 - 948