Resolving Zero-Shot and Fact-Based Visual Question Answering via Enhanced Fact Retrieval

Cited by: 2
Authors
Wu, Sen [1]
Zhao, Guoshuai [1, 2]
Qian, Xueming [2, 3, 4]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Software Engn, Xian 710049, Peoples R China
[2] Shaanxi Yulan Jiuzhou Intelligent Optoelect Techno, Xian 710049, Peoples R China
[3] Xi An Jiao Tong Univ, Sch Informat & Commun Engn, Key Lab Intelligent Networks & Network Secur, Minist Educ, Xian 710049, Peoples R China
[4] Xi An Jiao Tong Univ, SMILES LAB, Xian 710049, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visualization; Task analysis; Knowledge based systems; Question answering (information retrieval); Predictive models; Knowledge graphs; Feature extraction; Visual question answering; zero-shot; knowledge graph;
DOI
10.1109/TMM.2023.3289729
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Practical applications of visual question answering (VQA) systems are challenging, and recent research has aimed at investigating this important field. Many issues related to real-world VQA applications must be considered. Although existing methods have focused on adding external knowledge and other descriptive information to assist in reasoning, they are limited by the impact of information retrieval errors on downstream tasks and the misalignment of the aggregated information. Thus, the overall performance of these models must be improved. To address these challenges, we propose a novel VQA model that utilizes a differentiated pretrained model to represent the input information and connects the input data with three external knowledge components through a common feature space. To combine the information in the three feature spaces, we propose an information aggregation strategy that employs a weighted score to aggregate the information in the relation and entity spaces in the answer prediction process. The experimental results show that our method achieves good performance in fact-based and zero-shot VQA tasks and achieves state-of-the-art performance on the ZS-F-VQA dataset.
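The weighted-score aggregation mentioned in the abstract can be illustrated with a minimal sketch. This record does not give the paper's actual formula, so everything here is an assumption made for illustration: the use of cosine similarity, the single mixing weight `alpha`, and the names `cosine` and `rank_answers` are hypothetical and are not taken from the authors' implementation.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_answers(query_rel, query_ent, facts, alpha=0.5):
    """Rank candidate answers by a weighted sum of two similarity scores.

    query_rel / query_ent: query embeddings in the relation and entity spaces.
    facts: iterable of (answer, relation_vector, entity_vector) tuples.
    alpha: hypothetical mixing weight; 1.0 uses only the relation space,
           0.0 uses only the entity space.
    """
    scored = []
    for answer, rel_vec, ent_vec in facts:
        rel_score = cosine(query_rel, rel_vec)
        ent_score = cosine(query_ent, ent_vec)
        scored.append((answer, alpha * rel_score + (1.0 - alpha) * ent_score))
    # Highest combined score first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Toy example with made-up 2-d embeddings.
facts = [
    ("umbrella", [1.0, 0.0], [0.0, 1.0]),
    ("dog", [0.0, 1.0], [1.0, 0.0]),
]
ranking = rank_answers([1.0, 0.0], [0.0, 1.0], facts, alpha=0.5)
print(ranking[0][0])
```

In this toy setup the candidate whose relation and entity vectors both align with the query ranks first; changing `alpha` shifts how much each space contributes to the final score.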
Pages: 1790-1800
Number of pages: 11