VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge

Cited by: 23
Authors
Ravi, Sahithya [1, 2]
Chinchure, Aditya [1, 2]
Sigal, Leonid [1, 2]
Liao, Renjie [1]
Shwartz, Vered [1, 2]
Affiliations
[1] Univ British Columbia, Vancouver, BC, Canada
[2] Vector Inst AI, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
DOI
10.1109/WACV56688.2023.00121
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
There has been a growing interest in solving Visual Question Answering (VQA) tasks that require the model to reason beyond the content present in the image. In this work, we focus on questions that require commonsense reasoning. In contrast to previous methods which inject knowledge from static knowledge bases, we investigate the incorporation of contextualized knowledge using Commonsense Transformer (COMET), an existing knowledge model trained on human-curated knowledge bases. We propose a method to generate, select, and encode external commonsense knowledge alongside visual and textual cues in a new pre-trained Vision-Language-Commonsense transformer model, VLC-BERT. Through our evaluation on the knowledge-intensive OK-VQA and A-OKVQA datasets, we show that VLC-BERT is capable of outperforming existing models that utilize static knowledge bases. Furthermore, through a detailed analysis, we explain which questions benefit, and which don't, from contextualized commonsense knowledge from COMET. Code: https://github.com/aditya10/VLC-BERT
Pages: 1155-1165
Number of pages: 11