VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge

Times cited: 23

Authors:

Ravi, Sahithya [1,2]

Chinchure, Aditya [1,2]

Sigal, Leonid [1,2]

Liao, Renjie [1]

Shwartz, Vered [1,2]

Affiliations:

[1] Univ British Columbia, Vancouver, BC, Canada

[2] Vector Inst AI, Toronto, ON, Canada

Source:

2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) | 2023

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

DOI:

10.1109/WACV56688.2023.00121

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

There has been a growing interest in solving Visual Question Answering (VQA) tasks that require the model to reason beyond the content present in the image. In this work, we focus on questions that require commonsense reasoning. In contrast to previous methods which inject knowledge from static knowledge bases, we investigate the incorporation of contextualized knowledge using Commonsense Transformer (COMET), an existing knowledge model trained on human-curated knowledge bases. We propose a method to generate, select, and encode external commonsense knowledge alongside visual and textual cues in a new pre-trained Vision-Language-Commonsense transformer model, VLC-BERT. Through our evaluation on the knowledge-intensive OK-VQA and A-OKVQA datasets, we show that VLC-BERT is capable of outperforming existing models that utilize static knowledge bases. Furthermore, through a detailed analysis, we explain which questions benefit, and which don't, from contextualized commonsense knowledge from COMET. Code: https://github.com/aditya10/VLC-BERT

Pages: 1155-1165

Number of pages: 11