VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge

Cited by: 23

Authors:

Ravi, Sahithya [1,2]

Chinchure, Aditya [1,2]

Sigal, Leonid [1,2]

Liao, Renjie [1]

Shwartz, Vered [1,2]

Affiliations:

[1] Univ British Columbia, Vancouver, BC, Canada

[2] Vector Inst AI, Toronto, ON, Canada

Source:

2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) | 2023

Funding:

Natural Sciences and Engineering Research Council of Canada

DOI:

10.1109/WACV56688.2023.00121

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

There has been a growing interest in solving Visual Question Answering (VQA) tasks that require the model to reason beyond the content present in the image. In this work, we focus on questions that require commonsense reasoning. In contrast to previous methods which inject knowledge from static knowledge bases, we investigate the incorporation of contextualized knowledge using Commonsense Transformer (COMET), an existing knowledge model trained on human-curated knowledge bases. We propose a method to generate, select, and encode external commonsense knowledge alongside visual and textual cues in a new pre-trained Vision-Language-Commonsense transformer model, VLC-BERT. Through our evaluation on the knowledge-intensive OK-VQA and A-OKVQA datasets, we show that VLC-BERT is capable of outperforming existing models that utilize static knowledge bases. Furthermore, through a detailed analysis, we explain which questions benefit, and which don't, from contextualized commonsense knowledge from COMET. Code: https://github.com/aditya10/VLC-BERT

Pages: 1155-1165

Number of pages: 11
