VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge

Cited by: 23
Authors
Ravi, Sahithya [1, 2]
Chinchure, Aditya [1, 2]
Sigal, Leonid [1, 2]
Liao, Renjie [1]
Shwartz, Vered [1, 2]
Affiliations
[1] Univ British Columbia, Vancouver, BC, Canada
[2] Vector Inst AI, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
DOI
10.1109/WACV56688.2023.00121
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
There has been a growing interest in solving Visual Question Answering (VQA) tasks that require the model to reason beyond the content present in the image. In this work, we focus on questions that require commonsense reasoning. In contrast to previous methods which inject knowledge from static knowledge bases, we investigate the incorporation of contextualized knowledge using Commonsense Transformer (COMET), an existing knowledge model trained on human-curated knowledge bases. We propose a method to generate, select, and encode external commonsense knowledge alongside visual and textual cues in a new pre-trained Vision-Language-Commonsense transformer model, VLC-BERT. Through our evaluation on the knowledge-intensive OK-VQA and A-OKVQA datasets, we show that VLC-BERT is capable of outperforming existing models that utilize static knowledge bases. Furthermore, through a detailed analysis, we explain which questions benefit, and which don't, from contextualized commonsense knowledge from COMET. Code: https://github.com/aditya10/VLC-BERT
Pages: 1155-1165
Number of pages: 11
Related papers
50 in total
  • [41] Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge
    Sun, Kai
    Yu, Dian
    Chen, Jianshu
    Yu, Dong
    Cardie, Claire
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022: 8736-8747
  • [42] Natural Intelligence - Commonsense Question Answering with Conceptual Graphs
    Guler, Fatih Mehmet
    Birturk, Aysenur
    CONCEPTUAL STRUCTURES: FROM INFORMATION TO INTELLIGENCE, 2010, 6208: 97-107
  • [43] Marie and BERT-A Knowledge Graph Embedding Based Question Answering System for Chemistry
    Zhou, Xiaochi
    Zhang, Shaocong
    Agarwal, Mehal
    Akroyd, Jethro
    Mosbach, Sebastian
    Kraft, Markus
    ACS OMEGA, 2023, 8 (36): 33039-33057
  • [44] PathReasoner: Explainable reasoning paths for commonsense question answering
    Zhan, Xunlin
    Huang, Yinya
    Dong, Xiao
    Cao, Qingxing
    Liang, Xiaodan
    KNOWLEDGE-BASED SYSTEMS, 2022, 235
  • [45] Visual Question Answering
    Nada, Ahmed
    Chen, Min
    2024 INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING AND COMMUNICATIONS, ICNC, 2024: 6-10
  • [46] Unsupervised Commonsense Question Answering with Self-Talk
    Shwartz, Vered
    West, Peter
    Le Bras, Ronan
    Bhagavatula, Chandra
    Choi, Yejin
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020: 4615-4629
  • [47] Joint Answering and Explanation for Visual Commonsense Reasoning
    Li, Zhenyang
    Guo, Yangyang
    Wang, Kejie
    Wei, Yinwei
    Nie, Liqiang
    Kankanhalli, Mohan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32: 3836-3846
  • [48] A Diagrammatic Approach for Visual Question Answering over Knowledge Graphs
    Mouromtsev, Dmitry
    Wohlgenannt, Gerhard
    Haase, Peter
    Pavlov, Dmitry
    Emelyanov, Yury
    Morozov, Alexey
    SEMANTIC WEB: ESWC 2018 SATELLITE EVENTS, 2018, 11155: 34-39
  • [49] Explicit Knowledge-based Reasoning for Visual Question Answering
    Wang, Peng
    Wu, Qi
    Shen, Chunhua
    Dick, Anthony
    van den Hengel, Anton
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017: 1290-1296