Vision-Language-Knowledge Co-Embedding for Visual Commonsense Reasoning

Cited by: 5

Authors:

Lee, JaeYun [1]

Kim, Incheol [1]

Affiliation:

[1] Kyonggi Univ, Dept Comp Sci, Suwon 16227, South Korea

Source:

SENSORS | 2021 / Vol. 21 / Issue 09

Keywords:

visual commonsense reasoning; multimodal co-embedding; knowledge graph; graph convolutional network; pretrained multi-head self-attention network;

DOI:

10.3390/s21092911

Chinese Library Classification number:

O65 [Analytical Chemistry];

Discipline classification code:

070302 ; 081704 ;

Abstract:

Visual commonsense reasoning is an intelligent task performed to decide the most appropriate answer to a question while providing the rationale or reason for the answer when an image, a natural language question, and candidate responses are given. For effective visual commonsense reasoning, both the knowledge acquisition problem and the multimodal alignment problem need to be solved. Therefore, we propose a novel Vision-Language-Knowledge Co-embedding (ViLaKC) model that extracts knowledge graphs relevant to the question from an external knowledge base, ConceptNet, and uses them together with the input image to answer the question. The proposed model uses a pretrained vision-language-knowledge embedding module, which co-embeds multimodal data including images, natural language texts, and knowledge graphs into a single feature vector. To reflect the structural information of the knowledge graph, the proposed model uses the graph convolutional neural network layer to embed the knowledge graph first and then uses multi-head self-attention layers to co-embed it with the image and natural language question. The effectiveness and performance of the proposed model are experimentally validated using the VCR v1.0 benchmark dataset.
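
The co-embedding order described in the abstract (graph convolution over the knowledge graph first, then multi-head self-attention over image, text, and graph tokens together) can be illustrated with a minimal sketch. This is not the authors' ViLaKC implementation. The random weights stand in for pretrained parameters, the toy dimensions and adjacency matrix are invented for illustration, and the mean pooling used to obtain a single feature vector is an assumption, as are all function names.

```python
import math
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rand_matrix(rng, rows, cols):
    """Random stand-in for learned (pretrained) weights or input features."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(x, wq, wk, wv):
    """Scaled dot-product self-attention for a single head."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v)

def multi_head_self_attention(x, heads):
    """Run every head on the same sequence and concatenate per-token outputs."""
    outs = [attention_head(x, wq, wk, wv) for wq, wk, wv in heads]
    return [sum((o[i] for o in outs), []) for i in range(len(x))]

def co_embed(image_tokens, text_tokens, adj, node_feats, gcn_w, heads):
    """Embed the graph first, then attend jointly over all three modalities."""
    graph_tokens = gcn_layer(adj, node_feats, gcn_w)
    sequence = image_tokens + text_tokens + graph_tokens
    attended = multi_head_self_attention(sequence, heads)
    # Assumption: mean pooling collapses the sequence into one feature vector.
    return [sum(col) / len(attended) for col in zip(*attended)]

rng = random.Random(0)
dim, head_dim, num_heads = 8, 4, 2
image_tokens = rand_matrix(rng, 3, dim)   # e.g. three detected image regions
text_tokens = rand_matrix(rng, 5, dim)    # e.g. five question tokens
adj = [[0, 1, 0, 0], [1, 0, 1, 1], [0, 1, 0, 0], [0, 1, 0, 0]]  # toy knowledge graph
node_feats = rand_matrix(rng, 4, 6)
gcn_w = rand_matrix(rng, 6, dim)          # projects node features to the token width
heads = [tuple(rand_matrix(rng, dim, head_dim) for _ in range(3)) for _ in range(num_heads)]

vector = co_embed(image_tokens, text_tokens, adj, node_feats, gcn_w, heads)
print(len(vector))  # num_heads * head_dim
```

Projecting the graph nodes to the same width as the image and text tokens is what lets all three modalities share one attention sequence; a real model would typically also add positional and modality-type embeddings, which this sketch omits.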

Pages: 19

34 items in total

[21] KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for visual commonsense reasoning
Song, Dandan
Ma, Siyi
Sun, Zhanchen
Yang, Sicheng
Liao, Lejian
Knowledge-Based Systems, 2021, 230
[22] Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding
Cao, Qingxing
Li, Bailin
Liang, Xiaodan
Wang, Keze
Lin, Liang
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (07) : 2758 - 2767
[23] Dynamic Heterogeneous-Graph Reasoning with Language Models and Knowledge Representation Learning for Commonsense Question Answering
Wang, Yujie
Zhang, Hu
Liang, Jiye
Li, Ru
PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 14048 - 14063
[24] PU-GEN: Enhancing generative commonsense reasoning for language models with human-centered knowledge
Seo, Jaehyung
Oh, Dongsuk
Eo, Sugyeong
Park, Chanjun
Yang, Kisu
Moon, Hyeonseok
Park, Kinam
Lim, Heuiseok
KNOWLEDGE-BASED SYSTEMS, 2022, 256
[25] Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs
Marasovic, Ana
Bhagavatula, Chandra
Park, Jae Sung
Le Bras, Ronan
Smith, Noah A.
Choi, Yejin
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 2810 - 2829
[26] Incorporating External Knowledge Reasoning for Vision-and-Language Navigation with Assistant's Help
Li, Xin
Zhang, Yu
Yuan, Weilin
Luo, Junren
APPLIED SCIENCES-BASEL, 2022, 12 (14):
[27] Transformer with convolution and graph-node co-embedding: An accurate and interpretable vision backbone for predicting gene expressions from local histopathological image
Xiao, Xiao
Kong, Yan
Li, Ronghan
Wang, Zuoheng
Lu, Hui
MEDICAL IMAGE ANALYSIS, 2024, 91
[28] CAT-ViL: Co-attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery
Bai, Long
Islam, Mobarakol
Ren, Hongliang
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT IX, 2023, 14228 : 397 - 407
[29] KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue
Jiang, Xiaoze
Du, Siyi
Qin, Zengchang
Sun, Yajing
Yu, Jing
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1265 - 1273
[30] Professional vision in the classroom: Teachers' knowledge-based reasoning explaining their visual focus of attention to students
Muhonen, Heli
Pakarinen, Eija
Lerkkanen, Marja-Kristiina
TEACHING AND TEACHER EDUCATION, 2023, 121
