Explaining Vision and Language through Graphs of Events in Space and Time

Times cited: 0

Authors:

Masala, Mihai [1,2]

Cudlenco, Nicolae [1]

Rebedea, Traian [2]

Leordeanu, Marius [1,2]

Affiliations:

[1] Institute of Mathematics, Romanian Academy, Bucharest, Romania

[2] University Politehnica of Bucharest, Bucharest, Romania

Source:

2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2023

DOI:

10.1109/ICCVW60793.2023.00302

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Artificial intelligence is making great advances today and is starting to bridge the gap between vision and language. However, we are still far from explicitly understanding, explaining and controlling visual content from a linguistic perspective, because we still lack a common explainable representation between the two domains. In this work we address this limitation and propose the Graph of Events in Space and Time (GEST), with which we can represent, create and explain both visual and linguistic stories. We provide a theoretical justification of our model and an experimental validation, which shows that GEST can bring solid complementary value alongside powerful deep learning models. In particular, GEST can help improve text-to-video generation at the content level, since it can be easily incorporated into our novel video generation engine. Additionally, by using efficient graph matching techniques, GEST graphs can also improve the comparison of texts at the semantic level.
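The record gives no implementation details. As a rough sketch only, the Python below shows one way an event graph and a graph-matching comparison between two stories could look. It assumes a hypothetical schema (each node is an `Event` with an action and a set of participating entities; directed edges carry relation labels such as "next"), and it uses spectral matching (principal eigenvector of a pairwise affinity matrix, then greedy one-to-one discretization) as a stand-in for the "efficient graph matching techniques" the abstract mentions. Whether GEST uses this exact method or these node and edge attributes is an assumption; all names (`Event`, `EventGraph`, `node_affinity`, `edges_agree`, `spectral_match`) are illustrative.

```python
import math
from dataclasses import dataclass, field
from itertools import product


@dataclass(frozen=True)
class Event:
    """Hypothetical GEST-style node: an action and the entities taking part in it."""
    action: str
    entities: frozenset


@dataclass
class EventGraph:
    events: list
    # Hypothetical edges: (i, j) -> relation label; "next" is an assumed example label.
    relations: dict = field(default_factory=dict)


def node_affinity(a: Event, b: Event) -> float:
    """1 point for the same action plus the Jaccard overlap of the entity sets."""
    same_action = 1.0 if a.action == b.action else 0.0
    union = a.entities | b.entities
    overlap = len(a.entities & b.entities) / len(union) if union else 0.0
    return same_action + overlap


def edges_agree(g1, g2, i, j, a, b) -> bool:
    """Symmetric check: edge i->j in g1 matches a->b in g2, or j->i matches b->a."""
    fwd = g1.relations.get((i, j))
    bwd = g1.relations.get((j, i))
    return (fwd is not None and fwd == g2.relations.get((a, b))) or (
        bwd is not None and bwd == g2.relations.get((b, a))
    )


def spectral_match(g1, g2, iters=100):
    """Spectral matching: principal eigenvector of the pairwise affinity matrix M,
    then greedy one-to-one discretization. Returns (matches, score)."""
    cands = list(product(range(len(g1.events)), range(len(g2.events))))
    n = len(cands)
    M = [[0.0] * n for _ in range(n)]
    for p, (i, a) in enumerate(cands):
        # Diagonal: how well node i of g1 fits node a of g2.
        M[p][p] = node_affinity(g1.events[i], g2.events[a])
        for q, (j, b) in enumerate(cands):
            # Off-diagonal: reward pairs of assignments that preserve a relation.
            if i != j and a != b and edges_agree(g1, g2, i, j, a, b):
                M[p][q] = 1.0

    # Power iteration; M is symmetric and non-negative by construction.
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(M[p][q] * v[q] for q in range(n)) for p in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [], 0.0
        v = [x / norm for x in w]

    # Greedy discretization: strongest candidates first, one per node on each side.
    used1, used2, chosen = set(), set(), []
    for p in sorted(range(n), key=lambda k: v[k], reverse=True):
        i, a = cands[p]
        if v[p] > 0.0 and i not in used1 and a not in used2:
            chosen.append(p)
            used1.add(i)
            used2.add(a)
    # Matching score x^T M x for the binary assignment vector x.
    score = sum(M[p][q] for p in chosen for q in chosen)
    return [cands[p] for p in chosen], score


if __name__ == "__main__":
    walk = Event("walk", frozenset({"john"}))
    open_ = Event("open", frozenset({"john", "door"}))
    enter = Event("enter", frozenset({"john", "room"}))
    # The same three-event story, with the events listed in a different order in g2.
    g1 = EventGraph([walk, open_, enter], {(0, 1): "next", (1, 2): "next"})
    g2 = EventGraph([enter, walk, open_], {(1, 2): "next", (2, 0): "next"})
    matches, score = spectral_match(g1, g2)
    print(sorted(matches), score)  # [(0, 1), (1, 2), (2, 0)] 10.0
```

Because node similarity and pairwise relation agreement are scored jointly, the two graphs in the usage example are aligned correctly even though the second lists its events in a different order; a higher score means more shared events and relations. The affinity matrix grows quadratically with the number of candidate assignments, so this sketch is only suitable for small graphs.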

Pages: 2818-2823

Number of pages: 6