Learning the Effects of Physical Actions in a Multi-modal Environment

Cited by: 0
Authors:
Dagan, Gautier [1 ]
Keller, Frank [1 ]
Lascarides, Alex [1 ]
Affiliations:
[1] Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland
DOI: not available
CLC number: TP18 (Artificial Intelligence Theory)
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Large Language Models (LLMs) handle physical commonsense information inadequately. As a result of being trained in a disembodied setting, LLMs often fail to predict an action's outcome in a given environment. However, predicting the effects of an action before it is executed is crucial in planning, where coherent sequences of actions are often needed to achieve a goal. Therefore, we introduce the multi-modal task of predicting the outcomes of actions solely from realistic sensory inputs (images and text). Next, we extend an LLM to model latent representations of objects to better predict action outcomes in an environment. We show that multi-modal models can capture physical commonsense when augmented with visual information. Finally, we evaluate our model's performance on novel actions and objects and find that combining modalities helps models generalize and learn physical commonsense reasoning better.
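The core idea of fusing visual and textual inputs to predict an action's outcome can be illustrated with a minimal toy sketch. Everything below is hypothetical (feature dimensions, the late-fusion linear predictor, the example embeddings); the paper's actual architecture, which extends an LLM with latent object representations, is not reproduced here.

```python
import random

def fuse_and_predict(image_feats, text_feats, weights, bias):
    """Toy late-fusion predictor: concatenate an image feature vector
    and a text feature vector, then apply a single linear layer to
    predict a latent vector for the post-action state. Illustrative
    only; a real model would learn these weights from data."""
    fused = image_feats + text_feats  # list concatenation = feature fusion
    return [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]

random.seed(0)
IMG_DIM, TXT_DIM, OUT_DIM = 4, 3, 2
# Randomly initialized weights stand in for a trained linear head.
weights = [[random.uniform(-1, 1) for _ in range(IMG_DIM + TXT_DIM)]
           for _ in range(OUT_DIM)]
bias = [0.0] * OUT_DIM

image_feats = [0.2, -0.5, 0.1, 0.9]  # e.g. a visual embedding of the scene
text_feats = [0.7, 0.0, -0.3]        # e.g. an embedding of "push the cup"
pred = fuse_and_predict(image_feats, text_feats, weights, bias)
print(pred)  # predicted latent state after the action
```

The point of the sketch is only the fusion step: outcome prediction conditions jointly on what the scene looks like and what the action command says, which is what lets a model ground physical commonsense that text alone does not supply.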
Pages: 133-148 (16 pages)