Learning the Effects of Physical Actions in a Multi-modal Environment

Cited by: 0
Authors:
Dagan, Gautier [1 ]
Keller, Frank [1 ]
Lascarides, Alex [1 ]
Affiliations:
[1] Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland
Keywords: (none listed)
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Large Language Models (LLMs) handle physical commonsense information inadequately. As a result of being trained in a disembodied setting, LLMs often fail to predict an action's outcome in a given environment. However, predicting the effects of an action before it is executed is crucial in planning, where coherent sequences of actions are often needed to achieve a goal. Therefore, we introduce the multi-modal task of predicting the outcomes of actions solely from realistic sensory inputs (images and text). Next, we extend an LLM to model latent representations of objects to better predict action outcomes in an environment. We show that multi-modal models can capture physical commonsense when augmented with visual information. Finally, we evaluate our model's performance on novel actions and objects and find that combining modalities helps models generalize and learn physical commonsense reasoning better.
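The record contains no code, but the abstract's core idea, an LLM-style text encoder augmented with visual object representations so it can predict an action's effect on the environment, can be sketched concretely. The sketch below is a minimal illustration under assumed choices, not the authors' architecture: the class name ActionOutcomePredictor, the tiny stand-in encoders, the concatenation-based fusion, and all dimensions are hypothetical.

```python
# Minimal sketch (assumptions, not the paper's model): fuse a text encoding of
# an action with a visual encoding of the current scene, and predict a latent
# representation of the post-action state.
import torch
import torch.nn as nn

class ActionOutcomePredictor(nn.Module):
    """Predicts a latent post-action state from (action text, pre-action image)."""

    def __init__(self, vocab_size: int = 30522, d_model: int = 256):
        super().__init__()
        # Text branch: embeddings + a small Transformer encoder stand in for an LLM.
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.text_enc = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Vision branch: a tiny CNN stands in for a pretrained image encoder.
        self.img_enc = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, d_model),
        )
        # Fusion head: maps the joint embedding to the predicted post-action latent.
        self.fusion = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, d_model),
        )

    def forward(self, action_tokens: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        text = self.text_enc(self.tok_emb(action_tokens)).mean(dim=1)  # (B, d_model)
        vision = self.img_enc(image)                                   # (B, d_model)
        return self.fusion(torch.cat([text, vision], dim=-1))          # predicted latent

# Illustrative usage: one would train against an encoding of the observed
# post-action image (e.g. with a regression or contrastive loss), so that the
# model learns action effects purely from sensory inputs.
model = ActionOutcomePredictor()
tokens = torch.randint(0, 30522, (2, 12))  # batch of tokenized action descriptions
images = torch.rand(2, 3, 64, 64)          # batch of pre-action images
pred_latent = model(tokens, images)        # shape (2, 256)
```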
Pages: 133-148 (16 pages)
Related papers (50 total; entries [21]-[30] shown)
  • [21] Takami, Kuya; Lee, Taeyoung. Environment-dependent depth enhancement with multi-modal sensor fusion learning. 2018 Second IEEE International Conference on Robotic Computing (IRC), 2018: 232-237.
  • [22] Ekenel, Hazim Kemal; Fischer, Mika; Jin, Qin; Stiefelhagen, Rainer. Multi-modal person identification in a smart environment. 2007 IEEE Conference on Computer Vision and Pattern Recognition, Vols 1-8, 2007: 2984-+.
  • [23] Ladas, A; Frankel, R. Passive multi-modal sensors for the urban environment. Unattended Ground Sensor Technologies and Applications VII, 2005, 5796: 477-486.
  • [24] Zhang, Xuan; Shu, Yana; Chen, Yu; Chen, Gong; Ye, Jing; Li, Xiu; Li, Xiang. Multi-modal learning and relaxation of physical conflict for an exoskeleton robot with proprioceptive perception. 2023 IEEE International Conference on Robotics and Automation (ICRA 2023), 2023: 10490-10496.
  • [25] Wang, Yan; Zeng, Yawen; Liang, Junjie; Xing, Xiaofen; Xu, Jin; Xu, Xiangmin. RetrievalMMT: Retrieval-constrained multi-modal prompt learning for multi-modal machine translation. Proceedings of the 4th Annual ACM International Conference on Multimedia Retrieval (ICMR 2024), 2024: 860-868.
  • [26] Li, Yaoyi; Lu, Hongtao. On multi-modal fusion learning in constraint propagation. Information Sciences, 2018, 462: 204-217.
  • [27] Kefato, Zekarias T.; Sheikh, Nasrullah; Montresor, Alberto. Mineral: Multi-modal network representation learning. Machine Learning, Optimization, and Big Data (MOD 2017), 2018, 10710: 286-298.
  • [28] Gong, Chen; Yang, Jian; Tao, Dacheng. Multi-modal curriculum learning over graphs. ACM Transactions on Intelligent Systems and Technology, 2019, 10(04).
  • [29] Chakraborty, Saikat; Ray, Baishakhi. On multi-modal learning of editing source code. 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE 2021), 2021: 443-455.
  • [30] Lei, Yunwen; Ying, Yiming. Generalization analysis of multi-modal metric learning. Analysis and Applications, 2016, 14(04): 503-521.