ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation

Cited by: 2
Authors
Li, Xiaoqi [1]
Zhang, Mingxu [2]
Geng, Yiran [1]
Geng, Haoran [1]
Long, Yuxing [1]
Shen, Yan [1]
Zhang, Renrui [3]
Liu, Jiaming [1]
Dong, Hao [1]
Affiliations
[1] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
[2] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
[3] CUHK, MMLab, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52733.2024.01710
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Robot manipulation relies on accurately predicting contact points and end-effector directions to ensure successful operation. However, learning-based robot manipulation, trained on a limited set of categories within a simulator, often struggles to generalize, especially when confronted with a wide range of categories. Therefore, we introduce an innovative approach for robot manipulation that leverages the robust reasoning capabilities of Multimodal Large Language Models (MLLMs) to enhance the stability and generalization of manipulation. By fine-tuning the injected adapters, we preserve the inherent common sense and reasoning ability of the MLLMs while equipping them with the ability to manipulate. The fundamental insight lies in the introduced fine-tuning paradigm, encompassing object category understanding, affordance prior reasoning, and object-centric pose prediction, which stimulates the reasoning ability of the MLLM in manipulation. During inference, our approach uses an RGB image and a text prompt to predict the end-effector's pose in a chain-of-thought manner. After the initial contact is established, an active impedance adaptation policy is introduced to plan the upcoming waypoints in a closed-loop manner. Moreover, in the real world, we design a test-time adaptation (TTA) strategy for manipulation to enable the model to better adapt to the current real-world scene configuration. Experiments in simulation and the real world show the promising performance of ManipLLM. More details and demonstrations can be found at https://sites.google.com/view/manipllm.
Pages: 18061-18070
Page count: 10