Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement

Cited by: 0
Authors
Simeonov, Anthony [1 ,3 ]
Goyal, Ankit [2 ]
Manuelli, Lucas [2 ]
Yen-Chen, Lin [1 ]
Sarmiento, Alina [1 ,3 ]
Rodriguez, Alberto [1 ]
Agrawal, Pulkit [1 ,3 ]
Fox, Dieter [2 ]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] NVIDIA, Santa Clara, CA USA
[3] Improbable AI Lab, Cambridge, MA 02139 USA
Keywords
Object Rearrangement; Multi-modality; Manipulation; Point Clouds; METRICS;
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship, such as a book inserted in an open slot of a bookshelf. The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects, and is trained from demonstrations to operate directly on 3D point clouds. Our system overcomes challenges associated with the existence of many geometrically-similar rearrangement solutions for a given scene. By leveraging an iterative pose de-noising training procedure, we can fit multi-modal demonstration data and produce multi-modal outputs while remaining precise and accurate. We also show the advantages of conditioning on relevant local geometric features while ignoring irrelevant global structure that harms both generalization and precision. We demonstrate our approach on three distinct rearrangement tasks that require handling multi-modality and generalization over object shape and pose in both simulation and the real world. Project website, code, and videos: https://anthonysimeonov.github.io/rpdiff-multi-modal
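The abstract's core idea, iterative pose de-noising with multiple parallel chains to capture multi-modal placements, can be illustrated with a toy sketch. This is not the authors' implementation: poses are simplified to 3D translations instead of full SE(3) transforms, and `toy_denoiser` is a hypothetical stand-in for the learned network that, in the paper, is conditioned on object and scene point clouds.

```python
import numpy as np

def iterative_pose_denoise(denoise_step, init_poses, n_steps=20):
    """Run several parallel de-noising chains over candidate poses.

    denoise_step: callable pose -> corrective update (stand-in for the
        learned, point-cloud-conditioned de-noising network).
    init_poses: (K, D) array of random initial poses. Keeping K chains
        lets the procedure return several distinct modes instead of
        averaging geometrically-similar solutions together.
    """
    poses = np.asarray(init_poses, dtype=float).copy()
    for _ in range(n_steps):
        # Each chain is refined independently by the de-noising update.
        poses += np.stack([denoise_step(p) for p in poses])
    return poses

# Hypothetical stand-in for the trained model: pull each pose halfway
# toward the nearest of two demonstrated placements ("slots"),
# mimicking multi-modal demonstration data.
slots = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])

def toy_denoiser(pose):
    target = slots[np.argmin(np.linalg.norm(slots - pose, axis=1))]
    return 0.5 * (target - pose)

rng = np.random.default_rng(0)
init = rng.uniform(-2.0, 2.0, size=(8, 3))
final = iterative_pose_denoise(toy_denoiser, init)
```

Because each chain converges to its nearest demonstrated placement rather than to a single average, the output preserves both modes, which is the behavior the abstract attributes to the de-noising formulation.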
Pages: 40
Related Papers
50 records in total
  • [21] Photonic modes prediction via multi-modal diffusion model
    Sun, Jinyang
    Chen, Xi
    Wang, Xiumei
    Zhu, Dandan
    Zhou, Xingping
    MACHINE LEARNING-SCIENCE AND TECHNOLOGY, 2024, 5 (03):
  • [22] Open-Ended Multi-Modal Relational Reasoning for Video Question Answering
    Luo, Haozheng
    Qin, Ruiyang
    Xu, Chenwei
    Ye, Guo
    Luo, Zening
    2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN, 2023, : 363 - 369
  • [23] Multi-modal person re-identification based on transformer relational regularization
    Zheng, Xiangtian
    Huang, Xiaohua
    Ji, Chen
    Yang, Xiaolin
    Sha, Pengcheng
    Cheng, Liang
    INFORMATION FUSION, 2024, 103
  • [24] Multi-modal Multi-relational Feature Aggregation Network for Medical Knowledge Representation Learning
    Zhang, Yingying
    Fang, Quan
    Qian, Shengsheng
    Xu, Changsheng
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 3956 - 3965
  • [26] Asymmetry-aware bilinear pooling in multi-modal data for head pose estimation
    Chen, Jiazhong
    Li, Qingqing
    Ren, Dakai
    Cao, Hua
    Ling, Hefei
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2023, 110
  • [27] Multi-modal Force/Vision Sensor Fusion in 6-DOF Pose Tracking
    Alkkiomaki, Olli
    Kyrki, Ville
    Liu, Yong
    Handroos, Heikki
    Kalviainen, Heikki
    ICAR: 2009 14TH INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS, VOLS 1 AND 2, 2009, : 476 - +
  • [28] An Improved Estimation Algorithm of Space Targets Pose Based on Multi-Modal Feature Fusion
    Hua, Jiang
    Hao, Tonglin
    Zeng, Liangcai
    Yu, Gui
    MATHEMATICS, 2021, 9 (17)
  • [29] Complementary Multi-Modal Sensor Fusion for Resilient Robot Pose Estimation in Subterranean Environments
    Khattak, Shehryar
    Huan Nguyen
    Mascarich, Frank
    Tung Dang
    Alexis, Kostas
    2020 INTERNATIONAL CONFERENCE ON UNMANNED AIRCRAFT SYSTEMS (ICUAS'20), 2020, : 1024 - 1029
  • [30] Multi-modal Descriptors for Multi-class Hand Pose Recognition in Human Computer Interaction Systems
    Abella, Jordi
    Alcaide, Raul
    Sabate, Anna
    Mas, Joan
    Escalera, Sergio
    Gonzalez, Jordi
    Antens, Coen
    ICMI'13: PROCEEDINGS OF THE 2013 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2013, : 503 - 508