Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement

Cited by: 0
Authors
Simeonov, Anthony [1, 3]
Goyal, Ankit [2]
Manuelli, Lucas [2]
Yen-Chen, Lin [1]
Sarmiento, Alina [1, 3]
Rodriguez, Alberto [1]
Agrawal, Pulkit [1, 3]
Fox, Dieter [2]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] NVIDIA, Santa Clara, CA USA
[3] Improbable AI Lab, Cambridge, MA 02139 USA
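Source
Keywords
Object Rearrangement; Multi-modality; Manipulation; Point Clouds; METRICS
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract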
We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship, such as a book inserted in an open slot of a bookshelf. The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects, and is trained from demonstrations to operate directly on 3D point clouds. Our system overcomes challenges associated with the existence of many geometrically-similar rearrangement solutions for a given scene. By leveraging an iterative pose de-noising training procedure, we can fit multi-modal demonstration data and produce multi-modal outputs while remaining precise and accurate. We also show the advantages of conditioning on relevant local geometric features while ignoring irrelevant global structure that harms both generalization and precision. We demonstrate our approach on three distinct rearrangement tasks that require handling multi-modality and generalization over object shape and pose in both simulation and the real world. Project website, code, and videos: https://anthonysimeonov.github.io/rpdiff-multi-modal
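The iterative pose de-noising idea mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `predict_correction` function below is a hypothetical, hand-written stand-in for the learned model, the `slots` array is made-up data, and the pose is reduced to a 2D position rather than a full 3D pose estimated from point clouds. The sketch only shows how repeatedly applying small corrections lets different noisy starting poses settle into different, equally valid placements.

```python
import numpy as np

def predict_correction(pose, slots, step_fraction=0.5):
    """Hypothetical stand-in for a learned de-noiser.

    Returns a small correction that moves the pose part of the way
    toward the nearest valid placement (e.g. an open shelf slot).
    """
    dists = np.linalg.norm(slots - pose, axis=1)
    nearest = slots[np.argmin(dists)]
    return step_fraction * (nearest - pose)

def denoise_pose(initial_pose, slots, num_steps=20):
    """Iteratively refine a noisy pose by applying predicted corrections."""
    pose = np.asarray(initial_pose, dtype=float)
    for _ in range(num_steps):
        pose = pose + predict_correction(pose, slots)
    return pose

# Assumed toy scene: three equally valid placements (the "modes").
slots = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])

# Several random noisy initial poses; each one is refined independently.
rng = np.random.default_rng(0)
starts = rng.uniform(-0.5, 2.5, size=(8, 2))
finals = np.array([denoise_pose(s, slots) for s in starts])

# Index of the slot each refined pose ended up in.
assigned = [int(np.argmin(np.linalg.norm(slots - f, axis=1))) for f in finals]
print(assigned)
```

Because each start is pulled toward whichever placement is nearest to it, sampling many initial poses is what produces varied outputs in this sketch; a single regression from scene to pose would instead tend to average over the valid placements.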
Pages: 40