Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement

Authors
Simeonov, Anthony [1,3]
Goyal, Ankit [2]
Manuelli, Lucas [2]
Yen-Chen, Lin [1]
Sarmiento, Alina [1,3]
Rodriguez, Alberto [1]
Agrawal, Pulkit [1,3]
Fox, Dieter [2]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] NVIDIA, Santa Clara, CA USA
[3] Improbable AI Lab, Cambridge, MA 02139 USA
Keywords
Object Rearrangement; Multi-modality; Manipulation; Point Clouds; Metrics
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a system for rearranging objects in a scene to achieve a desired object-scene placement relationship, such as a book inserted into an open slot of a bookshelf. The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects, and is trained from demonstrations to operate directly on 3D point clouds. Our system overcomes the challenges posed by the many geometrically similar rearrangement solutions that can exist for a given scene. By leveraging an iterative pose de-noising training procedure, we can fit multi-modal demonstration data and produce multi-modal outputs while remaining precise and accurate. We also show the advantages of conditioning on relevant local geometric features while ignoring irrelevant global structure that harms both generalization and precision. We demonstrate our approach on three distinct rearrangement tasks that require handling multi-modality and generalization over object shape and pose, both in simulation and in the real world. Project website, code, and videos: https://anthonysimeonov.github.io/rpdiff-multi-modal
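The abstract's core idea, iteratively de-noising a pose so that multi-modal demonstrations yield multi-modal outputs, can be illustrated with a toy sketch. Everything below is an assumption for illustration and is not the authors' code: the scene is reduced to two hypothetical slot positions (`SLOTS`), poses are 3D translations only (the paper works with full object poses), and `toy_denoiser` is a hypothetical oracle standing in for the learned, point-cloud-conditioned network. Starting points on either side converge to different valid slots, whereas the squared-error optimum of a one-shot regressor (`regress_mean`) lands midway between them, on no valid slot.

```python
import math
import random

# Hypothetical scene: two equally valid placements (e.g. two open bookshelf slots), as (x, y, z).
SLOTS = [(-0.3, 0.0, 0.5), (0.3, 0.0, 0.5)]


def toy_denoiser(pose, step_frac):
    """Hypothetical stand-in for a learned network: a partial correction toward the nearest slot.

    A trained model would infer this correction from point clouds; here we cheat and read SLOTS.
    """
    nearest = min(SLOTS, key=lambda s: math.dist(pose, s))
    return tuple(step_frac * (n - p) for n, p in zip(nearest, pose))


def iterative_denoise(init_pose, n_steps=10):
    """Refine a pose with n_steps small predicted corrections instead of one large jump."""
    pose = tuple(init_pose)
    for k in range(n_steps):
        # Equal-length steps along the line to the slot; the fraction reaches 1.0 on the last step.
        step_frac = 1.0 / (n_steps - k)
        delta = toy_denoiser(pose, step_frac)
        pose = tuple(p + d for p, d in zip(pose, delta))
    return pose


def regress_mean():
    """Squared-error optimum of a one-shot regressor fit to demos split evenly across the slots."""
    return tuple(sum(c) / len(SLOTS) for c in zip(*SLOTS))


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        start = tuple(rng.uniform(-0.5, 0.5) for _ in range(3))
        end = iterative_denoise(start)
        print("start", [round(v, 2) for v in start], "-> end", [round(v, 2) for v in end])
    print("one-shot mean prediction:", regress_mean())
```

Different random starts end on different slots, which mirrors the multi-modal behavior the abstract describes, while the one-shot average sits at x = 0 between the two slots. In this toy, looking only at the nearest slot plays the role of conditioning on local geometry.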
Pages: 40