Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement

Cited by: 0

Authors:

Simeonov, Anthony [1,3]

Goyal, Ankit [2]

Manuelli, Lucas [2]

Yen-Chen, Lin [1]

Sarmiento, Alina [1,3]

Rodriguez, Alberto [1]

Agrawal, Pulkit [1,3]

Fox, Dieter [2]

Affiliations:

[1] MIT, Cambridge, MA 02139 USA

[2] NVIDIA, Santa Clara, CA USA

[3] Improbable AI Lab, Cambridge, MA 02139 USA

Source:

Conference on Robot Learning, Vol. 229 | 2023 / Vol. 229

Keywords:

Object Rearrangement; Multi-modality; Manipulation; Point Clouds; METRICS;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship, such as a book inserted in an open slot of a bookshelf. The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects, and is trained from demonstrations to operate directly on 3D point clouds. Our system overcomes challenges associated with the existence of many geometrically-similar rearrangement solutions for a given scene. By leveraging an iterative pose de-noising training procedure, we can fit multi-modal demonstration data and produce multi-modal outputs while remaining precise and accurate. We also show the advantages of conditioning on relevant local geometric features while ignoring irrelevant global structure that harms both generalization and precision. We demonstrate our approach on three distinct rearrangement tasks that require handling multi-modality and generalization over object shape and pose in both simulation and the real world. Project website, code, and videos: https://anthonysimeonov.github.io/rpdiff-multi-modal
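As a rough illustration of the two ideas named in the abstract, the toy sketch below builds one pose de-noising training pair and crops a scene to the region near the object. It is not the authors' implementation. All function names are hypothetical, rotations are restricted to yaw for brevity, and the axis-aligned box crop is an assumed stand-in for whatever local conditioning the paper actually uses.

```python
import math
import random

def rot_z(theta):
    """Rotation matrix for a yaw of `theta` radians (simplifying assumption)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_transform(R, t, points):
    """Apply x -> R x + t to every 3D point."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]

def invert_transform(R, t):
    """Inverse of x -> R x + t, namely x -> R^T x - R^T t."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return Rt, t_inv

def make_denoising_pair(placed_points, rng, max_angle=math.pi, max_shift=0.1):
    """Perturb a demonstrated placement; the regression target undoes the noise."""
    R = rot_z(rng.uniform(-max_angle, max_angle))
    t = [rng.uniform(-max_shift, max_shift) for _ in range(3)]
    noisy_points = apply_transform(R, t, placed_points)
    return noisy_points, invert_transform(R, t)

def crop_scene(scene_points, object_points, margin):
    """Keep only scene points inside the object's bounding box grown by `margin`."""
    lo = [min(p[i] for p in object_points) - margin for i in range(3)]
    hi = [max(p[i] for p in object_points) + margin for i in range(3)]
    return [
        p for p in scene_points
        if all(lo[i] <= p[i] <= hi[i] for i in range(3))
    ]
```

In this sketch, a model trained on such pairs would see the noisy object points together with the cropped scene and predict the corrective transform. Applying predicted corrections repeatedly from a random initial pose is one way an iterative de-noising procedure could reach different valid placements.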


Pages: 40
