Executing your Commands via Motion Diffusion in Latent Space

Cited by: 80

Authors:

Chen, Xin [1]

Jiang, Biao [2]

Liu, Wen [1]

Huang, Zilong [1]

Fu, Bin [1]

Chen, Tao [2]

Yu, Gang [1]

Affiliations:

[1] Tencent PCG, Shenyang, Peoples R China

[2] Fudan Univ, Shanghai, Peoples R China

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

Keywords:

DOI:

10.1109/CVPR52729.2023.01726

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We study a challenging task, conditional human motion generation, which produces plausible human motion sequences according to various conditional inputs, such as action classes or textual descriptors. Since human motions are highly diverse and follow a distribution quite different from that of the conditional modalities, such as textual descriptors in natural language, it is hard to learn a probabilistic mapping from the desired conditional modality to human motion sequences. Besides, raw motion data from a motion capture system can be redundant across a sequence and contain noise; directly modeling the joint distribution over raw motion sequences and conditional modalities would incur heavy computational overhead and might introduce artifacts caused by the captured noise. To learn a better representation of diverse human motion sequences, we first design a powerful Variational AutoEncoder (VAE) and arrive at a representative, low-dimensional latent code for a human motion sequence. Then, instead of using a diffusion model to establish the connections between the raw motion sequences and the conditional inputs, we perform a diffusion process on the motion latent space. Our proposed Motion Latent-based Diffusion model (MLD) can produce vivid motion sequences conforming to the given conditional inputs and substantially reduces the computational overhead in both the training and inference stages. Extensive experiments on various human motion generation tasks demonstrate that MLD achieves significant improvements over state-of-the-art methods while being two orders of magnitude faster than previous diffusion models that operate on raw motion sequences.


Pages: 18000 - 18010

Page count: 11

