Executing your Commands via Motion Diffusion in Latent Space

Cited by: 80
Authors
Chen, Xin [1 ]
Jiang, Biao [2 ]
Liu, Wen [1 ]
Huang, Zilong [1 ]
Fu, Bin [1 ]
Chen, Tao [2 ]
Yu, Gang [1 ]
Affiliations
[1] Tencent PCG, Shenyang, Peoples R China
[2] Fudan Univ, Shanghai, Peoples R China
DOI
10.1109/CVPR52729.2023.01726
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study a challenging task, conditional human motion generation, which produces plausible human motion sequences according to various conditional inputs, such as action classes or textual descriptors. Because human motions are highly diverse and follow a distribution quite different from that of the conditional modalities, such as textual descriptors in natural language, it is hard to learn a probabilistic mapping from the desired conditional modality to human motion sequences. Moreover, raw motion data from a motion capture system may be redundant across the sequence and contain noise; directly modeling the joint distribution over raw motion sequences and conditional modalities would incur heavy computational overhead and might introduce artifacts caused by the captured noise. To learn a better representation of diverse human motion sequences, we first design a powerful Variational AutoEncoder (VAE) and obtain a representative, low-dimensional latent code for each human motion sequence. Then, instead of using a diffusion model to connect the raw motion sequences to the conditional inputs, we perform the diffusion process in the motion latent space. Our proposed Motion Latent-based Diffusion model (MLD) produces vivid motion sequences that conform to the given conditional inputs and substantially reduces the computational overhead in both the training and inference stages. Extensive experiments on various human motion generation tasks demonstrate that MLD achieves significant improvements over state-of-the-art methods while running two orders of magnitude faster than previous diffusion models that operate on raw motion sequences.
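The idea of running diffusion on a latent code rather than on raw motion can be illustrated with a minimal sketch. This is not the authors' implementation: the VAE is replaced by a random stand-in vector, the linear noise schedule is a common DDPM-style choice assumed here, and all names and sizes (`NUM_FRAMES`, `FEATURES_PER_FRAME`, `LATENT_DIM`, `diffuse_latent`) are hypothetical values picked only to show how much smaller the diffused object becomes.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def diffuse_latent(z0, t, alpha_bars, rng):
    """Sample z_t from N(sqrt(alpha_bar_t) * z0, (1 - alpha_bar_t) * I)."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, noise)]
    return zt, noise


# Hypothetical sizes, chosen only for illustration.
NUM_FRAMES, FEATURES_PER_FRAME, LATENT_DIM = 196, 263, 256

rng = random.Random(0)
# Stand-in for the latent code a trained VAE encoder would produce.
z0 = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]

alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
zt, noise = diffuse_latent(z0, t=500, alpha_bars=alpha_bars, rng=rng)

raw_size = NUM_FRAMES * FEATURES_PER_FRAME
print(f"raw sequence values: {raw_size}, latent values: {len(zt)}")
```

Under these assumed sizes, the denoiser works on a few hundred values per sample instead of tens of thousands, which is the kind of reduction the abstract credits for the lower training and inference cost.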
Pages: 18000 - 18010
Page count: 11