Executing your Commands via Motion Diffusion in Latent Space

Cited by: 80
Authors
Chen, Xin [1]
Jiang, Biao [2]
Liu, Wen [1]
Huang, Zilong [1]
Fu, Bin [1]
Chen, Tao [2]
Yu, Gang [1]
Affiliations
[1] Tencent PCG, Shenyang, Peoples R China
[2] Fudan Univ, Shanghai, Peoples R China
DOI
10.1109/CVPR52729.2023.01726
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We study conditional human motion generation, a challenging task that produces plausible human motion sequences from various conditional inputs, such as action classes or textual descriptors. Because human motions are highly diverse and their distribution differs markedly from that of the conditional modalities, such as textual descriptors in natural language, it is hard to learn a probabilistic mapping from the desired conditional modality to human motion sequences. In addition, raw motion data from a motion capture system can be redundant across a sequence and contain noise; directly modeling the joint distribution over raw motion sequences and conditional modalities would incur heavy computational overhead and might introduce artifacts caused by the captured noise. To learn a better representation of diverse human motion sequences, we first design a powerful Variational AutoEncoder (VAE) and obtain a representative, low-dimensional latent code for a human motion sequence. Then, instead of using a diffusion model to connect the raw motion sequences with the conditional inputs, we perform the diffusion process in the motion latent space. Our proposed Motion Latent-based Diffusion model (MLD) can produce vivid motion sequences that conform to the given conditional inputs while substantially reducing the computational overhead in both training and inference. Extensive experiments on various human motion generation tasks demonstrate that MLD achieves significant improvements over state-of-the-art methods while being two orders of magnitude faster than previous diffusion models that operate on raw motion sequences.
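To make the idea of diffusing in a latent space concrete, here is a minimal sketch of the standard DDPM-style forward (noising) step applied to a single latent vector. It is an illustration under stated assumptions, not the paper's implementation. The abstract does not give the noise schedule, the number of steps, or the latent size, so the linear schedule, the 1000 steps, the dimension 256, and all function names below are hypothetical. A random vector stands in for the VAE-encoded motion.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule (an assumption, not from the abstract)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for every step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_latent(z0, t, alpha_bars, rng):
    """Sample z_t from q(z_t | z_0) = N(sqrt(a_bar_t) * z0, (1 - a_bar_t) * I)."""
    a_bar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a_bar) * z + math.sqrt(1.0 - a_bar) * e
          for z, e in zip(z0, eps)]
    return zt, eps


rng = random.Random(0)
latent_dim = 256  # placeholder size; the abstract only says "low-dimensional"
# Stand-in for a motion sequence encoded by a VAE (no real encoder here).
z0 = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
# At the final step almost all of the original latent has been replaced by noise.
zt, eps = noise_latent(z0, t=999, alpha_bars=alpha_bars, rng=rng)
```

In a full pipeline, a denoising network would be trained to predict `eps` from `zt`, the step index, and the condition (for example a text embedding), and a VAE decoder would map the denoised latent back to a motion sequence. Because each step operates on one short latent vector rather than a full per-frame motion sequence, this is where the reduction in computation described in the abstract would plausibly come from.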
Pages: 18000-18010
Page count: 11