Executing your Commands via Motion Diffusion in Latent Space

Cited by: 80
Authors
Chen, Xin [1 ]
Jiang, Biao [2 ]
Liu, Wen [1 ]
Huang, Zilong [1 ]
Fu, Bin [1 ]
Chen, Tao [2 ]
Yu, Gang [1 ]
Affiliations
[1] Tencent PCG, Shenyang, Peoples R China
[2] Fudan Univ, Shanghai, Peoples R China
Keywords
DOI
10.1109/CVPR52729.2023.01726
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study a challenging task, conditional human motion generation, which produces plausible human motion sequences according to various conditional inputs, such as action classes or textual descriptors. Because human motions are highly diverse and follow a distribution quite different from that of the conditional modalities, such as textual descriptors in natural language, it is hard to learn a probabilistic mapping from the desired conditional modality to human motion sequences. Moreover, raw motion data from a motion capture system may be redundant across a sequence and contain noise; directly modeling the joint distribution over raw motion sequences and conditional modalities would incur heavy computational overhead and might introduce artifacts caused by the captured noise. To learn a better representation of diverse human motion sequences, we first design a powerful Variational AutoEncoder (VAE) and arrive at a representative, low-dimensional latent code for a human motion sequence. Then, instead of using a diffusion model to establish the connection between raw motion sequences and the conditional inputs, we perform the diffusion process in the motion latent space. Our proposed Motion Latent-based Diffusion model (MLD) can produce vivid motion sequences that conform to the given conditional inputs while substantially reducing the computational overhead of both training and inference. Extensive experiments on various human motion generation tasks demonstrate that MLD achieves significant improvements over state-of-the-art methods while running two orders of magnitude faster than previous diffusion models that operate on raw motion sequences.
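The idea of diffusing a latent code rather than a raw motion sequence can be illustrated with a minimal sketch. This is not the authors' implementation: it shows only the standard closed-form forward (noising) step of a DDPM-style diffusion process, applied to a vector that stands in for the output of a VAE encoder. The latent size `LATENT_DIM`, the linear noise schedule, and all function names are assumptions made for illustration.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed schedule, not from the paper)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products of (1 - beta_t), one value per diffusion step."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_latent(z0, t, abar, rng):
    """Sample z_t from q(z_t | z_0) = N(sqrt(abar_t) * z_0, (1 - abar_t) * I)."""
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    scale = math.sqrt(abar[t])
    sigma = math.sqrt(1.0 - abar[t])
    zt = [scale * z + sigma * e for z, e in zip(z0, eps)]
    return zt, eps


rng = random.Random(0)
LATENT_DIM = 256   # hypothetical latent size, chosen only for this sketch
NUM_STEPS = 1000   # hypothetical number of diffusion steps

# Stand-in for the latent code a trained VAE encoder would produce.
z0 = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]

abar = alpha_bars(linear_betas(NUM_STEPS))
zt, eps = noise_latent(z0, NUM_STEPS // 2, abar, rng)
```

In this sketch each noising step touches one fixed-size vector, whereas diffusion on raw motion would touch an array that grows with the number of frames and per-frame features. A denoising network would then be trained to predict `eps` from `zt`, the step index, and the condition, with the VAE decoder mapping the denoised latent back to a motion sequence.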
Pages: 18000 - 18010
Number of pages: 11