DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation

Cited by: 4
Authors
Qi, Qiaosong [1 ]
Zhuo, Le [2 ]
Zhang, Aixi [1 ]
Liao, Yue [2 ]
Fang, Fei [1 ]
Liu, Si [2 ]
Yan, Shuicheng [3 ]
Affiliations
[1] Alibaba Grp, Beijing, Peoples R China
[2] Beihang Univ, Beijing, Peoples R China
[3] BAAI & Skywork AI, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Diffusion Model; Music-to-Dance; Conditional Generation; Multimodal Learning;
DOI
10.1145/3581783.3612307
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
When hearing music, it is natural for people to dance to its rhythm. Automatic dance generation, however, is a challenging task due to the physical constraints of human motion and rhythmic alignment with target music. Conventional autoregressive methods introduce compounding errors during sampling and struggle to capture the long-term structure of dance sequences. To address these limitations, we present a novel cascaded motion diffusion model, DiffDance, designed for high-resolution, long-form dance generation. This model comprises a music-to-dance diffusion model and a sequence super-resolution diffusion model. To bridge the gap between music and motion for conditional generation, DiffDance employs a pretrained audio representation learning model to extract music embeddings and further aligns its embedding space to motion via a contrastive loss. When training our cascaded diffusion model, we also incorporate multiple geometric losses to constrain the model outputs to be physically plausible, and add a dynamic loss weight that adaptively changes over diffusion timesteps to facilitate sample diversity. Through comprehensive experiments performed on the benchmark dataset AIST++, we demonstrate that DiffDance is capable of generating realistic dance sequences that align effectively with the input music. These results are comparable to those achieved by state-of-the-art autoregressive methods.
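The abstract mentions two training components at a high level: aligning music embeddings to motion embeddings via a contrastive loss, and a loss weight that varies with the diffusion timestep. Below is a minimal, illustrative sketch of how such components are commonly implemented; the function names, tensor shapes, temperature, and the linear weighting schedule are assumptions for illustration only and are not taken from the paper.

```python
# Illustrative sketch only; not the authors' implementation.
import torch
import torch.nn.functional as F


def contrastive_alignment_loss(music_emb, motion_emb, temperature=0.07):
    """InfoNCE-style loss pulling paired music/motion embeddings together.

    music_emb, motion_emb: (batch, dim) tensors; pairs share the same row index.
    """
    music_emb = F.normalize(music_emb, dim=-1)
    motion_emb = F.normalize(motion_emb, dim=-1)
    logits = music_emb @ motion_emb.t() / temperature  # (batch, batch) cosine similarities
    targets = torch.arange(music_emb.size(0), device=music_emb.device)
    # Symmetric cross-entropy over music->motion and motion->music retrieval.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


def dynamic_loss_weight(t, num_timesteps, w_min=0.1, w_max=1.0):
    """Hypothetical timestep-dependent weight: stronger at low-noise steps,
    weaker at high-noise steps. The linear schedule here is an illustrative
    choice, not the paper's formula.
    """
    frac = t.float() / num_timesteps  # 0 (nearly clean) -> 1 (pure noise)
    return w_max - (w_max - w_min) * frac


if __name__ == "__main__":
    music = torch.randn(8, 512)   # stand-in for pretrained audio embeddings
    motion = torch.randn(8, 512)  # stand-in for motion-sequence embeddings
    t = torch.randint(0, 1000, (8,))
    print(contrastive_alignment_loss(music, motion).item())
    print(dynamic_loss_weight(t, num_timesteps=1000))
```

In practice, a timestep-dependent weight like this would multiply per-sample reconstruction or geometric loss terms, relaxing them at noisier steps to leave room for sample diversity.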
Pages: 1374 - 1382
Number of pages: 9