DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models

Authors
Yang, Sicheng [1]
Wu, Zhiyong [1, 4]
Li, Minglei [2]
Zhang, Zhensong [3]
Hao, Lei [3]
Bao, Weihong [1]
Cheng, Ming [1]
Xiao, Long [1]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Huawei Cloud Comp Technol Co Ltd, Shenzhen, Peoples R China
[3] Huawei Noahs Ark Lab, Shenzhen, Peoples R China
[4] Chinese Univ Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Beyond speech, gestures are part of the art of communication. Automatic co-speech gesture generation has drawn much attention in computer animation. It is a challenging task because gestures are diverse and because the rhythm and semantics of a gesture are difficult to match to the corresponding speech. To address these problems, we present DiffuseStyleGesture, a diffusion model-based speech-driven gesture generation approach. It generates high-quality, speech-matched, stylized, and diverse co-speech gestures from speech of arbitrary length. Specifically, we introduce cross-local attention and self-attention into the gesture diffusion pipeline to generate more realistic gestures that better match the speech. We then train our model with classifier-free guidance to control the gesture style by interpolation or extrapolation. Additionally, we improve the diversity of generated gestures by varying the initial gestures and the noise. Extensive experiments show that our method outperforms recent approaches on speech-driven gesture generation. Our code, pre-trained models, and demos are available at https://github.com/YoungSeng/DiffuseStyleGesture.
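The style control mentioned in the abstract relies on classifier-free guidance. The abstract does not give the exact formulation, so the following is only a minimal sketch of the standard guidance blend, assuming the model can produce one prediction without the style condition and one with it. The function name `guided_prediction`, the scale `gamma`, and the use of plain Python lists in place of model tensors are illustrative assumptions, not the authors' code.

```python
from typing import List


def guided_prediction(pred_uncond: List[float],
                      pred_style: List[float],
                      gamma: float) -> List[float]:
    """Blend an unconditional and a style-conditioned prediction.

    Hypothetical helper for illustration only:
      gamma = 0      -> the style condition is ignored
      0 < gamma < 1  -> interpolation between the two predictions
      gamma = 1      -> the style-conditioned prediction is used as is
      gamma > 1      -> extrapolation past the conditioned prediction
    """
    return [u + gamma * (c - u) for u, c in zip(pred_uncond, pred_style)]


# Toy values standing in for the outputs of a denoising network.
uncond = [0.0, 1.0]
styled = [1.0, 3.0]

print(guided_prediction(uncond, styled, 0.5))  # [0.5, 2.0] (interpolation)
print(guided_prediction(uncond, styled, 2.0))  # [2.0, 5.0] (extrapolation)
```

In a real sampler this blend would be applied at every denoising step, so a larger `gamma` pushes the generated gesture further toward the requested style.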
Pages: 5860-5868
Page count: 9
Related papers
50 in total
  • [1] Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
    Zhu, Lingting
    Liu, Xian
    Liu, Xuanyu
    Qian, Rui
    Liu, Ziwei
    Yu, Lequan
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 10544-10553
  • [2] Audio-Driven Co-Speech Gesture Video Generation
    Liu, Xian
    Wu, Qianyi
    Zhou, Hang
    Du, Yuanqi
    Wu, Wayne
    Lin, Dahua
    Liu, Ziwei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [3] EmotionGesture: Audio-Driven Diverse Emotional Co-Speech 3D Gesture Generation
    Qi, Xingqun
    Liu, Chen
    Li, Lincheng
    Hou, Jie
    Xin, Haoran
    Yu, Xin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 10420-10430
  • [4] Audio-Driven Stylized Gesture Generation with Flow-Based Model
    Ye, Sheng
    Wen, Yu-Hui
    Sun, Yanan
    He, Ying
    Zhang, Ziyang
    Wang, Yaoyuan
    He, Weihua
    Liu, Yong-Jin
    COMPUTER VISION - ECCV 2022, PT V, 2022, 13665: 712-728
  • [5] MDG: Multilingual Co-speech Gesture Generation with Low-level Audio Representation and Diffusion Models
    Yang, Jie
    Bao, Feilong
    2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024, 2024: 210-215
  • [6] DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures
    Hogue, Steven
    Zhang, Chenxu
    Daruger, Hamza
    Tian, Yapeng
    Guo, Xiaohu
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW, 2024: 1922-1931
  • [7] A Comprehensive Review of Data-Driven Co-Speech Gesture Generation
    Nyatsanga, S.
    Kucherenko, T.
    Ahuja, C.
    Henter, G. E.
    Neff, M.
    COMPUTER GRAPHICS FORUM, 2023, 42 (02): 569-596
  • [8] Audio2DiffuGesture: Generating a diverse co-speech gesture based on a diffusion model
    Yao, Hongze
    Xu, Yingting
    Wu, Weitao
    He, Huabin
    Ren, Wen
    Cai, Zhiming
    ELECTRONIC RESEARCH ARCHIVE, 2024, 32 (09): 5392-5408
  • [9] EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling
    Liu, Haiyang
    Zhu, Zihao
    Becherini, Giorgio
    Peng, Yichen
    Su, Mingyang
    Zhou, You
    Zhe, Xuefei
    Iwamoto, Naoya
    Zheng, Bo
    Black, Michael J.
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024: 1144-1154
  • [10] Continual Learning for Personalized Co-Speech Gesture Generation
    Ahuja, Chaitanya
    Joshi, Pratik
    Ishii, Ryo
    Morency, Louis-Philippe
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 20836-20846