DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models

Cited by: 0
Authors
Yang, Sicheng [1]
Wu, Zhiyong [1,4]
Li, Minglei [2]
Zhang, Zhensong [3]
Hao, Lei [3]
Bao, Weihong [1]
Cheng, Ming [1]
Xiao, Long [1]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Huawei Cloud Comp Technol Co Ltd, Shenzhen, Peoples R China
[3] Huawei Noah's Ark Lab, Shenzhen, Peoples R China
[4] Chinese Univ Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The art of communication extends beyond speech to gestures. Automatic co-speech gesture generation has drawn much attention in computer animation. It is a challenging task due to the diversity of gestures and the difficulty of matching the rhythm and semantics of a gesture to the corresponding speech. To address these problems, we present DiffuseStyleGesture, a diffusion-model-based speech-driven gesture generation approach. It generates high-quality, speech-matched, stylized, and diverse co-speech gestures from given speech of arbitrary length. Specifically, we introduce cross-local attention and self-attention into the gesture diffusion pipeline to generate gestures that are better matched to speech and more realistic. We then train our model with classifier-free guidance to control the gesture style by interpolation or extrapolation. Additionally, we improve the diversity of the generated gestures by varying the initial gestures and noise. Extensive experiments show that our method outperforms recent approaches to speech-driven gesture generation. Our code, pre-trained models, and demos are available at https://github.com/YoungSeng/DiffuseStyleGesture.
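As a rough illustration of the classifier-free guidance mentioned in the abstract, the Python sketch below blends a style-conditioned and a style-masked denoiser prediction at one diffusion step; a guidance weight between 0 and 1 interpolates between the two predictions, while a weight above 1 extrapolates, strengthening the style condition. The function and argument names here are hypothetical assumptions, not taken from the authors' repository.

    import torch

    def guided_prediction(
        denoiser,                  # network trained with the style condition randomly masked
        x_t: torch.Tensor,         # noised gesture sequence at diffusion step t
        t: torch.Tensor,           # diffusion timestep
        audio_feats: torch.Tensor, # audio condition (always kept)
        style: torch.Tensor,       # target style condition
        null_style: torch.Tensor,  # "no style" placeholder used when masking
        gamma: float = 1.5,        # guidance weight
    ) -> torch.Tensor:
        """One classifier-free-guided denoising step (illustrative sketch,
        not the actual DiffuseStyleGesture API)."""
        pred_cond = denoiser(x_t, t, audio_feats, style)
        pred_uncond = denoiser(x_t, t, audio_feats, null_style)
        # gamma in (0, 1) interpolates between the styled and unstyled
        # predictions; gamma > 1 extrapolates beyond the styled one.
        return pred_uncond + gamma * (pred_cond - pred_uncond)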
Pages: 5860-5868
Page count: 9
Related Papers
50 items in total
  • [31] Giving Interaction a Hand - Deep Models of Co-speech Gesture in Multimodal Systems
    Kopp, Stefan
    ICMI'13: PROCEEDINGS OF THE 2013 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2013, : 245 - 246
  • [32] Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
    Liu, Xian
    Wu, Qianyi
    Zhou, Hang
    Xu, Yinghao
    Qian, Rui
    Lin, Xinyi
    Zhou, Xiaowei
    Wu, Wayne
    Dai, Bo
    Zhou, Bolei
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 10452 - 10462
  • [33] Towards a Framework for Social Robot Co-speech Gesture Generation with Semantic Expression
    Zhang, Heng
    Yu, Chuang
    Tapus, Adriana
    SOCIAL ROBOTICS, ICSR 2022, PT I, 2022, 13817 : 110 - 119
  • [34] Speakers adapt gestures to addressees' knowledge: implications for models of co-speech gesture
    Galati, Alexia
    Brennan, Susan E.
    LANGUAGE COGNITION AND NEUROSCIENCE, 2014, 29 (04) : 435 - 451
  • [35] Co-Speech Gesture Synthesis using Discrete Gesture Token Learning
    Lu, Shuhong
    Yoon, Youngwoo
    Feng, Andrew
    2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023, : 9808 - 9815
  • [36] Using and Seeing Co-speech Gesture in a Spatial Task
    Suppes, Alexandra
    Tzeng, Christina Y.
    Galguera, Laura
    JOURNAL OF NONVERBAL BEHAVIOR, 2015, 39 (03) : 241 - 257
  • [37] TAG2G: A Diffusion-Based Approach to Interlocutor-Aware Co-Speech Gesture Generation
    Favali, Filippo
    Schmuck, Viktor
    Villani, Valeria
    Celiktutan, Oya
    ELECTRONICS, 2024, 13 (17)
  • [38] VisemeNet: Audio-Driven Animator-Centric Speech Animation
    Zhou, Yang
    Xu, Zhan
    Landreth, Chris
    Kalogerakis, Evangelos
    Maji, Subhransu
    Singh, Karan
    ACM TRANSACTIONS ON GRAPHICS, 2018, 37 (04):
  • [40] Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates
    Qian, Shenhan
    Tu, Zhi
    Zhi, Yihao
    Liu, Wen
    Gao, Shenghua
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 11057 - 11066