DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models

Cited by: 0
|
Authors
Yang, Sicheng [1 ]
Wu, Zhiyong [1 ,4 ]
Li, Minglei [2 ]
Zhang, Zhensong [3 ]
Hao, Lei [3 ]
Bao, Weihong [1 ]
Cheng, Ming [1 ]
Xiao, Long [1 ]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Huawei Cloud Comp Technol Co Ltd, Shenzhen, Peoples R China
[3] Huawei Noah's Ark Lab, Shenzhen, Peoples R China
[4] Chinese Univ Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Beyond speech, gestures are part of the art of communication. Automatic co-speech gesture generation has drawn much attention in computer animation. It is a challenging task due to the diversity of gestures and the difficulty of matching the rhythm and semantics of a gesture to the corresponding speech. To address these problems, we present DiffuseStyleGesture, a diffusion model-based speech-driven gesture generation approach. It generates high-quality, speech-matched, stylized, and diverse co-speech gestures from given speech of arbitrary length. Specifically, we introduce cross-local attention and self-attention into the gesture diffusion pipeline to generate better speech-matched and realistic gestures. We then train our model with classifier-free guidance to control the gesture style by interpolation or extrapolation. Additionally, we improve the diversity of generated gestures with different initial gestures and noise. Extensive experiments show that our method outperforms recent approaches on speech-driven gesture generation. Our code, pre-trained models, and demos are available at https://github.com/YoungSeng/DiffuseStyleGesture.
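The style control via classifier-free guidance mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard guidance formula, not the paper's exact implementation; the function name and the use of raw noise predictions are assumptions for illustration.

```python
import numpy as np

def cfg_combine(eps_uncond: np.ndarray, eps_cond: np.ndarray,
                gamma: float) -> np.ndarray:
    """Combine unconditional and style-conditioned noise predictions.

    gamma = 1 recovers the conditional prediction; 0 < gamma < 1
    interpolates toward the unconditional one (weaker style), and
    gamma > 1 extrapolates (stronger style), as described for
    gesture style control in the abstract.
    """
    return eps_uncond + gamma * (eps_cond - eps_uncond)
```

At each denoising step, the model would be run twice (with and without the style condition) and the two predictions merged with `cfg_combine` before the diffusion update.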
Pages: 5860 - 5868
Page count: 9