DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models

Cited by: 0

Authors:

Yang, Sicheng [1]

Wu, Zhiyong [1,4]

Li, Minglei [2]

Zhang, Zhensong [3]

Hao, Lei [3]

Bao, Weihong [1]

Cheng, Ming [1]

Xiao, Long [1]

Affiliations:

[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen, Peoples R China

[2] Huawei Cloud Comp Technol Co Ltd, Shenzhen, Peoples R China

[3] Huawei Noahs Ark Lab, Shenzhen, Peoples R China

[4] Chinese Univ Hong Kong, Hong Kong, Peoples R China

Source:

PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023 | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Beyond speech, gestures are part of the art of communication. Automatic co-speech gesture generation has drawn much attention in computer animation. It is a challenging task because gestures are diverse and because it is difficult to match the rhythm and semantics of a gesture to the corresponding speech. To address these problems, we present DiffuseStyleGesture, a diffusion model-based speech-driven gesture generation approach. It generates high-quality, speech-matched, stylized, and diverse co-speech gestures from input speech of arbitrary length. Specifically, we introduce cross-local attention and self-attention into the gesture diffusion pipeline to generate more realistic gestures that better match the speech. We then train our model with classifier-free guidance to control the gesture style by interpolation or extrapolation. Additionally, we improve the diversity of generated gestures by varying the initial gestures and the noise. Extensive experiments show that our method outperforms recent approaches to speech-driven gesture generation. Our code, pre-trained models, and demos are available at https://github.com/YoungSeng/DiffuseStyleGesture.
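
The abstract mentions controlling style through classifier-free guidance by interpolation or extrapolation. As a rough illustration only, the sketch below shows the commonly used guidance blend between a conditional and an unconditional denoiser output. It is not taken from the authors' code: the function name `guided_prediction`, the `scale` parameter, and the toy arrays are all assumptions made for this example.

```python
import numpy as np


def guided_prediction(cond_pred, uncond_pred, scale):
    """Blend two denoiser outputs with a guidance scale (hypothetical helper).

    scale between 0 and 1 interpolates between the unconditional and the
    conditional prediction; scale above 1 extrapolates past the conditional one.
    """
    cond_pred = np.asarray(cond_pred, dtype=float)
    uncond_pred = np.asarray(uncond_pred, dtype=float)
    return uncond_pred + scale * (cond_pred - uncond_pred)


# Toy stand-ins for the two denoiser outputs (one frame, three pose features).
# Real values would come from a trained model; these are made up.
uncond = np.array([0.0, 0.0, 0.0])
cond = np.array([1.0, 2.0, -1.0])

half = guided_prediction(cond, uncond, 0.5)    # interpolation: weaker style
full = guided_prediction(cond, uncond, 1.0)    # plain conditional prediction
strong = guided_prediction(cond, uncond, 2.0)  # extrapolation: exaggerated style
```

In this sketch a scale of 1.0 simply returns the conditional prediction, so values on either side of 1.0 weaken or exaggerate the conditioning signal. How the paper applies this to style labels is not specified in the abstract.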

Pages: 5860-5868

Page count: 9

50 items in total

[1] Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
Zhu, Lingting
Liu, Xian
Liu, Xuanyu
Qian, Rui
Liu, Ziwei
Yu, Lequan
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 10544 - 10553
[2] Audio-Driven Co-Speech Gesture Video Generation
Liu, Xian
Wu, Qianyi
Zhou, Hang
Du, Yuanqi
Wu, Wayne
Lin, Dahua
Liu, Ziwei
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
[3] EmotionGesture: Audio-Driven Diverse Emotional Co-Speech 3D Gesture Generation
Qi, Xingqun
Liu, Chen
Li, Lincheng
Hou, Jie
Xin, Haoran
Yu, Xin
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 10420 - 10430
[4] Audio-Driven Stylized Gesture Generation with Flow-Based Model
Ye, Sheng
Wen, Yu-Hui
Sun, Yanan
He, Ying
Zhang, Ziyang
Wang, Yaoyuan
He, Weihua
Liu, Yong-Jin
COMPUTER VISION - ECCV 2022, PT V, 2022, 13665 : 712 - 728
[5] MDG: Multilingual Co-speech Gesture Generation with Low-level Audio Representation and Diffusion Models
Yang, Jie
Bao, Feilong
2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024, 2024: 210 - 215
[6] DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures
Hogue, Steven
Zhang, Chenxu
Daruger, Hamza
Tian, Yapeng
Guo, Xiaohu
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW, 2024: 1922 - 1931
[7] A Comprehensive Review of Data-Driven Co-Speech Gesture Generation
Nyatsanga, S.
Kucherenko, T.
Ahuja, C.
Henter, G. E.
Neff, M.
COMPUTER GRAPHICS FORUM, 2023, 42 (02) : 569 - 596
[8] Audio2DiffuGesture: Generating a diverse co-speech gesture based on a diffusion model
Yao, Hongze
Xu, Yingting
Wu, Weitao
He, Huabin
Ren, Wen
Cai, Zhiming
ELECTRONIC RESEARCH ARCHIVE, 2024, 32 (09) : 5392 - 5408
[9] EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling
Liu, Haiyang
Zhu, Zihao
Becherini, Giorgio
Peng, Yichen
Su, Mingyang
Zhou, You
Zhe, Xuefei
Iwamoto, Naoya
Zheng, Bo
Black, Michael J.
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024: 1144 - 1154
[10] Continual Learning for Personalized Co-Speech Gesture Generation
Ahuja, Chaitanya
Joshi, Pratik
Ishii, Ryo
Morency, Louis-Philippe
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 20836 - 20846