DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models

Cited by: 0
Authors
Yang, Sicheng [1 ]
Wu, Zhiyong [1 ,4 ]
Li, Minglei [2 ]
Zhang, Zhensong [3 ]
Hao, Lei [3 ]
Bao, Weihong [1 ]
Cheng, Ming [1 ]
Xiao, Long [1 ]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Huawei Cloud Comp Technol Co Ltd, Shenzhen, Peoples R China
[3] Huawei Noahs Ark Lab, Shenzhen, Peoples R China
[4] Chinese Univ Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Beyond speech, gestures are part of the art of communication. Automatic co-speech gesture generation has drawn much attention in computer animation. It is a challenging task due to the diversity of gestures and the difficulty of matching the rhythm and semantics of gestures to the corresponding speech. To address these problems, we present DiffuseStyleGesture, a diffusion-model-based speech-driven gesture generation approach. It generates high-quality, speech-matched, stylized, and diverse co-speech gestures from given speech of arbitrary length. Specifically, we introduce cross-local attention and self-attention into the gesture diffusion pipeline to generate gestures that better match the speech and look more realistic. We then train our model with classifier-free guidance to control gesture style by interpolation or extrapolation. Additionally, we improve the diversity of the generated gestures by varying the initial gestures and noise. Extensive experiments show that our method outperforms recent approaches to speech-driven gesture generation. Our code, pre-trained models, and demos are available at https://github.com/YoungSeng/DiffuseStyleGesture.
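The style control described in the abstract rests on classifier-free guidance: the denoising network is evaluated once with the style condition and once with it masked out, and the two predictions are blended with a guidance weight. Below is a minimal illustrative sketch of that blending step, assuming a generic `denoiser` callable and tensor shapes; the signature and argument names are assumptions for illustration, not the authors' actual implementation.

```python
import torch

def guided_denoise(denoiser, x_t, t, audio, style, gamma=1.0):
    """One classifier-free-guidance step: blend the style-conditioned
    and style-unconditioned predictions of the denoising network.
    gamma = 1 recovers plain conditioning; 0 < gamma < 1 interpolates
    toward the neutral output; gamma > 1 extrapolates the style.
    NOTE: `denoiser` and its signature are hypothetical stand-ins."""
    out_uncond = denoiser(x_t, t, audio, style=None)  # style masked out
    out_cond = denoiser(x_t, t, audio, style=style)   # style provided
    return out_uncond + gamma * (out_cond - out_uncond)

if __name__ == "__main__":
    # Toy stand-in denoiser; a real model would be a network over
    # gesture frames conditioned on audio features and a style label.
    denoiser = lambda x, t, a, style: x * (0.9 if style is None else 0.8)
    x_t = torch.randn(1, 80, 256)  # (batch, frames, pose dims) -- assumed shape
    out = guided_denoise(denoiser, x_t, t=50, audio=None, style="happy", gamma=1.5)
    print(out.shape)
```

Setting gamma between 0 and 1 moves the output toward the unconditioned (neutral) prediction, while gamma above 1 pushes it further in the direction of the given style, which is the interpolation/extrapolation control the abstract refers to.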
Pages: 5860-5868 (9 pages)
Related Papers (50 in total)
  • [41] Audio-driven Talking Face Video Generation with Emotion
    Liang, Jiadong
    Lu, Feng
    2024 IEEE CONFERENCE ON VIRTUAL REALITY AND 3D USER INTERFACES ABSTRACTS AND WORKSHOPS, VRW 2024, 2024, : 863 - 864
  • [42] Verbal working memory and co-speech gesture processing
    Momsen, Jacob
    Gordon, Jared
    Wu, Ying Choon
    Coulson, Seana
    BRAIN AND COGNITION, 2020, 146
  • [43] Audio-driven emotional speech animation for interactive virtual characters
    Charalambous, Constantinos
    Yumak, Zerrin
    van der Stappen, A. Frank
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2019, 30 (3-4)
  • [44] Gesture2Vec: Clustering Gestures using Representation Learning Methods for Co-speech Gesture Generation
    Yazdian, Payam Jome
    Chen, Mo
    Lim, Angelica
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 3100 - 3107
  • [45] Social eye gaze modulates processing of speech and co-speech gesture
    Holler, Judith
    Schubotz, Louise
    Kelly, Spencer
    Hagoort, Peter
    Schuetze, Manuela
    Ozyurek, Asli
    COGNITION, 2014, 133 (03) : 692 - 697
  • [46] DiT-Gesture: A Speech-Only Approach to Stylized Gesture Generation
    Zhang, Fan
    Wang, Zhaohan
    Lyu, Xin
    Ji, Naye
    Zhao, Siyuan
    Gao, Fuxing
    ELECTRONICS, 2024, 13 (09)
  • [47] Gesturing the source domain: The role of co-speech gesture in the metaphorical models of gender transition
    Lederer, Jenny
    METAPHOR AND THE SOCIAL WORLD, 2019, 9 (01) : 32 - 58
  • [48] Towards Real-time Co-speech Gesture Generation in Online Interaction in Social XR
    Krome, Niklas
    Kopp, Stefan
    PROCEEDINGS OF THE 23RD ACM INTERNATIONAL CONFERENCE ON INTELLIGENT VIRTUAL AGENTS, IVA 2023, 2023,
  • [49] Augmented Co-Speech Gesture Generation: Including Form and Meaning Features to Guide Learning-Based Gesture Synthesis
    Voss, Hendric
    Kopp, Stefan
    PROCEEDINGS OF THE 23RD ACM INTERNATIONAL CONFERENCE ON INTELLIGENT VIRTUAL AGENTS, IVA 2023, 2023,
  • [50] Learning Co-Speech Gesture for Multimodal Aphasia Type Detection
    Lee, Daeun
    Son, Sejung
    Jeon, Hyolim
    Kim, Seungbae
    Han, Jinyoung
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 9287 - 9303