Audio-Driven Co-Speech Gesture Video Generation

Cited by: 0

Authors
Liu, Xian [1 ]
Wu, Qianyi [2 ]
Zhou, Hang [1 ]
Du, Yuanqi [3 ]
Wu, Wayne [4 ]
Lin, Dahua [1 ,4 ]
Liu, Ziwei [5 ]
Affiliations
[1] Chinese Univ Hong Kong, Multimedia Lab, Hong Kong, Peoples R China
[2] Monash Univ, Clayton, Vic, Australia
[3] Cornell Univ, Ithaca, NY USA
[4] Shanghai AI Lab, Shanghai, Peoples R China
[5] Nanyang Technol Univ, S Lab, Singapore, Singapore
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Co-speech gesture is crucial for human-machine interaction and digital entertainment. While previous works mostly map speech audio to human skeletons (e.g., 2D keypoints), directly generating speakers' gestures in the image domain remains unsolved. In this work, we formally define and study the challenging problem of audio-driven co-speech gesture video generation, i.e., using a unified framework to generate a speaker image sequence driven by speech audio. Our key insight is that co-speech gestures can be decomposed into common motion patterns and subtle rhythmic dynamics. To this end, we propose a novel framework, Audio-driveN Gesture vIdeo gEneration (ANGIE), to effectively capture reusable co-speech gesture patterns as well as fine-grained rhythmic movements. To achieve high-fidelity image sequence generation, we leverage an unsupervised motion representation instead of a structural human body prior (e.g., 2D skeletons). Specifically, 1) we propose a vector quantized motion extractor (VQ-Motion Extractor) to summarize common co-speech gesture patterns from the implicit motion representation into codebooks; 2) a co-speech gesture GPT with motion refinement (Co-Speech GPT) is devised to complement the subtle prosodic motion details. Extensive experiments demonstrate that our framework renders realistic and vivid co-speech gesture video. Demo video and more resources can be found at: https://alvinliu0.github.io/projects/ANGIE
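The core idea of the VQ-Motion Extractor is that a continuous motion feature gets snapped to its nearest entry in a learned codebook, so recurring gesture patterns are represented by discrete codes. The toy sketch below illustrates only that nearest-neighbor quantization step; the 2-D features, codebook values, and pattern labels are invented for illustration, whereas ANGIE operates on learned implicit motion representations.

```python
import math

def quantize(feature, codebook):
    """Return (index, code) of the codebook entry nearest to `feature`
    under Euclidean distance -- the basic vector-quantization lookup."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    idx = min(range(len(codebook)), key=lambda i: dist(feature, codebook[i]))
    return idx, codebook[idx]

# A tiny hypothetical codebook of 2-D "motion patterns".
codebook = [
    [0.0, 0.0],   # code 0: rest pose
    [1.0, 0.0],   # code 1: sweep right
    [0.0, 1.0],   # code 2: raise hand
]

# A noisy observed motion feature is mapped to its closest pattern.
idx, code = quantize([0.9, 0.1], codebook)
print(idx, code)  # nearest pattern is code 1
```

In the full framework the codebook is learned jointly with the encoder, and the resulting discrete code sequence is what the Co-Speech GPT models autoregressively from audio.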
Pages: 14
Related Papers (50 total)
  • [41] Using and Seeing Co-speech Gesture in a Spatial Task
    Suppes, Alexandra
    Tzeng, Christina Y.
    Galguera, Laura
    JOURNAL OF NONVERBAL BEHAVIOR, 2015, 39 (03) : 241 - 257
  • [42] VisemeNet: Audio-Driven Animator-Centric Speech Animation
    Zhou, Yang
    Xu, Zhan
    Landreth, Chris
    Kalogerakis, Evangelos
    Maji, Subhransu
    Singh, Karan
    ACM TRANSACTIONS ON GRAPHICS, 2018, 37 (04)
  • [44] Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates
    Qian, Shenhan
    Tu, Zhi
    Zhi, Yihao
    Liu, Wen
    Gao, Shenghua
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 11057 - 11066
  • [45] Verbal working memory and co-speech gesture processing
    Momsen, Jacob
    Gordon, Jared
    Wu, Ying Choon
    Coulson, Seana
    BRAIN AND COGNITION, 2020, 146
  • [46] Audio-driven emotional speech animation for interactive virtual characters
    Charalambous, Constantinos
    Yumak, Zerrin
    van der Stappen, A. Frank
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2019, 30 (3-4)
  • [47] Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing
    Tong, Haonan
    Li, Haopeng
    Du, Hongyang
    Yang, Zhaohui
    Yin, Changchuan
    Niyato, Dusit
    IEEE WIRELESS COMMUNICATIONS LETTERS, 2025, 14 (01) : 93 - 97
  • [48] Gesture2Vec: Clustering Gestures using Representation Learning Methods for Co-speech Gesture Generation
    Yazdian, Payam Jome
    Chen, Mo
    Lim, Angelica
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 3100 - 3107
  • [49] Social eye gaze modulates processing of speech and co-speech gesture
    Holler, Judith
    Schubotz, Louise
    Kelly, Spencer
    Hagoort, Peter
    Schuetze, Manuela
    Ozyurek, Asli
    COGNITION, 2014, 133 (03) : 692 - 697
  • [50] Towards Real-time Co-speech Gesture Generation in Online Interaction in Social XR
    Krome, Niklas
    Kopp, Stefan
    PROCEEDINGS OF THE 23RD ACM INTERNATIONAL CONFERENCE ON INTELLIGENT VIRTUAL AGENTS, IVA 2023, 2023