Audio-Driven Co-Speech Gesture Video Generation

Cited by: 0

Authors
Liu, Xian [1 ]
Wu, Qianyi [2 ]
Zhou, Hang [1 ]
Du, Yuanqi [3 ]
Wu, Wayne [4 ]
Lin, Dahua [1 ,4 ]
Liu, Ziwei [5 ]
Affiliations
[1] Chinese Univ Hong Kong, Multimedia Lab, Hong Kong, Peoples R China
[2] Monash Univ, Clayton, Vic, Australia
[3] Cornell Univ, Ithaca, NY USA
[4] Shanghai AI Lab, Shanghai, Peoples R China
[5] Nanyang Technol Univ, S Lab, Singapore, Singapore
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Co-speech gesture is crucial for human-machine interaction and digital entertainment. While previous works mostly map speech audio to human skeletons (e.g., 2D keypoints), directly generating speakers' gestures in the image domain remains unsolved. In this work, we formally define and study the challenging problem of audio-driven co-speech gesture video generation, i.e., using a unified framework to generate a speaker image sequence driven by speech audio. Our key insight is that co-speech gestures can be decomposed into common motion patterns and subtle rhythmic dynamics. To this end, we propose a novel framework, Audio-driveN Gesture vIdeo gEneration (ANGIE), to effectively capture reusable co-speech gesture patterns as well as fine-grained rhythmic movements. To achieve high-fidelity image sequence generation, we leverage an unsupervised motion representation instead of a structural human body prior (e.g., 2D skeletons). Specifically, 1) we propose a vector quantized motion extractor (VQ-Motion Extractor) to summarize common co-speech gesture patterns from the implicit motion representation into codebooks; 2) a co-speech gesture GPT with motion refinement (Co-Speech GPT) is devised to complement the subtle prosodic motion details. Extensive experiments demonstrate that our framework renders realistic and vivid co-speech gesture video. Demo video and more resources can be found at: https://alvinliu0.github.io/projects/ANGIE
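The core idea of the VQ-Motion Extractor is that a continuous motion feature gets snapped to its nearest entry in a learned codebook, so recurring gesture patterns are represented by discrete codes. The toy sketch below illustrates only that nearest-neighbor quantization step; the 2-D features, codebook values, and pattern labels are invented for illustration, whereas ANGIE operates on learned implicit motion representations.

```python
import math

def quantize(feature, codebook):
    """Return (index, code) of the codebook entry nearest to `feature`
    under Euclidean distance -- the basic vector-quantization lookup."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    idx = min(range(len(codebook)), key=lambda i: dist(feature, codebook[i]))
    return idx, codebook[idx]

# A tiny hypothetical codebook of 2-D "motion patterns".
codebook = [
    [0.0, 0.0],   # code 0: rest pose
    [1.0, 0.0],   # code 1: sweep right
    [0.0, 1.0],   # code 2: raise hand
]

# A noisy observed motion feature is mapped to its closest pattern.
idx, code = quantize([0.9, 0.1], codebook)
print(idx, code)  # nearest pattern is code 1
```

In the full framework the codebook is learned jointly with the encoder, and the resulting discrete code sequence is what the Co-Speech GPT models autoregressively from audio.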
Pages: 14
Related Papers (50 total)
  • [41] Using and Seeing Co-speech Gesture in a Spatial Task
    Suppes, Alexandra
    Tzeng, Christina Y.
    Galguera, Laura
    JOURNAL OF NONVERBAL BEHAVIOR, 2015, 39 (03) : 241 - 257
  • [42] VisemeNet: Audio-Driven Animator-Centric Speech Animation
    Zhou, Yang
    Xu, Zhan
    Landreth, Chris
    Kalogerakis, Evangelos
    Maji, Subhransu
    Singh, Karan
    ACM TRANSACTIONS ON GRAPHICS, 2018, 37 (04)
  • [44] Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates
    Qian, Shenhan
    Tu, Zhi
    Zhi, Yihao
    Liu, Wen
    Gao, Shenghua
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 11057 - 11066
  • [45] Verbal working memory and co-speech gesture processing
    Momsen, Jacob
    Gordon, Jared
    Wu, Ying Choon
    Coulson, Seana
    BRAIN AND COGNITION, 2020, 146
  • [46] Audio-driven emotional speech animation for interactive virtual characters
    Charalambous, Constantinos
    Yumak, Zerrin
    van der Stappen, A. Frank
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2019, 30 (3-4)
  • [47] Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing
    Tong, Haonan
    Li, Haopeng
    Du, Hongyang
    Yang, Zhaohui
    Yin, Changchuan
    Niyato, Dusit
    IEEE WIRELESS COMMUNICATIONS LETTERS, 2025, 14 (01) : 93 - 97
  • [48] Gesture2Vec: Clustering Gestures using Representation Learning Methods for Co-speech Gesture Generation
    Yazdian, Payam Jome
    Chen, Mo
    Lim, Angelica
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 3100 - 3107
  • [49] Social eye gaze modulates processing of speech and co-speech gesture
    Holler, Judith
    Schubotz, Louise
    Kelly, Spencer
    Hagoort, Peter
    Schuetze, Manuela
    Ozyurek, Asli
    COGNITION, 2014, 133 (03) : 692 - 697
  • [50] Towards Real-time Co-speech Gesture Generation in Online Interaction in Social XR
    Krome, Niklas
    Kopp, Stefan
    PROCEEDINGS OF THE 23RD ACM INTERNATIONAL CONFERENCE ON INTELLIGENT VIRTUAL AGENTS, IVA 2023, 2023