Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

Cited: 0
Authors
He, Xu [1]
Huang, Qiaochu [1]
Zhang, Zhensong [2]
Lin, Zhiwei [1]
Wu, Zhiyong [1,4]
Yang, Sicheng [1]
Li, Minglei [3]
Chen, Zhiyi [3]
Xu, Songcen [2]
Wu, Xiaofei [2]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Huawei Noah's Ark Lab, Hong Kong, Peoples R China
[3] Huawei Cloud Comp Technol Co Ltd, Hong Kong, Peoples R China
[4] Chinese Univ Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52733.2024.00220
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Co-speech gestures, if presented in the lively form of videos, can achieve superior visual effects in human-machine interaction. While previous works mostly generate structural human skeletons, resulting in the omission of appearance information, we focus on the direct generation of audio-driven co-speech gesture videos in this work. There are two main challenges: 1) A suitable motion feature is needed to describe complex human movements with crucial appearance information. 2) Gestures and speech exhibit inherent dependencies and should be temporally aligned, even for sequences of arbitrary length. To solve these problems, we present a novel motion-decoupled framework to generate co-speech gesture videos. Specifically, we first introduce a well-designed nonlinear TPS (thin-plate spline) transformation to obtain latent motion features that preserve essential appearance information. Then a transformer-based diffusion model is proposed to learn the temporal correlation between gestures and speech and to perform generation in the latent motion space, followed by an optimal motion selection module that produces long-term coherent and consistent gesture videos. For better visual perception, we further design a refinement network focusing on missing details in certain areas. Extensive experimental results show that our proposed framework significantly outperforms existing approaches in both motion- and video-related evaluations. Our code, demos, and more resources are available at https://github.com/thuhcsi/S2G-MDDiffusion.
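To make the diffusion stage concrete, below is a minimal, illustrative sketch of a transformer-based denoiser that predicts the noise on a latent motion sequence conditioned on frame-aligned audio features, in the spirit of the pipeline the abstract describes. All module names, dimensions, the toy cosine noise schedule, and the training step here are assumptions made purely for exposition; this is not the authors' implementation, which is available in the linked repository.

import torch
import torch.nn as nn

class MotionDenoiser(nn.Module):
    """Illustrative denoiser: predicts noise added to a latent motion sequence,
    conditioned on frame-aligned audio features (all sizes are assumptions)."""
    def __init__(self, motion_dim=128, audio_dim=80, d_model=256, n_layers=4):
        super().__init__()
        self.motion_in = nn.Linear(motion_dim, d_model)
        self.audio_in = nn.Linear(audio_dim, d_model)  # per-frame audio conditioning
        self.time_mlp = nn.Sequential(nn.Linear(1, d_model), nn.SiLU(),
                                      nn.Linear(d_model, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.motion_out = nn.Linear(d_model, motion_dim)

    def forward(self, noisy_motion, audio, t):
        # noisy_motion: (B, T, motion_dim), audio: (B, T, audio_dim), t: (B,)
        h = self.motion_in(noisy_motion) + self.audio_in(audio)
        h = h + self.time_mlp(t.float().view(-1, 1)).unsqueeze(1)  # diffusion-step embedding
        return self.motion_out(self.encoder(h))  # predicted noise, (B, T, motion_dim)

# One DDPM-style training step on random tensors (shapes are illustrative).
model = MotionDenoiser()
x0 = torch.randn(2, 64, 128)    # clean latent motion features (e.g., TPS parameters)
audio = torch.randn(2, 64, 80)  # aligned audio features (e.g., mel-spectrogram frames)
t = torch.randint(0, 1000, (2,))
alpha_bar = torch.cos(t.float() / 1000 * torch.pi / 2) ** 2  # toy cosine noise schedule
noise = torch.randn_like(x0)
x_t = alpha_bar.sqrt().view(-1, 1, 1) * x0 + (1 - alpha_bar).sqrt().view(-1, 1, 1) * noise
loss = nn.functional.mse_loss(model(x_t, audio, t), noise)
loss.backward()

At inference time, such a denoiser would be run iteratively from Gaussian noise, and the sampled latent motion sequence would then drive the warping and refinement stages; those parts, along with the optimal motion selection for long sequences, are omitted from this sketch.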
Pages: 2263-2273
Number of pages: 11