Audio-Driven Stylized Gesture Generation with Flow-Based Model

Cited by: 9
Authors
Ye, Sheng [1 ]
Wen, Yu-Hui [1 ]
Sun, Yanan [1 ]
He, Ying [2 ]
Zhang, Ziyang [3 ]
Wang, Yaoyuan [3 ]
He, Weihua [4 ]
Liu, Yong-Jin [1 ]
Affiliations
[1] Tsinghua Univ, CS Dept, BNRist, Beijing, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[3] Huawei Technol Co Ltd, Adv Comp & Storage Lab, Shenzhen, Peoples R China
[4] Tsinghua Univ, Dept Precis Instrument, Beijing, Peoples R China
Source
COMPUTER VISION - ECCV 2022
Funding
China Postdoctoral Science Foundation
Keywords
Stylized gesture; Flow-based model; Global encoder; SPEECH
DOI
10.1007/978-3-031-20065-6_41
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Generating stylized audio-driven gestures for robots and virtual avatars has attracted increasing attention recently. Existing methods require style labels (e.g., speaker identities) or complex data preprocessing to obtain style control parameters. In this paper, we propose a new end-to-end flow-based model that generates audio-driven gestures in arbitrary styles without preprocessing or style labels. To achieve this goal, we introduce a global encoder and a gesture perceptual loss into the classic generative flow model to capture both global and local information. We conduct extensive experiments on two benchmark datasets: the TED Dataset and the Trinity Dataset. Both quantitative and qualitative evaluations show that the proposed model outperforms state-of-the-art models.
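To make the abstract's architecture concrete for readers unfamiliar with conditional normalizing flows, the sketch below shows the general shape of such a model: an affine coupling flow step conditioned on audio features plus a style code produced by a global encoder, and a placeholder gesture perceptual loss. This is a minimal illustration under assumptions, not the authors' implementation; the GRU-based encoder, all module names, dimensions, and the loss definition are hypothetical stand-ins.

# Minimal sketch (not the paper's code) of a conditional coupling flow step
# with a global style encoder and a placeholder gesture perceptual loss.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One flow step: half the pose channels are scaled/shifted using
    parameters predicted from the other half plus the conditioning vector."""
    def __init__(self, pose_dim, cond_dim, hidden=256):
        super().__init__()
        self.half = pose_dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, (pose_dim - self.half) * 2),
        )

    def forward(self, x, cond):
        xa, xb = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(torch.cat([xa, cond], dim=-1)).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)              # bound scales for stability
        z = torch.cat([xa, xb * log_s.exp() + t], dim=-1)
        log_det = log_s.sum(dim=-1)            # contribution to flow likelihood
        return z, log_det

class GlobalEncoder(nn.Module):
    """Hypothetical global encoder: pools a whole motion clip into one
    fixed-size style code that conditions every flow step."""
    def __init__(self, pose_dim, style_dim=32):
        super().__init__()
        self.gru = nn.GRU(pose_dim, style_dim, batch_first=True)

    def forward(self, motion):                 # motion: (B, T, pose_dim)
        _, h = self.gru(motion)
        return h[-1]                           # style code: (B, style_dim)

def gesture_perceptual_loss(feat_gen, feat_ref):
    """Placeholder: L1 distance between features of generated and reference
    motion from some pretrained motion feature extractor (assumed)."""
    return (feat_gen - feat_ref).abs().mean()

# Toy usage with made-up dimensions.
B, T, pose_dim, audio_dim = 4, 30, 48, 26
motion = torch.randn(B, T, pose_dim)
audio = torch.randn(B, audio_dim)
style = GlobalEncoder(pose_dim)(motion)
cond = torch.cat([audio, style], dim=-1)       # audio + global style condition
step = AffineCoupling(pose_dim, cond.shape[-1])
z, log_det = step(motion[:, 0], cond)          # one frame through one flow step
print(z.shape, log_det.shape)                  # (4, 48), (4,)

In a full model, many such coupling steps would be stacked over a pose sequence and trained by maximizing the flow likelihood, with the perceptual loss added as an auxiliary term; sampling runs the inverted flow from Gaussian noise conditioned on new audio and a chosen style code.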
Pages: 712-728 (17 pages)