Towards Real-time Co-speech Gesture Generation in Online Interaction in Social XR

Cited by: 1
Authors
Krome, Niklas [1]
Kopp, Stefan [1]
Affiliations
[1] Bielefeld Univ, Bielefeld, Germany
Keywords
extended reality; social interaction; animation; gesture generation; BEHAVIOR; QUALITY
DOI
10.1145/3570945.3607315
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Extended Reality (XR) has the potential to enable social interaction between people who are distant from one another, in educational, clinical or co-working applications, as well as for scientific studies. However, full-blown embodied social presence and interaction via avatars in XR requires motion tracking hardware that many users do not have. At the same time, modern machine learning approaches enable the synthesis of natural and life-like nonverbal behavior, but only in offline settings and with considerable lag. We evaluate the applicability of current gesture generation systems for online interaction in social XR. We define a set of requirements for real-time-capable gesture generation and propose an approach to employ a state-of-the-art model in a real-time XR interaction pipeline. To test the model under conditions of online interaction, we divide an input audio stream into chunks of different lengths and stitch the resulting gesture animations together to form continuous motion. We evaluate the quality of the resulting multimodal avatar behavior in a user study. Our results show a significant trade-off between real-time generation capabilities and gesture quality. We make suggestions for future improvements to retain model performance during online interaction in social XR. A project page with videos of the generated gestures is available at https://nkrome.github.io/CAGE.html.
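The chunk-and-stitch procedure described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not say how the animations are stitched, so the linear easing at each seam, the names split_into_chunks, stitch and fake_gesture_model, and the one-value-per-frame pose format are all hypothetical and not taken from the paper.

```python
from typing import List, Sequence

Frame = List[float]  # one pose as a flat list of joint values (hypothetical format)


def split_into_chunks(samples: Sequence[float], chunk_len: int) -> List[Sequence[float]]:
    """Split an audio buffer into consecutive chunks of chunk_len samples.

    The last chunk may be shorter than chunk_len.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def fake_gesture_model(chunk: Sequence[float], sample_rate: int, fps: int) -> List[Frame]:
    """Hypothetical stand-in for a gesture generator (NOT the model from the paper).

    Returns one single-value pose per animation frame covering the chunk's duration.
    """
    n_frames = round(len(chunk) / sample_rate * fps)
    level = sum(abs(s) for s in chunk) / max(len(chunk), 1)
    return [[level] for _ in range(n_frames)]


def stitch(clips: List[List[Frame]], blend: int) -> List[Frame]:
    """Concatenate clips, easing the first `blend` frames of each new clip
    away from the last pose of the motion so far.

    Assumed stitching strategy; total frame count is preserved so the
    motion stays aligned with the audio.
    """
    out: List[Frame] = []
    for clip in clips:
        if not clip:
            continue
        n = min(blend, len(clip)) if out else 0
        blended = []
        for k in range(n):
            w = (k + 1) / (n + 1)  # weight of the new clip grows across the seam
            blended.append([(1 - w) * p + w * c for p, c in zip(out[-1], clip[k])])
        out.extend(blended + clip[n:])
    return out


if __name__ == "__main__":
    sample_rate, fps = 16000, 30
    audio = [0.0] * sample_rate + [1.0] * sample_rate  # 2 s of dummy audio
    chunks = split_into_chunks(audio, chunk_len=sample_rate)  # 1 s chunks
    clips = [fake_gesture_model(c, sample_rate, fps) for c in chunks]
    motion = stitch(clips, blend=5)
    print(len(chunks), len(motion))
```

Shorter chunks reduce the delay before the first gesture frames are available but give the generator less context per call and create more seams to smooth, which is one plausible way to picture the trade-off between real-time capability and gesture quality that the abstract reports.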
Pages: 8