Towards Real-time Co-speech Gesture Generation in Online Interaction in Social XR

Cited by: 1
Authors
Krome, Niklas [1]
Kopp, Stefan [1]
Affiliations
[1] Bielefeld Univ, Bielefeld, Germany
Keywords
extended reality; social interaction; animation; gesture generation; BEHAVIOR; QUALITY;
DOI
10.1145/3570945.3607315
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Extended Reality (XR) has a potential to allow social interaction for people that are distant from one another, in educational, clinical or co-working applications, as well as for scientific studies. However, a full-blown embodied social presence and interaction via avatars in XR requires motion tracking hardware that many users do not have. At the same time, modern machine learning approaches enable the synthesis of natural and life-like nonverbal behavior, but only in offline settings and with considerable lag. We evaluate the applicability of current gesture generation systems for online interaction in social XR. We define a set of requirements for real-time-capable gesture generation and propose an approach to employ a state-of-the-art model in a real-time XR interaction pipeline. To test the model under conditions of online interaction, we divide an input audio stream into chunks of different lengths and stitch the resulting gesture animations together to form continuous motion. We evaluate the quality of the resulting multimodal avatar behavior in a user study. Our results show a significant trade-off between real-time generation capabilities and gesture quality. Suggestions for future improvement to retain model performance during online interaction in Social XR are made. A project page with videos of the generated gestures is available at https://nkrome.github.io/CAGE.html.
Pages: 8
Related papers
50 entries in total
  • [21] Time as space vs. time as quantity in Spanish: a co-speech gesture study
    Alcaraz Carrion, Daniel
    Valenzuela, Javier
    LANGUAGE AND COGNITION, 2021, : 1 - 18
  • [22] Real-Time Speech Driven Gesture Animation
    Kasarci, Kenan
    Bozkurt, Elif
    Yemez, Yucel
    Erzin, Engin
    2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016, : 1917 - 1920
  • [23] DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models
    Yang, Sicheng
    Wu, Zhiyong
    Li, Minglei
    Zhang, Zhensong
    Hao, Lei
    Bao, Weihong
    Cheng, Ming
    Xiao, Long
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 5860 - 5868
  • [24] Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
    He, Xu
    Huang, Qiaochu
    Zhang, Zhensong
    Lin, Zhiwei
    Wu, Zhiyong
    Yang, Sicheng
    Li, Minglei
    Chen, Zhiyi
    Xu, Songcen
    Wu, Xiaofei
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 2263 - 2273
  • [25] Gesture2Vec: Clustering Gestures using Representation Learning Methods for Co-speech Gesture Generation
    Yazdian, Payam Jome
    Chen, Mo
    Lim, Angelica
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 3100 - 3107
  • [26] Towards Culture-Aware Co-Speech Gestures for Social Robots
    Ariel Gjaci
    Carmine Tommaso Recchiuto
    Antonio Sgorbissa
    International Journal of Social Robotics, 2022, 14 : 1493 - 1506
  • [27] Towards Culture-Aware Co-Speech Gestures for Social Robots
    Gjaci, Ariel
    Recchiuto, Carmine Tommaso
    Sgorbissa, Antonio
    INTERNATIONAL JOURNAL OF SOCIAL ROBOTICS, 2022, 14 (06) : 1493 - 1506
  • [28] The Interaction Space Considering Speaker-Hearer Location in Co-speech Gesture Analysis and Annotation
    Laparle, Schuyler
    DIGITAL HUMAN MODELING AND APPLICATIONS IN HEALTH, SAFETY, ERGONOMICS AND RISK MANAGEMENT: ANTHROPOMETRY, HUMAN BEHAVIOR, AND COMMUNICATION, PT I, 2022, 13319 : 243 - 262
  • [29] Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation
    Deichler, Anna
    Mehta, Shivam
    Alexanderson, Simon
    Beskow, Jonas
    PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2023, 2023, : 755 - 762
  • [30] Augmented Co-Speech Gesture Generation: Including Form and Meaning Features to Guide Learning-Based Gesture Synthesis
    Voss, Hendric
    Kopp, Stefan
    PROCEEDINGS OF THE 23RD ACM INTERNATIONAL CONFERENCE ON INTELLIGENT VIRTUAL AGENTS, IVA 2023, 2023,