Towards Real-time Co-speech Gesture Generation in Online Interaction in Social XR

Cited by: 1

Authors:

Krome, Niklas [1]

Kopp, Stefan [1]

Affiliations:

[1] Bielefeld Univ, Bielefeld, Germany

Source:

PROCEEDINGS OF THE 23RD ACM INTERNATIONAL CONFERENCE ON INTELLIGENT VIRTUAL AGENTS, IVA 2023 | 2023

Keywords:

extended reality; social interaction; animation; gesture generation; BEHAVIOR; QUALITY;

DOI:

10.1145/3570945.3607315

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Extended Reality (XR) has a potential to allow social interaction for people that are distant from one another, in educational, clinical or co-working applications, as well as for scientific studies. However, a full-blown embodied social presence and interaction via avatars in XR requires motion tracking hardware that many users do not have. At the same time, modern machine learning approaches enable the synthesis of natural and life-like nonverbal behavior, but only in offline settings and with considerable lag. We evaluate the applicability of current gesture generation systems for online interaction in social XR. We define a set of requirements for real-time-capable gesture generation and propose an approach to employ a state-of-the-art model in a real-time XR interaction pipeline. To test the model under conditions of online interaction, we divide an input audio stream into chunks of different lengths and stitch the resulting gesture animations together to form continuous motion. We evaluate the quality of the resulting multimodal avatar behavior in a user study. Our results show a significant trade-off between real-time generation capabilities and gesture quality. Suggestions for future improvement to retain model performance during online interaction in Social XR are made. A project page with videos of the generated gestures is available at https://nkrome.github.io/CAGE.html.

Pages: 8
