Automatic Quality Assessment of Speech-Driven Synthesized Gestures

Cited: 4
Author: He, Zhiyuan [1]
Affiliation: [1] Univ Edinburgh, Edinburgh, Midlothian, Scotland
DOI: 10.1155/2022/1828293
CLC Classification: TP31 [Computer Software]
Subject Classification: 081202; 0835
Pages: 11

Abstract
The automatic synthesis of realistic gestures has the potential to transform animation, avatars, and communicative agents. Although speech-driven gesture-synthesis methods have been proposed and optimized, an evaluation system for synthesized gestures is still lacking. Current evaluation methods require human participation, which is inefficient for industrial gesture-synthesis pipelines and subject to human bias. We therefore need a model that provides an automatic, objective, quantitative quality assessment of synthesized gesture videos. Recurrent neural networks (RNNs) are well suited to modeling high-level spatiotemporal feature sequences, and thus to processing synthesized gesture video data. To build such an automatic quality assessment system, we propose a model based on a Bi-LSTM with a slightly adjusted attention mechanism. We also propose an evaluation method and design experiments showing that the adjusted model can quantitatively evaluate synthesized gestures. In terms of performance, the adjusted model improves by about 20% over the model before the adjustment.
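
The record includes no code, so the following is only a minimal PyTorch sketch of the kind of architecture the abstract describes: a Bi-LSTM with an attention layer that regresses a scalar quality score from a sequence of per-frame gesture features. The feature dimension, hidden size, the additive (Bahdanau-style) attention variant, and all names (GestureQualityScorer, etc.) are illustrative assumptions; the abstract does not specify what adjustment was made to the attention mechanism.

    import torch
    import torch.nn as nn

    class GestureQualityScorer(nn.Module):
        """Bi-LSTM with additive attention that regresses a scalar
        quality score from a sequence of per-frame gesture features."""

        def __init__(self, feat_dim: int = 128, hidden_dim: int = 256):
            super().__init__()
            # Bidirectional LSTM over the per-frame feature sequence.
            self.bilstm = nn.LSTM(feat_dim, hidden_dim,
                                  batch_first=True, bidirectional=True)
            # Additive attention over the Bi-LSTM hidden states
            # (an assumed variant; the paper's adjustment is unspecified).
            self.attn_proj = nn.Linear(2 * hidden_dim, hidden_dim)
            self.attn_score = nn.Linear(hidden_dim, 1, bias=False)
            # Regression head mapping the attended summary to one score.
            self.head = nn.Linear(2 * hidden_dim, 1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, frames, feat_dim) per-frame gesture features.
            h, _ = self.bilstm(x)                               # (B, T, 2H)
            e = self.attn_score(torch.tanh(self.attn_proj(h)))  # (B, T, 1)
            w = torch.softmax(e, dim=1)                         # attention weights over frames
            summary = (w * h).sum(dim=1)                        # (B, 2H) weighted sum
            return self.head(summary).squeeze(-1)               # (B,) quality score

    # Usage: score a batch of 4 clips, each 150 frames of 128-dim features.
    model = GestureQualityScorer()
    scores = model(torch.randn(4, 150, 128))
    print(scores.shape)  # torch.Size([4])

In practice such a scorer would presumably be trained with a regression loss (e.g., MSE) against human quality ratings or another supervision target; the abstract does not detail the training setup.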