Automatic Quality Assessment of Speech-Driven Synthesized Gestures

Cited: 4
Author: He, Zhiyuan [1]
Affiliation: [1] Univ Edinburgh, Edinburgh, Midlothian, Scotland
DOI: 10.1155/2022/1828293
CLC Classification: TP31 [Computer Software]
Subject Classification: 081202; 0835
Pages: 11

Abstract
The automatic synthesis of realistic gestures has the potential to transform animation, avatars, and communicative agents. Although speech-driven gesture-synthesis methods have been proposed and optimized, an evaluation system for synthesized gestures is still lacking. Current evaluation methods require human participation, which is inefficient for industrial gesture-synthesis pipelines and subject to human bias. We therefore need a model that provides an automatic, objective, quantitative quality assessment of synthesized gesture videos. Recurrent neural networks (RNNs) are well suited to modeling high-level spatiotemporal feature sequences, and thus to processing synthesized gesture video data. To build such an automatic quality assessment system, we propose a model based on a Bi-LSTM with a slightly adjusted attention mechanism. We also propose an evaluation method and design experiments showing that the adjusted model can quantitatively evaluate synthesized gestures. In terms of performance, the adjusted model improves by about 20% over the model before the adjustment.
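
The record includes no code, so the following is only a minimal PyTorch sketch of the kind of architecture the abstract describes: a Bi-LSTM with an attention layer that regresses a scalar quality score from a sequence of per-frame gesture features. The feature dimension, hidden size, the additive (Bahdanau-style) attention variant, and all names (GestureQualityScorer, etc.) are illustrative assumptions; the abstract does not specify what adjustment was made to the attention mechanism.

    import torch
    import torch.nn as nn

    class GestureQualityScorer(nn.Module):
        """Bi-LSTM with additive attention that regresses a scalar
        quality score from a sequence of per-frame gesture features."""

        def __init__(self, feat_dim: int = 128, hidden_dim: int = 256):
            super().__init__()
            # Bidirectional LSTM over the per-frame feature sequence.
            self.bilstm = nn.LSTM(feat_dim, hidden_dim,
                                  batch_first=True, bidirectional=True)
            # Additive attention over the Bi-LSTM hidden states
            # (an assumed variant; the paper's adjustment is unspecified).
            self.attn_proj = nn.Linear(2 * hidden_dim, hidden_dim)
            self.attn_score = nn.Linear(hidden_dim, 1, bias=False)
            # Regression head mapping the attended summary to one score.
            self.head = nn.Linear(2 * hidden_dim, 1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, frames, feat_dim) per-frame gesture features.
            h, _ = self.bilstm(x)                               # (B, T, 2H)
            e = self.attn_score(torch.tanh(self.attn_proj(h)))  # (B, T, 1)
            w = torch.softmax(e, dim=1)                         # attention weights over frames
            summary = (w * h).sum(dim=1)                        # (B, 2H) weighted sum
            return self.head(summary).squeeze(-1)               # (B,) quality score

    # Usage: score a batch of 4 clips, each 150 frames of 128-dim features.
    model = GestureQualityScorer()
    scores = model(torch.randn(4, 150, 128))
    print(scores.shape)  # torch.Size([4])

In practice such a scorer would presumably be trained with a regression loss (e.g., MSE) against human quality ratings or another supervision target; the abstract does not detail the training setup.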