Predicting Head Pose from Speech with a Conditional Variational Autoencoder

Cited by: 20
Authors
Greenwood, David [1 ]
Laycock, Stephen [1 ]
Matthews, Iain [1 ]
Affiliations
[1] Univ East Anglia, Sch Comp Sci, Norwich, Norfolk, England
Keywords
speech animation; head motion synthesis; visual prosody; generative models; BLSTM; CVAE; networks
DOI
10.21437/Interspeech.2017-894
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Natural movement plays a significant role in realistic speech animation. Numerous studies have demonstrated the contribution visual cues make to the degree to which we, as human observers, find an animation acceptable. Rigid head motion is one visual mode that universally co-occurs with speech, and so it is a reasonable strategy to seek a transformation from the speech mode to predict the head pose. Several previous authors have shown that prediction is possible, but experiments are typically confined to rigidly produced dialogue. Natural, expressive, emotive and prosodic speech exhibits motion patterns that are far more difficult to predict, with considerable variation in expected head pose. Recently, Long Short-Term Memory (LSTM) networks have become an important tool for modelling speech and natural language tasks. We employ deep Bi-Directional LSTMs (BLSTMs), capable of learning long-term structure in language, to model the relationship that speech has with rigid head motion. We then extend our model by conditioning on prior motion. Finally, we introduce a generative head motion model, conditioned on audio features using a Conditional Variational Autoencoder (CVAE). Each approach mitigates the problems of the one-to-many mapping that a speech-to-head-pose model must accommodate.
Pages: 3991-3995
Number of pages: 5
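The following is a minimal, illustrative sketch (in PyTorch) of the kind of Conditional Variational Autoencoder described in the abstract: a latent code, conditioned on audio features, is decoded to a rigid head pose, so that different latent samples yield different plausible poses for the same speech. The feature dimensions, layer sizes and the plain feed-forward encoder/decoder are assumptions made for illustration, not the authors' implementation.

# Illustrative CVAE for audio-conditioned head pose; all sizes are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

AUDIO_DIM = 26   # e.g. filterbank features per frame (assumed)
POSE_DIM = 3     # head rotation: pitch, yaw, roll (assumed)
LATENT_DIM = 8   # size of the latent code z (assumed)

class CVAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder q(z | pose, audio): infers a latent code from the pose
        # frame together with the conditioning audio features.
        self.enc = nn.Sequential(
            nn.Linear(POSE_DIM + AUDIO_DIM, 64), nn.ReLU(),
        )
        self.mu = nn.Linear(64, LATENT_DIM)
        self.logvar = nn.Linear(64, LATENT_DIM)
        # Decoder p(pose | z, audio): reconstructs the pose from the
        # latent code and the same audio conditioning.
        self.dec = nn.Sequential(
            nn.Linear(LATENT_DIM + AUDIO_DIM, 64), nn.ReLU(),
            nn.Linear(64, POSE_DIM),
        )

    def forward(self, pose, audio):
        h = self.enc(torch.cat([pose, audio], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterisation trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        recon = self.dec(torch.cat([z, audio], dim=-1))
        return recon, mu, logvar

def loss_fn(recon, pose, mu, logvar):
    # Reconstruction error plus KL divergence to the unit Gaussian prior.
    rec = F.mse_loss(recon, pose, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

# At generation time, different z samples give different plausible head
# poses for the same audio frame, addressing the one-to-many mapping.
model = CVAE()
audio = torch.randn(1, AUDIO_DIM)
z = torch.randn(1, LATENT_DIM)
pose_sample = model.dec(torch.cat([z, audio], dim=-1))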