Speech-to-Gesture Generation: A Challenge in Deep Learning Approach with Bi-Directional LSTM

Cited by: 24
Authors
Takeuchi, Kenta [1]
Hasegawa, Dai [2]
Shirakawa, Shinichi [3]
Kaneko, Naoshi [2]
Sakuta, Hiroshi [2]
Sumi, Kazuhiko [2]
Affiliations
[1] Aoyama Gakuin Univ, Grad Sch Sci & Engn, Sagamihara, Kanagawa, Japan
[2] Aoyama Gakuin Univ, Coll Sci & Engn, Sagamihara, Kanagawa, Japan
[3] Yokohama Natl Univ, Fac Environm & Informat Sci, Yokohama, Kanagawa, Japan
Keywords
Deep Learning; Gesture Generation; Bi-Directional LSTM; Speech Features
DOI
10.1145/3125739.3132594
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this research, we take a first step toward generating gesture motion data directly from speech features. Such a method can make creating gesture animations for Embodied Conversational Agents much easier. We implemented a Bi-Directional LSTM model that takes phonemic features extracted from speech audio as input and outputs time-series data of bone joint rotations. We assessed the validity of the predicted gesture motion in two ways: by evaluating the final loss value of the network, and by evaluating impressions of the predicted gestures against the original motion data that accompanied the input audio and against mismatched motion data that accompanied different audio. The results showed that the LSTM model predicted more accurately than a simple RNN model. In contrast, in the impression evaluation the predicted gestures were rated lower than both the original and the mismatched gestures, although some individual predicted gestures were rated on par with the mismatched gestures.
Pages: 365-369
Number of pages: 5