Speech Emotion Recognition Using Speech Feature and Word Embedding

Authors
Atmaja, Bagus Tris [1, 2]
Shirai, Kiyoaki [2]
Akagi, Masato [2]
Affiliations
[1] Inst Teknol Sepuluh Nopember, Surabaya, Indonesia
[2] Japan Adv Inst Sci & Technol, Nomi, Japan
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Emotion recognition can be performed automatically from many modalities. This paper presents a categorical speech emotion recognition method that uses speech features and word embeddings. Text features can be combined with speech features to improve emotion recognition accuracy, and both kinds of features can be obtained from speech. Here, we remove the silences in an utterance and extract acoustic features from the remaining speech segments for speech-based emotion recognition. Word embeddings are used as the input features for text emotion recognition, and a combination of both features is proposed to improve performance. Two unidirectional LSTM layers are used for text, and fully connected layers are applied for acoustic emotion recognition. Both networks are then merged by fully connected layers in an early-fusion manner to produce one of four predicted emotion categories. The results show that the combination of speech and text achieves higher accuracy (75.49%) than speech-only (58.29%) or text-only (68.01%) emotion recognition. This result also outperforms previously proposed methods by others using the same dataset and the same modalities.
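The two-branch network described in the abstract can be illustrated with a minimal NumPy forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the layer sizes, the ReLU activations, the random untrained weights, the use of the last LSTM hidden state as the text representation, and all function names (`lstm_forward`, `dense`, `predict`) are hypothetical choices made for illustration. Only the overall shape comes from the abstract: two unidirectional LSTM layers for text, fully connected layers for acoustic features, and fully connected layers that merge both branches into four emotion categories.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the abstract does not specify any of them.
EMBED_DIM, ACOUSTIC_DIM, HIDDEN, N_CLASSES = 50, 34, 32, 4


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def init_lstm(input_dim, hidden_dim):
    """Random weights for one LSTM layer; gates are stacked as [i, f, g, o]."""
    return {
        "W": rng.normal(0.0, 0.1, (input_dim, 4 * hidden_dim)),
        "U": rng.normal(0.0, 0.1, (hidden_dim, 4 * hidden_dim)),
        "b": np.zeros(4 * hidden_dim),
    }


def init_dense(in_dim, out_dim):
    return {"W": rng.normal(0.0, 0.1, (in_dim, out_dim)), "b": np.zeros(out_dim)}


def lstm_forward(x_seq, p):
    """Unidirectional LSTM over x_seq of shape (T, input_dim); returns (T, hidden_dim)."""
    hidden_dim = p["U"].shape[0]
    h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
    outputs = []
    for x_t in x_seq:
        i, f, g, o = np.split(x_t @ p["W"] + h @ p["U"] + p["b"], 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


def dense(x, p, relu=False):
    y = x @ p["W"] + p["b"]
    return np.maximum(y, 0.0) if relu else y


params = {
    "lstm1": init_lstm(EMBED_DIM, HIDDEN),
    "lstm2": init_lstm(HIDDEN, HIDDEN),
    "ac1": init_dense(ACOUSTIC_DIM, HIDDEN),
    "ac2": init_dense(HIDDEN, HIDDEN),
    "fuse": init_dense(2 * HIDDEN, HIDDEN),
    "out": init_dense(HIDDEN, N_CLASSES),
}


def predict(word_embeddings, acoustic_features, p):
    """Return a probability vector over the four emotion categories."""
    # Text branch: two stacked unidirectional LSTM layers, keep the last state.
    text = lstm_forward(lstm_forward(word_embeddings, p["lstm1"]), p["lstm2"])[-1]
    # Acoustic branch: fully connected layers on an utterance-level feature vector.
    acoustic = dense(dense(acoustic_features, p["ac1"], relu=True), p["ac2"], relu=True)
    # Fusion: concatenate both branches, then fully connected layers and softmax.
    fused = dense(np.concatenate([text, acoustic]), p["fuse"], relu=True)
    return softmax(dense(fused, p["out"]))


# Example call with random stand-ins for a 12-word utterance.
probs = predict(rng.normal(size=(12, EMBED_DIM)), rng.normal(size=ACOUSTIC_DIM), params)
```

With untrained weights the output is close to uniform; the sketch only shows how the text and acoustic branches could be joined before classification.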
Pages: 519 - 523
Number of pages: 5