Multi-Modal Emotion Recognition Using Speech Features and Text-Embedding

Times Cited: 7
Authors:
Byun, Sung-Woo [1 ]
Kim, Ju-Hee [1 ]
Lee, Seok-Pil [2 ]
Affiliations:
[1] SangMyung Univ, Grad Sch, Dept Comp Sci, Seoul 03016, South Korea
[2] SangMyung Univ, Dept Elect Engn, Seoul 03016, South Korea
Source:
APPLIED SCIENCES-BASEL | 2021, Vol. 11, Issue 17
Keywords:
speech emotion recognition; emotion recognition; multi-modal emotion recognition;
DOI:
10.3390/app11177967
CLC Number:
O6 [Chemistry];
Subject Classification Code:
0703;
Abstract:
Recently, intelligent personal assistants, chatbots, and AI speakers have come into broader use as communication interfaces, and the demand for more natural interaction has increased as well. Humans express emotions in various ways, such as through voice tone or facial expressions; therefore, multimodal approaches to recognizing human emotions have been studied. In this paper, we propose an emotion recognition method that achieves higher accuracy by using both speech and text data, exploiting the complementary strengths of the two modalities. We extracted 43 feature vectors, including spectral features, harmonic features, and MFCCs, from the speech dataset, and 256 embedding vectors from the transcripts using a pre-trained Tacotron encoder. The acoustic feature vectors and embedding vectors were fed into separate deep learning models, each producing a probability distribution over the output classes. The results show that the proposed model achieved higher accuracy than previous research.
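The abstract outlines a two-branch, late-fusion setup: per-modality feature extraction followed by combining each branch's class probabilities. The Python sketch below illustrates that general idea only; the emotion label set, the MFCC-statistics features, the 0.5/0.5 fusion weights, and the speech_model / text_model / tacotron_encoder names are illustrative assumptions and are not taken from the paper (which uses 43 acoustic features and Tacotron encoder embeddings of the transcripts).

```python
# Hedged sketch of late fusion for speech + text emotion recognition.
# Everything below (label set, feature choice, fusion weights, model names)
# is an assumption for illustration, not the paper's exact pipeline.
import numpy as np
import librosa

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # assumed label set


def acoustic_features(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
    """Build a compact utterance-level acoustic vector from MFCC statistics.

    The paper uses 43 features (spectral, harmonic, MFCC); this sketch
    only illustrates the MFCC portion.
    """
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)        # (n_mfcc, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # (2 * n_mfcc,)


def late_fusion(p_speech: np.ndarray, p_text: np.ndarray,
                w_speech: float = 0.5) -> int:
    """Weighted average of the two branches' class probabilities, then argmax."""
    p = w_speech * p_speech + (1.0 - w_speech) * p_text
    return int(np.argmax(p))


# Usage with hypothetical pre-trained branch models (not provided here):
# p_speech = speech_model.predict(acoustic_features("sample.wav")[None, :])[0]
# p_text   = text_model.predict(tacotron_encoder("transcript text")[None, :])[0]
# print(EMOTIONS[late_fusion(p_speech, p_text)])
```

The fusion weight is a free parameter in this sketch; in practice it would be tuned on validation data rather than fixed at 0.5.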
Pages: 9
Related Papers
50 items in total
  • [1] Multi-modal Emotion Recognition using Speech Features and Text Embedding
    Kim, Ju-Hee
    Lee, Seok-Pil
    [J]. Transactions of the Korean Institute of Electrical Engineers, 2021, 70 (01): : 108 - 113
  • [2] Implementation of Multi-modal Speech Emotion Recognition Using Text Data and Audio Signals
    Adesola, Falade
    Adeyinka, Omirinlewo
    Kayode, Akindeji
    Ayodele, Adebiyi
    [J]. 2023 International Conference on Science, Engineering and Business for Sustainable Development Goals, SEB-SDG 2023, 2023,
  • [3] Multi-modal Attention for Speech Emotion Recognition
    Pan, Zexu
    Luo, Zhaojie
    Yang, Jichen
    Li, Haizhou
    [J]. INTERSPEECH 2020, 2020, : 364 - 368
  • [4] Multi-modal emotion recognition using EEG and speech signals
    Wang, Qian
    Wang, Mou
    Yang, Yan
    Zhang, Xiaolei
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 149
  • [5] Multi-Modal Emotion Recognition by Fusing Correlation Features of Speech-Visual
    Chen Guanghui
    Zeng Xiaoping
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 533 - 537
  • [6] Multi-modal Emotion Recognition Based on Speech and Image
    Li, Yongqiang
    He, Qi
    Zhao, Yongping
    Yao, Hongxun
    [J]. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT I, 2018, 10735 : 844 - 853
  • [7] Multi-modal Correlated Network for emotion recognition in speech
    Ren, Minjie
    Nie, Weizhi
    Liu, Anan
    Su, Yuting
    [J]. VISUAL INFORMATICS, 2019, 3 (03) : 150 - 155
  • [8] Contextual and Cross-Modal Interaction for Multi-Modal Speech Emotion Recognition
    Yang, Dingkang
    Huang, Shuai
    Liu, Yang
    Zhang, Lihua
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2093 - 2097
  • [9] Audio-Visual Emotion Recognition System Using Multi-Modal Features
    Handa, Anand
    Agarwal, Rashi
    Kohli, Narendra
    [J]. INTERNATIONAL JOURNAL OF COGNITIVE INFORMATICS AND NATURAL INTELLIGENCE, 2021, 15 (04)
  • [10] Facial emotion recognition using multi-modal information
    De Silva, LC
    Miyasato, T
    Nakatsu, R
    [J]. ICICS - PROCEEDINGS OF 1997 INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATIONS AND SIGNAL PROCESSING, VOLS 1-3: THEME: TRENDS IN INFORMATION SYSTEMS ENGINEERING AND WIRELESS MULTIMEDIA COMMUNICATIONS, 1997, : 397 - 401