Multi-Modal Emotion Recognition Using Speech Features and Text-Embedding

Cited by: 7
Authors
Byun, Sung-Woo [1 ]
Kim, Ju-Hee [1 ]
Lee, Seok-Pil [2 ]
Affiliations
[1] SangMyung Univ, Grad Sch, Dept Comp Sci, Seoul 03016, South Korea
[2] SangMyung Univ, Dept Elect Engn, Seoul 03016, South Korea
Source
APPLIED SCIENCES-BASEL | 2021, Vol. 11, No. 17
Keywords
speech emotion recognition; emotion recognition; multi-modal emotion recognition;
DOI
10.3390/app11177967
CLC Classification
O6 [Chemistry];
Subject Classification
0703
Abstract
Recently, intelligent personal assistants, chatbots and AI speakers have been used more widely as communication interfaces, and the demand for more natural interaction has grown accordingly. Humans express emotions in various ways, such as voice tone or facial expression; therefore, multimodal approaches to recognizing human emotions have been studied. In this paper, we propose an emotion recognition method that achieves higher accuracy by using both speech and text data, exploiting the complementary strengths of each modality. We extracted 43 acoustic feature vectors, such as spectral features, harmonic features and MFCCs, from the speech datasets. In addition, 256 embedding vectors were extracted from the transcripts using a pre-trained Tacotron encoder. The acoustic feature vectors and embedding vectors were fed into separate deep learning models, each producing a probability distribution over the predicted output classes. The results show that the proposed model performed more accurately than previous approaches.
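The fusion step sketched in the abstract, where the speech model and the text model each emit class probabilities that are then combined into a single prediction, can be illustrated as follows. This is a minimal sketch with made-up logits and a simple probability-averaging rule; the paper's actual fusion scheme and class set may differ.

```python
import numpy as np

def softmax(z):
    """Convert raw model logits into a probability distribution."""
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical logits for 4 emotion classes from each modality's model
speech_logits = np.array([2.0, 0.5, 0.1, -1.0])  # from acoustic features
text_logits = np.array([1.5, 1.8, 0.2, -0.5])    # from Tacotron embeddings

p_speech = softmax(speech_logits)
p_text = softmax(text_logits)

# Late fusion: average the per-class probabilities from both modalities,
# then pick the class with the highest fused probability.
p_fused = (p_speech + p_text) / 2
predicted = int(np.argmax(p_fused))
```

Averaging probabilities (rather than logits) keeps each modality's contribution bounded, so one over-confident model cannot dominate the fused decision.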
Pages: 9
Related Papers
50 items in total
  • [21] Multi-modal embeddings using multi-task learning for emotion recognition
    Khare, Aparna
    Parthasarathy, Srinivas
    Sundaram, Shiva
    [J]. INTERSPEECH 2020, 2020, : 384 - 388
  • [22] Multi-Modal Fusion Emotion Recognition Method of Speech Expression Based on Deep Learning
    Liu, Dong
    Wang, Zhiyong
    Wang, Lifeng
    Chen, Longxi
    [J]. FRONTIERS IN NEUROROBOTICS, 2021, 15
  • [23] Multi-Modal Emotion Recognition From Speech and Facial Expression Based on Deep Learning
    Cai, Linqin
    Dong, Jiangong
    Wei, Min
    [J]. 2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 5726 - 5729
  • [24] Multi-modal speech emotion recognition using self-attention mechanism and multi-scale fusion framework
    Liu, Yang
    Sun, Haoqin
    Guan, Wenbo
    Xia, Yuqi
    Zhao, Zhen
    [J]. SPEECH COMMUNICATION, 2022, 139 : 1 - 9
  • [25] Multi-Modal Emotion Recognition for Online Education Using Emoji Prompts
    Qin, Xingguo
    Zhou, Ya
    Li, Jun
    [J]. APPLIED SCIENCES-BASEL, 2024, 14 (12):
  • [27] A Multi-Modal Approach to Emotion Recognition using Undirected Topic Models
    Shah, Mohit
    Chakrabarti, Chaitali
    Spanias, Andreas
    [J]. 2014 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2014, : 754 - 757
  • [28] A multi-modal Eliza using natural language processing and emotion recognition
    Fitrianie, S
    Wiggers, P
    Rothkrantz, LJM
    [J]. TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2003, 2807 : 394 - 399
  • [29] Intelligent ear for emotion recognition: Multi-modal emotion recognition via acoustic features, semantic contents and facial images
    Wu, CH
    Chuang, ZJ
    [J]. 8TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL XII, PROCEEDINGS: APPLICATIONS OF CYBERNETICS AND INFORMATICS IN OPTICS, SIGNALS, SCIENCE AND ENGINEERING, 2004, : 122 - 127
  • [30] SERVER: Multi-modal Speech Emotion Recognition using Transformer-based and Vision-based Embeddings
    Nhat Truong Pham
    Duc Ngoc Minh Dang
    Bich Ngoc Hong Pham
    Sy Dzung Nguyen
    [J]. PROCEEDINGS OF 2023 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION TECHNOLOGY, ICIIT 2023, 2023, : 234 - 238