Multi-modal Emotion Recognition using Speech Features and Text Embedding

Cited by: 0
Authors
Kim J.-H. [1 ]
Lee S.-P. [2 ]
Affiliations
[1] Dept. of Computer Science, Sangmyung University
[2] Dept. of Electronic Engineering, Sangmyung University
Keywords
speech emotion recognition; emotion recognition; multi-modal emotion recognition; deep learning
DOI
10.5370/KIEE.2021.70.1.108
Abstract
Many studies have performed emotion recognition using audio signals, since such data are easy to collect. However, the resulting accuracy is lower than that of other approaches, such as those based on facial images or video signals. In this paper, we propose an emotion recognition method that uses speech signals and text simultaneously to achieve better performance. For training, we generate 43 feature vectors, including MFCC, spectral, and harmonic features, from the audio data. In addition, 256-dimensional embedding vectors are extracted from the text data using a pretrained Tacotron encoder. The feature vectors and text embedding vectors are fed into separate LSTM layers followed by fully connected layers, each producing a probability distribution over the predicted output classes. By averaging the two results, each sample is assigned to one of four emotion categories: anger, happiness, sadness, or neutrality. Our proposed model outperforms previous state-of-the-art methods on a Korean emotional speech dataset. © 2021 Korean Institute of Electrical Engineers. All rights reserved.
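The fusion step described in the abstract is a late-fusion scheme: each modality's network emits a probability distribution over the four emotion classes, and the two distributions are averaged before the final class is chosen. A minimal sketch of that step, with stand-in softmax outputs (the values and function name below are illustrative assumptions, not from the paper):

```python
# Hypothetical sketch of the paper's late-fusion step: average the
# per-modality class probabilities, then take the argmax. The upstream
# LSTM branches are omitted; the probability values are stand-ins.
import numpy as np

EMOTIONS = ["anger", "happiness", "sadness", "neutrality"]

def fuse_predictions(audio_probs, text_probs):
    """Average the audio- and text-branch distributions, return the winner."""
    audio_probs = np.asarray(audio_probs, dtype=float)
    text_probs = np.asarray(text_probs, dtype=float)
    fused = (audio_probs + text_probs) / 2.0  # element-wise mean of the two softmax outputs
    return EMOTIONS[int(np.argmax(fused))], fused

# Example: both branches lean toward sadness, so the fused label is sadness.
label, fused = fuse_predictions([0.1, 0.2, 0.5, 0.2], [0.2, 0.1, 0.6, 0.1])
```

Because both inputs are valid probability distributions, their mean also sums to one, so the fused vector can still be read as class probabilities.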
Pages: 108 - 113
Page count: 5
Related Papers
50 records
  • [21] A Deep GRU-BiLSTM Network for Multi-modal Emotion Recognition from Text
    Yacoubi, Ibtissem
    Ferjaoui, Radhia
    Ben Khalifa, Anouar
    2024 IEEE 7TH INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES, SIGNAL AND IMAGE PROCESSING, ATSIP 2024, 2024, : 138 - 143
  • [22] Multi-modal embeddings using multi-task learning for emotion recognition
    Khare, Aparna
    Parthasarathy, Srinivas
    Sundaram, Shiva
    INTERSPEECH 2020, 2020, : 384 - 388
  • [23] Multi-modal mathematics: Conveying Math using synthetic speech and speech recognition
    Fitzpatrick, D
    Karshmer, AI
    COMPUTERS HELPING PEOPLE WITH SPECIAL NEEDS: PROCEEDINGS, 2004, 3118 : 644 - 647
  • [24] Multi-Modal Emotion Recognition From Speech and Facial Expression Based on Deep Learning
    Cai, Linqin
    Dong, Jiangong
    Wei, Min
    2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 5726 - 5729
  • [25] Multi-Modal Fusion Emotion Recognition Method of Speech Expression Based on Deep Learning
    Liu, Dong
    Wang, Zhiyong
    Wang, Lifeng
    Chen, Longxi
    FRONTIERS IN NEUROROBOTICS, 2021, 15
  • [26] Multi-modal speech emotion recognition using self-attention mechanism and multi-scale fusion framework
    Liu, Yang
    Sun, Haoqin
    Guan, Wenbo
    Xia, Yuqi
    Zhao, Zhen
    SPEECH COMMUNICATION, 2022, 139 : 1 - 9
  • [27] Multi-Modal Emotion Recognition for Online Education Using Emoji Prompts
    Qin, Xingguo
    Zhou, Ya
    Li, Jun
    APPLIED SCIENCES-BASEL, 2024, 14 (12):
  • [28] A multi-modal Eliza using natural language processing and emotion recognition
    Fitrianie, S
    Wiggers, P
    Rothkrantz, LJM
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2003, 2807 : 394 - 399
  • [30] A Multi-Modal Approach to Emotion Recognition using Undirected Topic Models
    Shah, Mohit
    Chakrabarti, Chaitali
    Spanias, Andreas
    2014 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2014, : 754 - 757