Multi-Modal Emotion Recognition Using Speech Features and Text-Embedding

Cited by: 7
Authors
Byun, Sung-Woo [1 ]
Kim, Ju-Hee [1 ]
Lee, Seok-Pil [2 ]
Affiliations
[1] SangMyung Univ, Grad Sch, Dept Comp Sci, Seoul 03016, South Korea
[2] SangMyung Univ, Dept Elect Engn, Seoul 03016, South Korea
Source
APPLIED SCIENCES-BASEL | 2021, Vol. 11, Issue 17
Keywords
speech emotion recognition; emotion recognition; multi-modal emotion recognition;
DOI
10.3390/app11177967
Chinese Library Classification (CLC)
O6 [Chemistry];
Discipline Classification Code
0703;
Abstract
Recently, intelligent personal assistants, chatbots, and AI speakers have been used more broadly as communication interfaces, and the demand for more natural interaction has increased as well. Humans express emotions in various ways, such as through voice tone or facial expressions; therefore, multimodal approaches to recognizing human emotions have been studied. In this paper, we propose an emotion recognition method that achieves higher accuracy by using both speech and text data, exploiting the strengths of each modality. We extracted 43 acoustic features, including spectral features, harmonic features, and MFCCs, from the speech data. In addition, 256-dimensional embedding vectors were extracted from the transcripts using a pre-trained Tacotron encoder. The acoustic feature vectors and the embedding vectors were fed into separate deep learning models, each producing a probability distribution over the predicted output classes. The results show that the proposed model performs more accurately than models in previous research.
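
The abstract describes a two-stream, score-level pipeline: hand-crafted acoustic features and a transcript embedding are classified separately, and the resulting class probabilities are combined. The following Python sketch only illustrates that idea and is not the authors' implementation: the reduced acoustic feature set, the text_embedding placeholder standing in for the pre-trained Tacotron encoder, the linear classify stand-in for the per-modality deep models, the emotion label set, and the averaging fusion are all assumptions made for illustration.

# Minimal two-stream (speech + text) emotion recognition sketch based on the
# pipeline described in the abstract. Placeholders are marked as assumptions.
import numpy as np

try:
    import librosa  # used only for MFCC / spectral features; illustrative choice
except ImportError:
    librosa = None

EMOTIONS = ["anger", "happiness", "neutral", "sadness"]  # assumed label set


def acoustic_features(waveform: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Fixed-length acoustic feature vector (stand-in for the paper's 43 features)."""
    if librosa is not None:
        mfcc = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=13).mean(axis=1)
        centroid = librosa.feature.spectral_centroid(y=waveform, sr=sr).mean()
        return np.concatenate([mfcc, [centroid]])
    return np.zeros(14)  # fallback so the sketch stays runnable without librosa


def text_embedding(transcript: str, dim: int = 256) -> np.ndarray:
    """Placeholder for the 256-d transcript embedding from a pre-trained Tacotron encoder."""
    rng = np.random.default_rng(abs(hash(transcript)) % (2 ** 32))
    return rng.standard_normal(dim)


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()


def classify(vec: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Stand-in for each modality's deep model: a single linear layer plus softmax."""
    return softmax(weights @ vec)


def fuse(p_speech: np.ndarray, p_text: np.ndarray) -> str:
    """Score-level fusion by averaging the two probability distributions (assumed scheme)."""
    return EMOTIONS[int(np.argmax((p_speech + p_text) / 2.0))]


if __name__ == "__main__":
    wav = np.random.randn(16000).astype(np.float32)  # 1 s of dummy audio
    speech_vec = acoustic_features(wav)
    text_vec = text_embedding("I can't believe this happened!")
    rng = np.random.default_rng(0)
    p_speech = classify(speech_vec, rng.standard_normal((len(EMOTIONS), speech_vec.size)))
    p_text = classify(text_vec, rng.standard_normal((len(EMOTIONS), text_vec.size)))
    print("Predicted emotion:", fuse(p_speech, p_text))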
Pages: 9
Related Papers
50 records total
  • [41] Lightweight multi-modal emotion recognition model based on modal generation
    Liu, Peisong
    Che, Manqiang
    Luo, Jiangchuan
    2022 9TH INTERNATIONAL FORUM ON ELECTRICAL ENGINEERING AND AUTOMATION, IFEEA, 2022, : 430 - 435
  • [42] Cross-modal dynamic convolution for multi-modal emotion recognition
    Wen, Huanglu
    You, Shaodi
    Fu, Ying
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 78
  • [43] Multi-modal Text Recognition Networks: Interactive Enhancements Between Visual and Semantic Features
    Na, Byeonghu
    Kim, Yoonsik
    Park, Sungrae
    COMPUTER VISION - ECCV 2022, PT XXVIII, 2022, 13688 : 446 - 463
  • [44] Speech Emotion Recognition Using Speech Feature and Word Embedding
    Atmaja, Bagus Tris
    Shirai, Kiyoaki
    Akagi, Masato
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 519 - 523
  • [45] Text-independent speech emotion recognition using frequency adaptive features
    Wu, Chenjian
    Huang, Chengwei
    Chen, Hong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (18) : 24353 - 24363
  • [47] Multi-modal speech emotion detection using optimised deep neural network classifier
    Padman, Sweta Nishant
    Magare, Dhiraj
    COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING-IMAGING AND VISUALIZATION, 2023, 11 (05): 2020 - 2038
  • [48] Dynamic Confidence-Aware Multi-Modal Emotion Recognition
    Zhu, Qi
    Zheng, Chuhang
    Zhang, Zheng
    Shao, Wei
    Zhang, Daoqiang
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15 (03) : 1358 - 1370
  • [49] Multi-modal fusion network with complementarity and importance for emotion recognition
    Liu, Shuai
    Gao, Peng
    Li, Yating
    Fu, Weina
    Ding, Weiping
    INFORMATION SCIENCES, 2023, 619 : 679 - 694
  • [50] Using Holonic Multi-agent Architecture to deal with complexity in Multi-modal emotion recognition
    Boutefara, Tarek
    Mahdaoui, Latifa
    2020 4TH INTERNATIONAL CONFERENCE ON ADVANCED ASPECTS OF SOFTWARE ENGINEERING (ICAASE'2020): 4TH INTERNATIONAL CONFERENCE ON ADVANCED ASPECTS OF SOFTWARE ENGINEERING, 2020, : 118 - 125