Multi-Modal Emotion Recognition Using Speech Features and Text-Embedding

Cited by: 7
Authors
Byun, Sung-Woo [1 ]
Kim, Ju-Hee [1 ]
Lee, Seok-Pil [2 ]
Affiliations
[1] SangMyung Univ, Grad Sch, Dept Comp Sci, Seoul 03016, South Korea
[2] SangMyung Univ, Dept Elect Engn, Seoul 03016, South Korea
Source
APPLIED SCIENCES-BASEL | 2021, Vol. 11, Issue 17
Keywords
speech emotion recognition; emotion recognition; multi-modal emotion recognition;
DOI
10.3390/app11177967
Chinese Library Classification (CLC)
O6 [Chemistry];
Discipline Classification Code
0703;
Abstract
Recently, intelligent personal assistants, chatbots, and AI speakers have been used more broadly as communication interfaces, and the demand for more natural interaction has increased as well. Humans express emotions in various ways, such as through voice tone or facial expressions; therefore, multimodal approaches to recognizing human emotions have been studied. In this paper, we propose an emotion recognition method that achieves higher accuracy by using both speech and text data, exploiting the strengths of each modality. We extracted 43 acoustic features, including spectral features, harmonic features, and MFCCs, from the speech data. In addition, 256-dimensional embedding vectors were extracted from the transcripts using a pre-trained Tacotron encoder. The acoustic feature vectors and the embedding vectors were fed into separate deep learning models, each producing a probability distribution over the predicted output classes. The results show that the proposed model performs more accurately than models in previous research.
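
The abstract describes a two-stream, score-level pipeline: hand-crafted acoustic features and a transcript embedding are classified separately, and the resulting class probabilities are combined. The following Python sketch only illustrates that idea and is not the authors' implementation: the reduced acoustic feature set, the text_embedding placeholder standing in for the pre-trained Tacotron encoder, the linear classify stand-in for the per-modality deep models, the emotion label set, and the averaging fusion are all assumptions made for illustration.

# Minimal two-stream (speech + text) emotion recognition sketch based on the
# pipeline described in the abstract. Placeholders are marked as assumptions.
import numpy as np

try:
    import librosa  # used only for MFCC / spectral features; illustrative choice
except ImportError:
    librosa = None

EMOTIONS = ["anger", "happiness", "neutral", "sadness"]  # assumed label set


def acoustic_features(waveform: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Fixed-length acoustic feature vector (stand-in for the paper's 43 features)."""
    if librosa is not None:
        mfcc = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=13).mean(axis=1)
        centroid = librosa.feature.spectral_centroid(y=waveform, sr=sr).mean()
        return np.concatenate([mfcc, [centroid]])
    return np.zeros(14)  # fallback so the sketch stays runnable without librosa


def text_embedding(transcript: str, dim: int = 256) -> np.ndarray:
    """Placeholder for the 256-d transcript embedding from a pre-trained Tacotron encoder."""
    rng = np.random.default_rng(abs(hash(transcript)) % (2 ** 32))
    return rng.standard_normal(dim)


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()


def classify(vec: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Stand-in for each modality's deep model: a single linear layer plus softmax."""
    return softmax(weights @ vec)


def fuse(p_speech: np.ndarray, p_text: np.ndarray) -> str:
    """Score-level fusion by averaging the two probability distributions (assumed scheme)."""
    return EMOTIONS[int(np.argmax((p_speech + p_text) / 2.0))]


if __name__ == "__main__":
    wav = np.random.randn(16000).astype(np.float32)  # 1 s of dummy audio
    speech_vec = acoustic_features(wav)
    text_vec = text_embedding("I can't believe this happened!")
    rng = np.random.default_rng(0)
    p_speech = classify(speech_vec, rng.standard_normal((len(EMOTIONS), speech_vec.size)))
    p_text = classify(text_vec, rng.standard_normal((len(EMOTIONS), text_vec.size)))
    print("Predicted emotion:", fuse(p_speech, p_text))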
Pages: 9
Related Papers
50 records total
  • [41] Lightweight multi-modal emotion recognition model based on modal generation
    Liu, Peisong
    Che, Manqiang
    Luo, Jiangchuan
    2022 9TH INTERNATIONAL FORUM ON ELECTRICAL ENGINEERING AND AUTOMATION, IFEEA, 2022, : 430 - 435
  • [42] Cross-modal dynamic convolution for multi-modal emotion recognition
    Wen, Huanglu
    You, Shaodi
    Fu, Ying
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 78
  • [43] Multi-modal Text Recognition Networks: Interactive Enhancements Between Visual and Semantic Features
    Na, Byeonghu
    Kim, Yoonsik
    Park, Sungrae
    COMPUTER VISION - ECCV 2022, PT XXVIII, 2022, 13688 : 446 - 463
  • [44] Speech Emotion Recognition Using Speech Feature and Word Embedding
    Atmaja, Bagus Tris
    Shirai, Kiyoaki
    Akagi, Masato
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 519 - 523
  • [45] Text-independent speech emotion recognition using frequency adaptive features
    Wu, Chenjian
    Huang, Chengwei
    Chen, Hong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (18) : 24353 - 24363
  • [47] Multi-modal speech emotion detection using optimised deep neural network classifier
    Padman, Sweta Nishant
    Magare, Dhiraj
    COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING-IMAGING AND VISUALIZATION, 2023, 11 (05): 2020 - 2038
  • [48] Dynamic Confidence-Aware Multi-Modal Emotion Recognition
    Zhu, Qi
    Zheng, Chuhang
    Zhang, Zheng
    Shao, Wei
    Zhang, Daoqiang
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15 (03) : 1358 - 1370
  • [49] Multi-modal fusion network with complementarity and importance for emotion recognition
    Liu, Shuai
    Gao, Peng
    Li, Yating
    Fu, Weina
    Ding, Weiping
    INFORMATION SCIENCES, 2023, 619 : 679 - 694
  • [50] Using Holonic Multi-agent Architecture to deal with complexity in Multi-modal emotion recognition
    Boutefara, Tarek
    Mahdaoui, Latifa
    2020 4TH INTERNATIONAL CONFERENCE ON ADVANCED ASPECTS OF SOFTWARE ENGINEERING (ICAASE'2020): 4TH INTERNATIONAL CONFERENCE ON ADVANCED ASPECTS OF SOFTWARE ENGINEERING, 2020, : 118 - 125