Multi-modal Correlated Network for emotion recognition in speech

Cited by: 19
Authors
Ren, Minjie [1 ]
Nie, Weizhi [1 ]
Liu, Anan [1 ]
Su, Yuting [1 ]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
Keywords
Multi-modal; Emotion recognition; Neural networks
DOI
10.1016/j.visinf.2019.10.003
CLC number
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
With the growing demand for automatic emotion recognition systems, emotion recognition is becoming increasingly important for human-computer interaction (HCI) research. The performance of automatic emotion recognition has improved steadily in recent years, driven by advances in both hardware and deep learning methods. However, because emotion is an abstract concept expressed in many different ways, automatic emotion recognition remains a challenging task. In this paper, we propose a novel Multi-modal Correlated Network for emotion recognition that exploits information from both the audio and visual channels to achieve more robust and accurate detection. In the proposed method, the audio and visual signals are first preprocessed for feature extraction: the audio segments yield Mel-spectrograms, which can be treated as images, and the visual segments yield representative frames. The Mel-spectrograms are then fed to a convolutional neural network (CNN) to extract audio features, while the representative frames are fed to a CNN followed by an LSTM to extract visual features. In particular, we employ a triplet loss to increase inter-class separation, and we propose a novel correlated loss to reduce intra-class variation. Finally, we fuse the audio and visual features for emotion classification. Experimental results on the AFEW dataset demonstrate that the correlation information between modalities is crucial for automatic emotion recognition and that the proposed method achieves state-of-the-art performance on the classification task. (C) 2019 Zhejiang University and Zhejiang University Press. Published by Elsevier B.V.
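The abstract describes a two-branch pipeline: a CNN over Mel-spectrograms for audio, a CNN plus LSTM over representative frames for video, a triplet loss for inter-class separation, a correlated loss for intra-class compactness, and fusion for classification. Below is a minimal PyTorch sketch of that pipeline. All layer sizes, the concatenation fusion, and the MSE form of `correlated_loss` are illustrative assumptions; the paper defines its own architecture and loss formulations.

```python
# Minimal sketch of the pipeline described in the abstract.
# Shapes, layer sizes, and the exact "correlated loss" are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioBranch(nn.Module):
    """CNN over Mel-spectrograms treated as single-channel images."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, mel):                 # mel: (B, 1, n_mels, time)
        return self.fc(self.conv(mel).flatten(1))

class VisualBranch(nn.Module):
    """CNN applied per representative frame, then an LSTM over the sequence."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.lstm = nn.LSTM(64, feat_dim, batch_first=True)

    def forward(self, frames):              # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        x = self.cnn(frames.flatten(0, 1)).flatten(1).view(b, t, -1)
        _, (h, _) = self.lstm(x)
        return h[-1]                         # last hidden state: (B, feat_dim)

class CorrelatedNet(nn.Module):
    def __init__(self, feat_dim=128, n_classes=7):
        super().__init__()
        self.audio = AudioBranch(feat_dim)
        self.visual = VisualBranch(feat_dim)
        self.classifier = nn.Linear(2 * feat_dim, n_classes)  # concat fusion

    def forward(self, mel, frames):
        fa, fv = self.audio(mel), self.visual(frames)
        return self.classifier(torch.cat([fa, fv], dim=1)), fa, fv

def correlated_loss(fa, fv):
    """Assumed form: pull the two modality embeddings of the same sample
    together, reducing intra-class (cross-modal) variation."""
    return F.mse_loss(fa, fv)

# Toy training step: triplet loss widens inter-class gaps on audio features,
# the correlated loss tightens each audio/visual pair, CE supervises fusion.
model = CorrelatedNet()
triplet = nn.TripletMarginLoss(margin=1.0)
mel = torch.randn(4, 1, 64, 100)             # batch of Mel-spectrograms
frames = torch.randn(4, 8, 3, 64, 64)        # 8 representative frames each
labels = torch.tensor([0, 0, 1, 1])
logits, fa, fv = model(mel, frames)
# anchor/positive share a class, negative differs (indices fit the toy labels)
loss = (F.cross_entropy(logits, labels)
        + triplet(fa[0:1], fa[1:2], fa[2:3])
        + correlated_loss(fa, fv))
loss.backward()
```

Concatenation is only one possible fusion; the same skeleton accepts weighted or attention-based fusion by replacing the final `torch.cat` and linear layer.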
Pages: 150-155
Page count: 6
Related papers (showing 10 of 50):
  • [1] Multi-modal Attention for Speech Emotion Recognition. Pan, Zexu; Luo, Zhaojie; Yang, Jichen; Li, Haizhou. Interspeech 2020, 2020: 364-368.
  • [2] Multi-modal Emotion Recognition Based on Speech and Image. Li, Yongqiang; He, Qi; Zhao, Yongping; Yao, Hongxun. Advances in Multimedia Information Processing - PCM 2017, Pt I, 2018, 10735: 844-853.
  • [3] Semantic Alignment Network for Multi-Modal Emotion Recognition. Hou, Mixiao; Zhang, Zheng; Liu, Chang; Lu, Guangming. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(9): 5318-5329.
  • [4] Multi-modal emotion recognition using EEG and speech signals. Wang, Qian; Wang, Mou; Yang, Yan; Zhang, Xiaolei. Computers in Biology and Medicine, 2022, 149.
  • [5] Contextual and Cross-Modal Interaction for Multi-Modal Speech Emotion Recognition. Yang, Dingkang; Huang, Shuai; Liu, Yang; Zhang, Lihua. IEEE Signal Processing Letters, 2022, 29: 2093-2097.
  • [6] Multi-modal Emotion Recognition using Speech Features and Text Embedding. Kim, Ju-Hee; Lee, Seok-Pil. Transactions of the Korean Institute of Electrical Engineers, 2021, 70(1): 108-113.
  • [7] Multi-modal fusion network with complementarity and importance for emotion recognition. Liu, Shuai; Gao, Peng; Li, Yating; Fu, Weina; Ding, Weiping. Information Sciences, 2023, 619: 679-694.
  • [8] Dense Attention Memory Network for Multi-modal emotion recognition. Ma, Gailing; Guo, Xiao. 2022 5th International Conference on Machine Learning and Natural Language Processing (MLNLP 2022), 2022: 48-53.
  • [9] Multi-head attention fusion networks for multi-modal speech emotion recognition. Zhang, Junfeng; Xing, Lining; Tan, Zhen; Wang, Hongsen; Wang, Kesheng. Computers & Industrial Engineering, 2022, 168.
  • [10] Multi-Modal Emotion Recognition Using Speech Features and Text-Embedding. Byun, Sung-Woo; Kim, Ju-Hee; Lee, Seok-Pil. Applied Sciences-Basel, 2021, 11(17).