Research on Multi-modal Emotion Recognition Based on Voice and Video Images

Cited by: 2
Authors
Wang, Chuanyu [1]
Li, Weixiang [1]
Chen, Zhenhuan [1]
Affiliations
[1] College of Electrical Engineering and Control Science, Nanjing Tech University, Nanjing 211816, China
Keywords
Backpropagation - Convolutional neural networks - Image enhancement - Long short-term memory - Modal analysis - Signal analysis - Speech recognition
DOI
10.3778/j.issn.1002-8331.2104-0306
Abstract
Emotion recognition is an important research field in artificial intelligence; it classifies emotions by analyzing physiological signals and behavioral characteristics. To improve recognition accuracy, a multi-modal emotion recognition method based on voice and video images is proposed. The video image modality is implemented with the Local Binary Patterns Histograms (LBPH) method, a Sparse Auto-Encoder (SAE), and an improved Convolutional Neural Network (CNN). The voice modality is implemented with an improved Deep Restricted Boltzmann Machine (DBM) and an improved Long Short-Term Memory (LSTM) network. The SAE extracts more detailed image features, and the DBM yields a deep representation of sound characteristics. Back Propagation (BP) is used to optimize the nonlinear mapping capability of the DBM and LSTM, and Global Average Pooling (GAP) is used to improve the response speed of the CNN and LSTM and to prevent overfitting. After each modality is recognized separately, the two recognition results are fused at the decision level according to a weight criterion, which gives the probability of each emotion type. Experimental results show that, compared with traditional single-modal emotion recognition, the proposed method improves recognition accuracy and achieves a recognition rate of 74.9% on the test set of the Chinese Natural Audio-Visual Emotion Database (CHEAVD) 2.0. The method can also be used for real-time emotion analysis.
© 2024 Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press. All rights reserved.
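The abstract does not give the exact weight criterion used for decision-level fusion. As a minimal sketch only, assuming a single fixed weight per modality, the fusion step could look like the Python below. The function names, the label set `EMOTIONS`, and the default weight `w_video=0.6` are illustrative assumptions, not values taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical label set for illustration; not the label set used in the paper.
EMOTIONS = ("angry", "happy", "neutral", "sad")


def fuse_decisions(
    p_video: Sequence[float],
    p_audio: Sequence[float],
    w_video: float = 0.6,  # assumed weight; the paper's weight criterion is not stated here
) -> List[float]:
    """Combine two per-class probability vectors with a weighted sum."""
    if len(p_video) != len(p_audio):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= w_video <= 1.0:
        raise ValueError("w_video must lie in [0, 1]")
    w_audio = 1.0 - w_video
    fused = [w_video * v + w_audio * a for v, a in zip(p_video, p_audio)]
    total = sum(fused)
    if total <= 0.0:
        raise ValueError("fused scores must have a positive sum")
    # Renormalize so the output is a probability distribution even if
    # the inputs were not perfectly normalized.
    return [f / total for f in fused]


def predict(
    p_video: Sequence[float],
    p_audio: Sequence[float],
    w_video: float = 0.6,
    labels: Sequence[str] = EMOTIONS,
) -> Tuple[str, List[float]]:
    """Return the most probable label and the fused distribution."""
    fused = fuse_decisions(p_video, p_audio, w_video)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused
```

For example, `predict([0.1, 0.6, 0.2, 0.1], [0.5, 0.2, 0.2, 0.1])` weights the video classifier more heavily and returns the label on which the fused distribution peaks, together with the per-class probabilities.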
Pages: 163-170