Multi-Modal Emotion Recognition Fusing Video and Audio

Cited by: 4
Authors
Xu, Chao [1 ]
Du, Pufeng [2 ]
Feng, Zhiyong [2 ]
Meng, Zhaopeng [1 ]
Cao, Tianyi [2 ]
Dong, Caichao [2 ]
Affiliations
[1] Tianjin Univ, Sch Comp Software, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Sch Comp Sci & Technol, Tianjin 300072, Peoples R China
Source
Funding
US National Science Foundation;
Keywords
Emotion Recognition; Multi-modal Fusion; HMM; Multi-layer Perceptron;
DOI
10.12785/amis/070205
CLC Classification Number
O29 [Applied Mathematics];
Subject Classification Code
070104;
Abstract
Emotion plays an important role in human communications. We construct a framework for multi-modal fusion emotion recognition. Facial expression features and speech features are extracted from image sequences and speech signals, respectively. To locate and track facial feature points, we construct an Active Appearance Model for facial images covering all kinds of expressions. Facial Animation Parameters, computed from the motions of the facial feature points, serve as expression features. As speech features, we extract short-term mean energy, fundamental frequency, and formant frequencies from each frame. An emotion classifier is designed to fuse facial expression and speech based on Hidden Markov Models and a Multi-layer Perceptron. Experiments indicate that the multi-modal fusion emotion recognition algorithm presented in this paper achieves relatively high recognition accuracy, and that the proposed approach offers better performance and robustness than methods using only video or only audio.
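The per-frame speech features named in the abstract (short-term mean energy and fundamental frequency) can be sketched as below. This is a minimal illustration, not the paper's implementation: the frame length, hop size, and the autocorrelation-based F0 estimator are assumptions chosen for clarity.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (assumed framing scheme)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])

def short_term_energy(frames):
    """Short-term mean energy: mean squared amplitude per frame."""
    return np.mean(frames ** 2, axis=1)

def fundamental_frequency(frame, sr, f_min=60.0, f_max=400.0):
    """Estimate F0 of one frame by picking the autocorrelation peak
    within a plausible pitch-lag range (illustrative estimator)."""
    frame = frame - frame.mean()
    # Keep only non-negative lags of the full autocorrelation.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / f_max)
    lag_max = min(int(sr / f_min), len(ac) - 1)
    lag = lag_min + np.argmax(ac[lag_min:lag_max])
    return sr / lag
```

For example, framing a 220 Hz sine sampled at 16 kHz into 40 ms frames with a 20 ms hop yields per-frame energies near 0.5 and F0 estimates near 220 Hz. Formant frequencies, the remaining speech feature, are conventionally obtained from the roots of an LPC polynomial and are omitted here.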
Pages: 455-462 (8 pages)
Related Papers (50 records)
  • [1] Multi-Modal Residual Perceptron Network for Audio-Video Emotion Recognition
    Chang, Xin
    Skarbek, Wladyslaw
    [J]. SENSORS, 2021, 21 (16)
  • [2] Low-level fusion of audio and video feature for multi-modal emotion recognition
    Wimmer, Matthias
    Schuller, Bjoern
    Arsic, Dejan
    Rigoll, Gerhard
    Radig, Bernd
    [J]. VISAPP 2008: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOL 2, 2008, : 145 - +
  • [3] Multi-Modal Emotion Recognition by Fusing Correlation Features of Speech-Visual
    Chen Guanghui
    Zeng Xiaoping
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 533 - 537
  • [4] Multi-Modal Audio, Video and Physiological Sensor Learning for Continuous Emotion Prediction
    Brady, Kevin
    Gwon, Youngjune
    Khorrami, Pooya
    Godoy, Elizabeth
    Campbell, William
    Dagli, Charlie
    Huang, Thomas S.
    [J]. PROCEEDINGS OF THE 6TH INTERNATIONAL WORKSHOP ON AUDIO/VISUAL EMOTION CHALLENGE (AVEC'16), 2016, : 97 - 104
  • [5] Fusing Multi-modal Features for Gesture Recognition
    Wu, Jiaxiang
    Cheng, Jian
    Zhao, Chaoyang
    Lu, Hanqing
    [J]. ICMI'13: PROCEEDINGS OF THE 2013 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2013, : 453 - 459
  • [6] Multi-Modal Emotion Recognition Based On deep Learning Of EEG And Audio Signals
    Li, Zhongjie
    Zhang, Gaoyan
    Dang, Jianwu
    Wang, Longbiao
    Wei, Jianguo
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [7] Research of Multi-modal Emotion Recognition Based on Voice and Video Images
    Wang, Chuanyu
    Li, Weixiang
    Chen, Zhenhuan
    [J]. Computer Engineering and Applications, 2024, 57 (23) : 163 - 170
  • [8] Audio-Visual Emotion Recognition System Using Multi-Modal Features
    Handa, Anand
    Agarwal, Rashi
    Kohli, Narendra
    [J]. INTERNATIONAL JOURNAL OF COGNITIVE INFORMATICS AND NATURAL INTELLIGENCE, 2021, 15 (04)
  • [9] Multi-modal Gesture Recognition using Integrated Model of Motion, Audio and Video
    Goutsu, Yusuke
    Kobayashi, Takaki
    Obara, Junya
    Kusajima, Ikuo
    Takeichi, Kazunari
    Takano, Wataru
    Nakamura, Yoshihiko
    [J]. CHINESE JOURNAL OF MECHANICAL ENGINEERING, 2015, 28 (04) : 657 - 665