Audio-Visual Emotion Recognition Based on Facial Expression and Affective Speech

Cited by: 0
Authors
Zhang, Shiqing [1, 2]
Li, Lemin [1 ]
Zhao, Zhijin [3 ]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Commun & Informat Engn, Chengdu 611731, Peoples R China
[2] Taizhou Univ, Sch Phys & Elect Engn, Taizhou 318000, Peoples R China
[3] Hangzhou Dianzi Univ, Sch Communicat Engn, Hangzhou 310018, Peoples R China
Source
Keywords
Emotion recognition; Local binary patterns; Acoustic features; Support vector machines; Human-computer interaction
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
This paper investigates the performance of audio-visual emotion recognition that integrates facial expression and affective speech. For facial expression recognition alone, local binary pattern (LBP) features are extracted to represent facial images. For speech emotion recognition alone, three typical kinds of acoustic features are extracted: prosody features, voice quality features, and Mel-frequency cepstral coefficients (MFCC). The two modalities, facial expression and affective speech, are then fused at the feature level to perform audio-visual emotion recognition. Support vector machines (SVM) are used for all emotion classification. Experimental results on the publicly available eNTERFACE'05 emotional audio-visual database show that the presented audio-visual method achieves an accuracy of 66.51%, outperforming either single modality.
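The two building blocks named in the abstract, LBP description of a face image and feature-level fusion, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the LBP variant, neighbourhood, histogram size, or any normalisation, so the basic 3x3 operator with a 256-bin histogram is assumed here, and fusion is assumed to be plain concatenation. The function names and the acoustic vector are hypothetical placeholders.

```python
from typing import List, Sequence

# Offsets (row, col) of the 8 neighbours, clockwise from the top-left.
# Assumption: basic 3x3 LBP; the abstract does not state the variant used.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: Sequence[Sequence[int]], r: int, c: int) -> int:
    """Return the basic LBP code of the interior pixel at (r, c)."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        # A neighbour at least as bright as the center sets its bit.
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image: Sequence[Sequence[int]]) -> List[float]:
    """Return a normalised 256-bin histogram of LBP codes (interior pixels only)."""
    hist = [0.0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(image, r, c)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def fuse_features(visual: Sequence[float], acoustic: Sequence[float]) -> List[float]:
    """Feature-level fusion, assumed here to be simple concatenation."""
    return list(visual) + list(acoustic)


# Hypothetical usage: a tiny grayscale patch and a placeholder acoustic vector
# standing in for prosody, voice quality, and MFCC features.
face_patch = [
    [9, 1, 1, 1],
    [1, 5, 1, 1],
    [1, 1, 5, 1],
    [1, 1, 1, 1],
]
acoustic_vector = [0.1, 0.2, 0.3]
fused = fuse_features(lbp_histogram(face_patch), acoustic_vector)
```

The fused vector would then be passed to a classifier. The abstract names SVM for that step; training is omitted here because it depends on an external library (for example scikit-learn's `SVC`) and on data that the abstract does not describe.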
Pages: 46 / +
Page count: 3