Multimodal emotion recognition based on speech and ECG signals

Cited by: 4
Authors
Huang C. [1]
Jin Y. [1,2]
Wang Q. [1]
Zhao L. [1]
Zou C. [1]
Affiliations
[1] Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University
[2] School of Physics and Electronics Engineering, Xuzhou Normal University
Keywords
Decision level fusion; Emotion recognition; Feature level fusion; Multimodal
DOI: 10.3969/j.issn.1001-0505.2010.05.003
Abstract
Through collecting and analyzing speech signals and electrocardiogram (ECG) signals, emotion features and fusion algorithms are studied. First, annoyance is induced by noise stimulation and happiness is induced by comedy movie clips, and the corresponding speech and ECG signals are recorded. Then, prosodic and voice quality features are adopted as speech emotion features, and heart rate variability (HRV) features are used as ECG emotion features. Finally, decision-level fusion is performed with a weighted fusion method and feature-level fusion with a feature transformation method, and the performance of the two fusion strategies for speech and ECG emotion recognition is compared. The experimental results show that, on the same test set, the single-modal classifier based on ECG signals and the single-modal classifier based on speech signals reach average recognition rates of 71% and 80%, respectively, while the multimodal classifier with feature-level fusion of the speech and ECG signals achieves above 90%. The average recognition rate of the feature-level fusion algorithm is higher than that of the decision-level fusion algorithm, and combining different signal channels, such as speech and ECG, shows promise for building a reliable emotion recognition system.
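The abstract contrasts two fusion schemes: decision-level fusion, where each modality gets its own classifier and the class scores are combined with fixed weights, and feature-level fusion, where the speech and ECG features are concatenated and passed through a feature transformation before a single classifier. Below is a minimal Python sketch of that wiring, assuming scikit-learn, an SVM classifier, fixed fusion weights, and PCA as a stand-in for the feature transformation; none of these choices, nor the placeholder data, come from the paper.

# Hypothetical sketch of the two fusion strategies; classifier choice (SVM),
# fusion weights, PCA, and the toy data are assumptions, not from the paper.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Toy stand-ins for the real data: per-utterance speech features
# (prosody + voice quality) and per-segment ECG/HRV features.
n_samples = 200
X_speech = rng.normal(size=(n_samples, 24))   # e.g. pitch, energy, spectral stats
X_ecg = rng.normal(size=(n_samples, 8))       # e.g. HRV time/frequency measures
y = rng.integers(0, 2, size=n_samples)        # 0 = annoyance, 1 = happiness

def decision_level_fusion(Xs, Xe, y, w_speech=0.6, w_ecg=0.4):
    # Train one classifier per modality, then combine their class
    # probabilities with fixed weights (weighted fusion).
    clf_s = make_pipeline(StandardScaler(), SVC(probability=True)).fit(Xs, y)
    clf_e = make_pipeline(StandardScaler(), SVC(probability=True)).fit(Xe, y)
    p = w_speech * clf_s.predict_proba(Xs) + w_ecg * clf_e.predict_proba(Xe)
    return p.argmax(axis=1)

def feature_level_fusion(Xs, Xe, y, n_components=10):
    # Concatenate both feature sets, apply a feature transformation
    # (PCA here as a placeholder), and train a single classifier.
    X = np.hstack([Xs, Xe])
    clf = make_pipeline(StandardScaler(), PCA(n_components=n_components), SVC())
    clf.fit(X, y)
    return clf.predict(X)

print("decision-level accuracy:", (decision_level_fusion(X_speech, X_ecg, y) == y).mean())
print("feature-level accuracy:", (feature_level_fusion(X_speech, X_ecg, y) == y).mean())

In the study itself the two schemes are evaluated on a common held-out test set; the in-sample accuracies printed above only illustrate how the two pipelines are assembled, not how the reported recognition rates were obtained.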
Pages: 895-900
Number of pages: 5