Speech Emotion Recognition by Combining Amplitude and Phase Information Using Convolutional Neural Network

Cited by: 27

Authors:

Guo, Lili [1]

Wang, Longbiao [1]

Dang, Jianwu [1,2]

Zhang, Linjuan [1]

Guan, Haotian [3]

Li, Xiangang [4]

Affiliations:

[1] Tianjin Univ, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China

[2] Japan Adv Inst Sci & Technol, Nomi, Ishikawa, Japan

[3] Intelligent Spoken Language Technol Tianjin Co, Tianjin, Peoples R China

[4] Didi Chuxing, AI Labs, Beijing, Peoples R China

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Funding:

National Natural Science Foundation of China;

Keywords:

speech emotion recognition; amplitude; phase information; convolutional neural network;

DOI:

10.21437/Interspeech.2018-2156

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Previous studies of speech emotion recognition apply a convolutional neural network (CNN) directly to the amplitude spectrogram to extract features. A CNN combined with bidirectional long short-term memory (BLSTM) has become the state-of-the-art model. However, phase information has been ignored in this model, even though its importance in speech processing is attracting growing attention. In this paper, we propose feature extraction from the amplitude spectrogram and phase information using a CNN for speech emotion recognition. The modified group delay cepstral coefficient (MGDCC) and relative phase are used as phase information. First, we analyze the influence of phase information on speech emotion recognition. Then we design a CNN-based feature representation using amplitude and phase information. Finally, experiments are conducted on EmoDB to validate the effectiveness of phase information. When the amplitude spectrogram is integrated with phase information, the emotion recognition error rate is reduced by over 33% in relative terms compared with using only amplitude-based features.


Pages: 1611-1615

Number of pages: 5

