Speech Emotion Recognition Using a Dual-Channel Complementary Spectrogram and the CNN-SSAE Neutral Network

Cited by: 10

Authors:

Li, Juan [1,2]

Zhang, Xueying [1]

Huang, Lixia [1]

Li, Fenglian [1]

Duan, Shufei [1]

Sun, Ying [1]

Affiliations:

[1] Taiyuan Univ Technol, Coll Informat & Comp, Jinzhong 030600, Peoples R China

[2] Yuncheng Univ, Dept Phys & Elect Engn, Yuncheng 044000, Peoples R China

Source:

APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 19

Funding:

National Natural Science Foundation of China;

Keywords:

speech emotion recognition; deep learning; Mel spectrogram; IMel spectrogram; stacked sparse autoencoder; spectral features; stress recognition; neural network; model; PSO;

DOI:

10.3390/app12199518

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703;

Abstract:

Featured Application: Emotion recognition is the computer's automatic recognition of the emotional state of input speech. It is an active research field that draws on phonetics, psychology, digital signal processing, pattern recognition, and artificial intelligence. Speech emotion recognition is now widely used in intelligent signal processing, smart medical care, business intelligence, assisted lie detection, criminal investigation, the service industry, self-driving cars, smartphone voice assistants, and human psychoanalysis.

In the context of artificial intelligence, smooth communication between people and machines has become a widely pursued goal. The Mel spectrogram is a common feature in speech emotion recognition and focuses on the low-frequency part of speech. In contrast, the inverse Mel (IMel) spectrogram, which focuses on the high-frequency part, is proposed so that emotions can be analyzed more comprehensively. Because the convolutional neural network-stacked sparse autoencoder (CNN-SSAE) can extract deep optimized features, a Mel-IMel dual-channel complementary structure is proposed. In the first channel, a CNN extracts the low-frequency information of the Mel spectrogram. The other channel extracts the high-frequency information of the IMel spectrogram. This information is passed to an SSAE, which reduces its dimensionality and yields the optimized features. Experimental results show that the highest recognition rates achieved on the EMO-DB, SAVEE, and RAVDESS datasets were 94.79%, 88.96%, and 83.18%, respectively. Two conclusions follow. First, the recognition rate obtained with both spectrograms together was higher than that of either spectrogram alone, which indicates that the two are complementary. Second, placing the SSAE after the CNN to obtain the optimized features further improved the recognition rate, which supports the effectiveness of the CNN-SSAE network.
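
The contrast between the two frequency scales can be illustrated with a small sketch. This record does not give the paper's IMel filter definition, so the code below makes an assumption: the IMel centre frequencies are obtained by mirroring the Mel centres across the analysed band (f -> f_max - f). The function names, the filter count of 8, and the 8000 Hz upper limit are hypothetical choices for illustration only, and the Mel formula used is the common 2595 * log10(1 + f / 700) variant.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Convert Hz to mel using the common 2595 * log10(1 + f / 700) formula."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_centers(n_filters, f_max):
    """Centre frequencies (Hz) equally spaced on the mel scale in (0, f_max)."""
    mels = np.linspace(0.0, hz_to_mel(f_max), n_filters + 2)
    return mel_to_hz(mels)[1:-1]  # drop the two band edges


def imel_centers(n_filters, f_max):
    """ASSUMPTION: mirror the mel centres across the band (f -> f_max - f).

    This is an illustrative stand-in, not the paper's confirmed definition.
    """
    return np.sort(f_max - mel_centers(n_filters, f_max))


mel = mel_centers(8, 8000.0)    # dense at low frequencies
imel = imel_centers(8, 8000.0)  # dense at high frequencies

print("Mel centres (Hz): ", np.round(mel, 1))
print("IMel centres (Hz):", np.round(imel, 1))
```

Under this assumption, the spacing between neighbouring Mel centres grows with frequency, while the spacing between IMel centres shrinks, so the two filter banks resolve opposite ends of the spectrum in finer detail. That is the sense in which the abstract calls the two spectrograms complementary.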


Pages: 20
