Speech Emotion Recognition Using CNN

Cited by: 208

Authors:

Huang, Zhengwei [1]

Dong, Ming [2]

Mao, Qirong [1]

Zhan, Yongzhao [1]

Affiliations:

[1] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang 212013, Jiangsu, Peoples R China

[2] Wayne State Univ, Dept Comp Sci, Detroit, MI 48202 USA

Source:

PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14) | 2014

Keywords:

Speech emotion recognition; Salient feature learning

DOI:

10.1145/2647868.2654984

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Deep learning systems, such as Convolutional Neural Networks (CNNs), can infer a hierarchical representation of input data that facilitates categorization. In this paper, we propose to learn affect-salient features for Speech Emotion Recognition (SER) using a semi-CNN. The training of the semi-CNN has two stages. In the first stage, unlabeled samples are used to learn candidate features with a contractive convolutional neural network with reconstruction penalization. In the second stage, the candidate features are used as the input to the semi-CNN to learn affect-salient, discriminative features using a novel objective function that encourages feature saliency, orthogonality and discrimination. Our experimental results on benchmark datasets show that our approach leads to stable and robust recognition performance in complex scenes (e.g., with speaker and environment distortion), and outperforms several well-established SER features.


Pages: 801-804

Page count: 4

50 records in total

[1] Learning Salient Features for Speech Emotion Recognition Using CNN
Liu, Jiamu
Han, Wenjing
Ruan, Huabin
Chen, Xiaomin
Jiang, Dongmei
Li, Haifeng
2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018,
[2] Comparative Analysis of Windows for Speech Emotion Recognition Using CNN
Teixeira, Felipe L.
Soares, Salviano Pinto
Abreu, J. L. Pio
Oliveira, Paulo M.
Teixeira, Joao P.
OPTIMIZATION, LEARNING ALGORITHMS AND APPLICATIONS, PT I, OL2A 2023, 2024, 1981 : 233 - 248
[3] Speech Emotion Recognition using XGBoost and CNN BLSTM with Attention
He, Jingru
Ren, Liyong
2021 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, INTERNET OF PEOPLE, AND SMART CITY INNOVATIONS (SMARTWORLD/SCALCOM/UIC/ATC/IOP/SCI 2021), 2021, : 154 - 159
[4] A Combined CNN Architecture for Speech Emotion Recognition
Begazo, Rolinson
Aguilera, Ana
Dongo, Irvin
Cardinale, Yudith
SENSORS, 2024, 24 (17)
[5] Scalogram vs Spectrogram as Speech Representation Inputs for Speech Emotion Recognition Using CNN
Enriquez, Marc Dominic
Lucas, Crisron Rudolf
Aquino, Angelina
2023 34TH IRISH SIGNALS AND SYSTEMS CONFERENCE, ISSC, 2023,
[6] BLSTM and CNN Stacking Architecture for Speech Emotion Recognition
Dongdong Li
Linyu Sun
Xinlei Xu
Zhe Wang
Jing Zhang
Wenli Du
Neural Processing Letters, 2021, 53 : 4097 - 4115
[7] BLSTM and CNN Stacking Architecture for Speech Emotion Recognition
Li, Dongdong
Sun, Linyu
Xu, Xinlei
Wang, Zhe
Zhang, Jing
Du, Wenli
NEURAL PROCESSING LETTERS, 2021, 53 (06) : 4097 - 4115
[8] Speech emotion recognition and classification using hybrid deep CNN and BiLSTM model
Mishra, Swami
Bhatnagar, Nehal
Prakasam, P.
Sureshkumar, T. R.
MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (13) : 37603 - 37620
[9] Speech emotion recognition and classification using hybrid deep CNN and BiLSTM model
Swami Mishra
Nehal Bhatnagar
Prakasam P
Sureshkumar T. R
Multimedia Tools and Applications, 2024, 83 : 37603 - 37620
[10] EFFICIENT SPEECH EMOTION RECOGNITION USING MULTI-SCALE CNN AND ATTENTION
Peng, Zixuan
Lu, Yu
Pan, Shengfeng
Liu, Yunfeng
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3020 - 3024
