Speech Emotion Recognition Using CNN

Cited by: 195
Authors
Huang, Zhengwei [1]
Dong, Ming [2]
Mao, Qirong [1]
Zhan, Yongzhao [1]
Affiliations
[1] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang 212013, Jiangsu, Peoples R China
[2] Wayne State Univ, Dept Comp Sci, Detroit, MI 48202 USA
Keywords
Speech emotion recognition; Salient feature learning
DOI
10.1145/2647868.2654984
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
Deep learning systems, such as Convolutional Neural Networks (CNNs), can infer a hierarchical representation of input data that facilitates categorization. In this paper, we propose to learn affect-salient features for Speech Emotion Recognition (SER) using semi-CNN. Semi-CNN is trained in two stages. In the first stage, unlabeled samples are used to learn candidate features with a contractive convolutional neural network with reconstruction penalization. In the second stage, the candidate features are used as input to semi-CNN, which learns affect-salient, discriminative features using a novel objective function that encourages feature saliency, orthogonality, and discrimination. Our experimental results on benchmark datasets show that our approach yields stable and robust recognition performance in complex scenes (e.g., with speaker and environment distortion) and outperforms several well-established SER features.
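The first training stage combines a reconstruction objective with a contractive penalty. As a rough illustration only, the sketch below uses the standard contractive autoencoder formulation (squared reconstruction error plus the squared Frobenius norm of the encoder Jacobian). It assumes a single dense sigmoid encoder and a linear decoder rather than the convolutional network the abstract describes, so it should not be read as the authors' objective. The function names `encode` and `contractive_ae_loss` and the weight `lam` are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def encode(x, W, b):
    """Hidden code h = sigmoid(W x + b); W has one row per hidden unit."""
    return [sigmoid(sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j)
            for row, b_j in zip(W, b)]


def contractive_ae_loss(x, W, b, W_dec, c, lam):
    """Return (total, reconstruction, contractive) loss for one input vector.

    Hypothetical simplification: one dense sigmoid encoder layer and a linear
    decoder, not the convolutional architecture described in the paper.
    """
    h = encode(x, W, b)
    # Linear decoder: x_hat = W_dec h + c (W_dec has one row per input dimension)
    x_hat = [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + c_i
             for row, c_i in zip(W_dec, c)]
    # Squared reconstruction error ||x - x_hat||^2
    recon = sum((x_i - xh_i) ** 2 for x_i, xh_i in zip(x, x_hat))
    # Jacobian dh_j/dx_i = h_j (1 - h_j) W[j][i], so its squared Frobenius norm
    # is sum_j (h_j (1 - h_j))^2 * sum_i W[j][i]^2
    contractive = sum((h_j * (1.0 - h_j)) ** 2 * sum(w * w for w in row)
                      for h_j, row in zip(h, W))
    return recon + lam * contractive, recon, contractive
```

Raising `lam` trades reconstruction accuracy for a hidden code that changes less under small input perturbations, which is the usual motivation for contractive features as robust candidates. How the authors weight and implement this term in their convolutional setting is not stated in the abstract.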
Pages: 801-804
Page count: 4