Learning Salient Features for Speech Emotion Recognition Using Convolutional Neural Networks

被引:396
|
作者
Mao, Qirong [1 ,2 ]
Dong, Ming [1 ]
Huang, Zhengwei [2 ]
Zhan, Yongzhao [2 ]
机构
[1] Wayne State Univ, Dept Comp Sci, Detroit, MI 48202 USA
[2] Jiangsu Univ, Dept Comp Sci & Commun Engn, Zhenjiang 212013, Jiangsu, Peoples R China
关键词
Affective-salient discriminative feature analysis; convolutional neural networks; feature learning; speech emotion recognition;
D O I
10.1109/TMM.2014.2360798
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
As an essential way of human emotional behavior understanding, speech emotion recognition (SER) has attracted a great deal of attention in human-centered signal processing. Accuracy in SER heavily depends on finding good affect-related, discriminative features. In this paper, we propose to learn affect-salient features for SER using convolutional neural networks (CNN). The training of CNN involves two stages. In the first stage, unlabeled samples are used to learn local invariant features (LIF) using a variant of sparse auto-encoder (SAE) with reconstruction penalization. In the second step, LIF is used as the input to a feature extractor, salient discriminative feature analysis (SDFA), to learn affect-salient, discriminative features using a novel objective function that encourages feature saliency, orthogonality, and discrimination for SER. Our experimental results on benchmark datasets show that our approach leads to stable and robust recognition performance in complex scenes (e. g., with speaker and language variation, and environment distortion) and outperforms several well-established SER features.
引用
收藏
页码:2203 / 2213
页数:11
相关论文
共 50 条
  • [1] Learning Salient Features for Speech Emotion Recognition Using CNN
    Liu, Jiamu
    Han, Wenjing
    Ruan, Huabin
    Chen, Xiaomin
    Jiang, Dongmei
    Li, Haifeng
    [J]. 2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018,
  • [2] SPEECH EMOTION RECOGNITION USING QUATERNION CONVOLUTIONAL NEURAL NETWORKS
    Muppidi, Aneesh
    Radfar, Martin
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6309 - 6313
  • [3] Speech Emotion Recognition using Convolutional and Recurrent Neural Networks
    Lim, Wootaek
    Jang, Daeyoung
    Lee, Taejin
    [J]. 2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [4] Speech Emotion Recognition and Deep Learning: An Extensive Validation Using Convolutional Neural Networks
    Ri, Francesco Ardan Dal
    Ciardi, Fabio Cifariello
    Conci, Nicola
    [J]. IEEE ACCESS, 2023, 11 : 116638 - 116649
  • [5] Speech Emotion Recognition Using Convolutional Neural Networks with Attention Mechanism
    Mountzouris, Konstantinos
    Perikos, Isidoros
    Hatzilygeroudis, Ioannis
    Corchado, Juan M.
    Iglesias, Carlos A.
    Kim, Byung-Gyu
    Mehmood, Rashid
    Ren, Fuji
    Lee, In
    [J]. ELECTRONICS, 2023, 12 (20)
  • [6] Speech Emotion Recognition using Convolutional Recurrent Neural Networks and Spectrograms
    Qamhan, Mustafa A.
    Meftah, Ali H.
    Selouani, Sid-Ahmed
    Alotaibi, Yousef A.
    Zakariah, Mohammed
    Seddiq, Yasser Mohammad
    [J]. 2020 IEEE CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE), 2020,
  • [7] Continuous Speech Emotion Recognition with Convolutional Neural Networks
    Vryzas, Nikolaos
    Vrysis, Lazaros
    Matsiola, Maria
    Kotsakis, Rigas
    Dimoulas, Charalampos
    Kalliris, George
    [J]. JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2020, 68 (1-2): : 14 - 24
  • [8] Continuous speech emotion recognition with convolutional neural networks
    Vryzas, Nikolaos
    Vrysis, Lazaros
    Matsiola, Maria
    Kotsakis, Rigas
    Dimoulas, Charalampos
    Kalliris, George
    [J]. AES: Journal of the Audio Engineering Society, 2020, 68 (1-2): : 14 - 24
  • [9] Speech emotion recognition with deep convolutional neural networks
    Issa, Dias
    Demirci, M. Fatih
    Yazici, Adnan
    [J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2020, 59 (59)
  • [10] Speech Emotion Recognition using Convolution Neural Networks and Deep Stride Convolutional Neural Networks
    Wani, Taiba Majid
    Gunawan, Teddy Surya
    Qadri, Syed Asif Ahmad
    Mansor, Hasmah
    Kartiwi, Mira
    Ismail, Nanang
    [J]. PROCEEDING OF 2020 6TH INTERNATIONAL CONFERENCE ON WIRELESS AND TELEMATICS (ICWT), 2020,