STUDY OF DENSE NETWORK APPROACHES FOR SPEECH EMOTION RECOGNITION

Citations: 0
Authors
Abdelwahab, Mohammed [1]
Busso, Carlos [1]
Affiliation
[1] Univ Texas Dallas, Dept Elect Comp Engn, Multimodal Signal Proc MSP Lab, Richardson, TX 75080 USA
Keywords
Speech emotion recognition; Deep Neural Networks; NEURAL-NETWORKS
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Deep neural networks have proven very effective in various classification problems and show great promise for emotion recognition from speech. Studies have proposed various architectures that further improve the performance of emotion recognition systems. However, there are still several open questions regarding the best approach to building a speech emotion recognition system. Would the system's performance improve if we had more labeled data? How much do we benefit from data augmentation? Which activation and regularization schemes are more beneficial? How does the depth of the network affect the performance? We are collecting the MSP-Podcast corpus, a large dataset with over 30 hours of data, which provides an ideal resource to address these questions. This study explores various dense architectures to predict arousal, valence and dominance scores. We investigate varying the training set size, the width and depth of the network, and the activation functions used during training. We also study the effect of data augmentation on the network's performance. We find that a larger training set improves performance. Batch normalization is crucial to achieving good performance with deeper networks. We do not observe significant differences in performance between residual networks and dense networks.
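The kind of architecture the abstract describes can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the feature dimension (8), the width (16), the depth (3), the ReLU activation, the weight initialization, and all function names are assumptions chosen for illustration. The sketch shows a stack of dense layers with optional batch normalization and optional residual (skip) connections, ending in three outputs that stand for arousal, valence and dominance scores.

```python
import math
import random


def dense(x, w, b):
    """Affine layer. x is a batch (list of rows), w is [n_in][n_out], b is [n_out]."""
    return [[sum(xi * w[i][j] for i, xi in enumerate(row)) + b[j]
             for j in range(len(b))] for row in x]


def batch_norm(x, eps=1e-5):
    """Normalize each feature over the batch (training-mode statistics only;
    the learned scale and shift of a full batch-norm layer are omitted)."""
    n = len(x)
    cols = list(zip(*x))
    means = [sum(c) / n for c in cols]
    variances = [sum((v - m) ** 2 for v in c) / n for c, m in zip(cols, means)]
    return [[(v - m) / math.sqrt(var + eps)
             for v, m, var in zip(row, means, variances)] for row in x]


def relu(x):
    """ReLU is an assumed activation; the abstract does not name one."""
    return [[max(0.0, v) for v in row] for row in x]


def init_layer(rng, n_in, n_out):
    """Random Gaussian weights (assumed initialization) and zero biases."""
    scale = math.sqrt(2.0 / n_in)
    w = [[rng.gauss(0.0, scale) for _ in range(n_out)] for _ in range(n_in)]
    return w, [0.0] * n_out


def build(rng, n_features, width, depth, n_outputs=3):
    """Create `depth` hidden layers of size `width` plus a 3-output head."""
    sizes = [n_features] + [width] * depth
    hidden = [init_layer(rng, a, b) for a, b in zip(sizes, sizes[1:])]
    head = init_layer(rng, sizes[-1], n_outputs)
    return hidden, head


def forward(x, hidden, head, use_batch_norm=True, residual=False):
    """Forward pass; returns one [arousal, valence, dominance] row per input."""
    h = x
    for w, b in hidden:
        z = dense(h, w, b)
        if use_batch_norm:
            z = batch_norm(z)
        z = relu(z)
        # Add the skip connection only where input and output widths match.
        if residual and len(h[0]) == len(z[0]):
            z = [[a + c for a, c in zip(r1, r2)] for r1, r2 in zip(h, z)]
        h = z
    return dense(h, *head)


rng = random.Random(0)
# Hypothetical batch: 4 utterances, 8 acoustic features each.
batch = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
hidden, head = build(rng, n_features=8, width=16, depth=3)
scores = forward(batch, hidden, head)
residual_scores = forward(batch, hidden, head, residual=True)
```

Changing `width`, `depth`, `use_batch_norm` and `residual` corresponds to the factors the study varies; training, data augmentation and evaluation are outside the scope of this sketch.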
Pages: 5084-5088
Number of pages: 5