Deep scattering network for speech emotion recognition

Authors
Singh, Premjeet [1]
Saha, Goutam [1]
Sahidullah, Md [2]
Affiliations
[1] Indian Inst Technol Kharagpur, Dept Elect & ECE, Kharagpur, W Bengal, India
[2] Univ Lorraine, CNRS, INRIA, LORIA, F-54000 Nancy, France
Keywords
Deep convolutional networks; Deep scattering transform; EmoDB; IEMOCAP; RAVDESS; Shift invariance; Speech emotion recognition; FEATURES;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
This paper introduces the scattering transform for speech emotion recognition (SER). The scattering transform generates feature representations that remain stable to deformations and shifts in time and frequency without much loss of information. In speech, emotion cues are spread across time and localised in frequency. The time and frequency invariance of scattering coefficients provides a representation that is robust against emotion-irrelevant variations (e.g., differences in speaker, language, and gender) while preserving the variations caused by emotion cues. Hence, such a representation captures emotion information from speech more efficiently. We perform experiments comparing scattering coefficients with standard mel-frequency cepstral coefficients (MFCCs) over different databases. We observe that frequency scattering performs better than time-domain scattering and MFCCs. We also investigate layer-wise scattering coefficients to analyse the importance of time-shift- and deformation-stable scalogram and modulation spectrum coefficients for SER. We observe that layer-wise coefficients taken independently also perform better than MFCCs.
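The shift-stability property mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Gaussian band-pass filters as a stand-in for wavelets, circular time shifts, and global (whole-signal) averaging in place of the local averaging used in a real scattering transform. The names `gaussian_bank` and `first_order_scattering`, the centre frequencies, and the bandwidth are all hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_bank(n, centre_freqs, bandwidth=0.02):
    """Band-pass filters defined in the frequency domain (assumed wavelet stand-ins)."""
    freqs = np.fft.fftfreq(n)  # cycles per sample, in [-0.5, 0.5)
    return np.stack(
        [np.exp(-0.5 * ((freqs - fc) / bandwidth) ** 2) for fc in centre_freqs]
    )

def first_order_scattering(x, filters):
    """Time-averaged filter modulus: s[k] = mean over t of |x * psi_k|(t)."""
    X = np.fft.fft(x)
    # Modulus of each band-pass output (a scalogram-like layer).
    u1 = np.abs(np.fft.ifft(X[None, :] * filters, axis=1))
    # Global averaging over time discards the position of the signal.
    return u1.mean(axis=1)

rng = np.random.default_rng(0)
n = 1024
t = np.arange(n)
# Toy signal: a windowed tone at 0.05 cycles/sample plus a little noise.
x = np.sin(2 * np.pi * 0.05 * t) * np.hanning(n) + 0.1 * rng.standard_normal(n)
x_shifted = np.roll(x, 100)  # circular time shift by 100 samples

filters = gaussian_bank(n, centre_freqs=[0.025, 0.05, 0.1, 0.2])
s = first_order_scattering(x, filters)
s_shifted = first_order_scattering(x_shifted, filters)

print("waveforms equal:   ", np.allclose(x, x_shifted))
print("coefficients equal:", np.allclose(s, s_shifted))
```

Under these assumptions the raw waveforms differ sample by sample while the averaged coefficients agree, which is the sense in which such features are insensitive to time shifts. A real scattering transform averages over a finite window, so the invariance holds only up to that window scale.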
Pages: 131-135
Number of pages: 5