Robust emotion recognition by spectro-temporal modulation statistic features

Authors
Tai-Shih Chi
Lan-Ying Yeh
Chin-Cheng Hsu
Affiliation
[1] National Chiao Tung University, Department of Electrical Engineering
Keywords
Robust emotion recognition; Spectro-temporal modulation
DOI
Not available
Abstract
Most speech emotion recognition studies focus on clean speech. In this study, statistics of joint spectro-temporal modulation features are extracted from an auditory perceptual model and used to detect the emotional state of speech under noisy conditions. Speech samples from the Berlin Emotional Speech database were corrupted with white and babble noise at various SNR levels. The study adopts a clean-train/noisy-test scenario to simulate practical conditions in which the noise source is unknown. Simulations reveal redundancy in the proposed spectro-temporal modulation features, so dimensionality reduction is also considered. Under noisy conditions, the proposed modulation features achieve higher speech emotion recognition rates than (1) conventional mel-frequency cepstral coefficients combined with prosodic features and (2) the official acoustic features adopted in the INTERSPEECH 2009 Emotion Challenge. Adding the modulation features to the INTERSPEECH 2009 feature set increased its recognition rates by approximately 7% under all tested SNR conditions (20–0 dB).
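The abstract states that the Berlin Emotional Speech samples were corrupted with white and babble noise at SNR levels from 20 dB down to 0 dB, but it does not spell out the mixing procedure. The following is a minimal sketch, assuming the common approach of scaling the noise so that the ratio of mean clean-signal power to mean noise power equals the target SNR. The function names `add_noise_at_snr` and `measured_snr_db`, the 16 kHz sampling rate, the 5 dB spacing between SNR levels, and the synthetic tone and Gaussian samples standing in for speech and noise are all hypothetical choices made for illustration, not details taken from the paper.

```python
import math
import random


def power(x):
    """Mean power (average squared amplitude) of a signal."""
    return sum(s * s for s in x) / len(x)


def add_noise_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so that 10*log10(P_clean / P_noise) equals target_snr_db,
    then add it to `clean`. Only the first len(clean) noise samples are used.

    Assumption: SNR is defined on whole-signal mean power; the paper may
    have used a different definition (e.g. active speech level).
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    segment = noise[:len(clean)]
    p_clean = power(clean)
    p_noise = power(segment)
    # Gain that brings the noise power to p_clean / 10^(target_snr_db / 10).
    gain = math.sqrt(p_clean / (p_noise * 10 ** (target_snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, segment)]


def measured_snr_db(clean, noisy):
    """SNR of `noisy` relative to `clean`, in dB (the residual is the noise)."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    fs = 16000  # assumed sampling rate (hypothetical)
    # Synthetic stand-ins: a 220 Hz tone for speech, Gaussian samples for white noise.
    clean = [math.sin(2 * math.pi * 220 * t / fs) for t in range(fs)]
    white = [rng.gauss(0.0, 1.0) for _ in range(fs)]
    for target in (20, 15, 10, 5, 0):  # assumed 5 dB steps between 20 and 0 dB
        noisy = add_noise_at_snr(clean, white, target)
        print(f"target {target:>2} dB -> measured {measured_snr_db(clean, noisy):.2f} dB")
```

The measured SNR should match each target up to floating-point rounding. Babble noise would be handled the same way, by passing a recorded multi-talker signal as `noise`. In a clean-train/noisy-test setup, only the test utterances pass through this corruption step; the classifier is trained on the unmodified clean samples.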
Pages: 47–60
Number of pages: 13