Robust emotion recognition by spectro-temporal modulation statistic features

Cited by: 0
Authors
Tai-Shih Chi
Lan-Ying Yeh
Chin-Cheng Hsu
Affiliation
[1] Department of Electrical Engineering, National Chiao Tung University
Keywords
Robust emotion recognition; Spectro-temporal modulation
DOI
Not available
Abstract
Most speech emotion recognition studies consider only clean speech. In this study, statistics of joint spectro-temporal modulation features are extracted from an auditory perceptual model and used to detect the emotional state of speech under noisy conditions. Speech samples were taken from the Berlin Emotional Speech database and corrupted with white and babble noise at various SNR levels. The study adopts a clean-train/noisy-test scenario to simulate practical conditions in which the noise source is unknown. Simulations reveal redundancy in the proposed spectro-temporal modulation features, so dimensionality reduction is also considered. Under noisy conditions, the proposed modulation features achieve higher speech emotion recognition rates than (1) conventional mel-frequency cepstral coefficients combined with prosodic features and (2) the official acoustic features of the INTERSPEECH 2009 Emotion Challenge. Adding the modulation features to the INTERSPEECH 2009 feature set increased its recognition rates by approximately 7% under all tested SNR conditions (20–0 dB).
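The noisy test condition described in the abstract amounts to scaling an additive noise signal so that the speech-to-noise power ratio reaches a target SNR: the gain g solves P_s / (g^2 · P_n) = 10^(SNR/10). The sketch below is a minimal illustration under stated assumptions, not the authors' code. A synthetic tone stands in for a Berlin Emotional Speech utterance, Gaussian white noise stands in for the white and babble recordings, SNR is computed over the whole utterance, the 16 kHz rate and the 5 dB grid between 20 and 0 dB are assumptions, and the function names `mix_at_snr` and `measured_snr_db` are hypothetical.

```python
import math
import random


def power(x):
    """Mean power (average squared amplitude) of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the global SNR equals `snr_db`.

    The gain g solves P_s / (g**2 * P_n) = 10**(snr_db / 10), where P_s and
    P_n are mean powers over the whole utterance (an assumption: the paper
    may define SNR differently, e.g. over active speech only).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]  # trim noise to the utterance length
    p_s, p_n = power(speech), power(noise)
    if p_n == 0:
        raise ValueError("noise has zero power")
    gain = math.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """Recover the SNR (in dB) from the clean signal and the noisy mixture."""
    residual = [m - s for s, m in zip(speech, mixture)]
    return 10 * math.log10(power(speech) / power(residual))


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate (Hz)
    rng = random.Random(0)  # fixed seed for reproducibility
    # Hypothetical stand-ins: a 1 s, 220 Hz tone for an utterance, Gaussian white noise.
    speech = [0.5 * math.sin(2 * math.pi * 220 * t / fs) for t in range(fs)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
    for target in (20, 15, 10, 5, 0):  # assumed 5 dB grid over 20-0 dB
        mixed = mix_at_snr(speech, noise, target)
        print(f"target {target:2d} dB -> measured {measured_snr_db(speech, mixed):.2f} dB")
```

Babble noise would be handled the same way by passing a recorded babble signal as `noise`. In a clean-train/noisy-test setup, only the test utterances would go through `mix_at_snr`, while the classifier is trained on the untouched clean speech.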
Pages: 47-60
Page count: 13
Related papers (50 in total)
  • [41] Spectro-temporal modulation glimpsing for speech intelligibility prediction
    Edraki, Amin
    Chan, Wai-Yip
    Jensen, Jesper
    Fogerty, Daniel
    HEARING RESEARCH, 2022, 426
  • [42] Spectro-temporal modulation transfer functions and speech intelligibility
    Chi, TS
    Gao, YJ
    Guyton, MC
    Ru, PW
    Shamma, S
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1999, 106 (05): 2719-2732
  • [43] A Closer Look on Hierarchical Spectro-Temporal Features (HIST)
    Heckmann, Martin
    Domont, Xavier
    Joublin, Frank
    Goerick, Christian
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008: 894-897
  • [44] Biomimetic spectro-temporal features for music instrument recognition in isolated notes and solo phrases
    Patil, Kailash
    Elhilali, Mounya
    EURASIP Journal on Audio, Speech, and Music Processing, 2015
  • [45] Robustness of spectro-temporal features against intrinsic and extrinsic variations in automatic speech recognition
    Meyer, Bernd T.
    Kollmeier, Birger
    SPEECH COMMUNICATION, 2011, 53 (05): 753-767
  • [46] Comparing Different Flavors of Spectro-Temporal Features for ASR
    Meyer, Bernd T.
    Ravuri, Suman V.
    Schaedler, Marc Rene
    Morgan, Nelson
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011: 1276+
  • [47] Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition
    Geng, Mengzhe
    Xie, Xurong
    Ye, Zi
    Wang, Tianzi
    Li, Guinan
    Hu, Shujie
    Liu, Xunying
    Meng, Helen
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30: 2597-2611
  • [48] Combining feature space discriminative training with long-term spectro-temporal features for noise-robust speech recognition
    Fukuda, Takashi
    Ichikawa, Osamu
    Nishimura, Masafumi
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011: 236-239
  • [49] Robust Audio Identification Using Spectro-Temporal Subband Centroids
    Seo, Jin Soo
    Lee, Seungjae
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2008, 27 (05): 239-243