Robust emotion recognition by spectro-temporal modulation statistic features

Cited by: 0
Authors
Tai-Shih Chi
Lan-Ying Yeh
Chin-Cheng Hsu
Affiliation
[1] National Chiao Tung University, Department of Electrical Engineering
Keywords
Robust emotion recognition; Spectro-temporal modulation
DOI
Not available
Abstract
Most speech emotion recognition studies consider clean speech. In this study, statistics of joint spectro-temporal modulation features are extracted from an auditory perceptual model and used to detect the emotional state of speech under noisy conditions. Speech samples were taken from the Berlin Emotional Speech database and corrupted with white and babble noise at various SNR levels. The study investigates a clean-train/noisy-test scenario to simulate practical conditions with unknown noise sources. Simulations demonstrate redundancy in the proposed spectro-temporal modulation features, and dimensionality reduction is considered accordingly. Under noisy conditions, the proposed modulation features achieve higher emotion recognition rates than (1) conventional mel-frequency cepstral coefficients combined with prosodic features and (2) the official acoustic features adopted in the INTERSPEECH 2009 Emotion Challenge. Adding the modulation features increased the recognition rates of the INTERSPEECH features by approximately 7% for all tested SNR conditions (20–0 dB).
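The noisy test condition described in the abstract can be illustrated with a minimal sketch of mixing noise into a clean signal at a chosen SNR. This is not the authors' code: the function names, the sine-wave stand-in for a speech utterance, the 16 kHz sampling rate, and the 5 dB step between SNR levels are all assumptions made for illustration. The abstract states only that white and babble noise were added over a 20–0 dB range.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so that the mixture has the requested SNR in dB (hypothetical helper)."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose gain g so that 10*log10(p_clean / (g**2 * p_noise)) equals snr_db.
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

def measured_snr_db(clean, noisy):
    """Recompute the SNR from the clean signal and the mixture (sanity check)."""
    residual = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 220.0 * t)        # stand-in for a speech utterance (assumption)
white = rng.standard_normal(clean.shape[0])  # white Gaussian noise

# Assumed 5 dB steps across the 20-0 dB range mentioned in the abstract.
noisy_by_snr = {snr: add_noise_at_snr(clean, white, snr) for snr in (20, 15, 10, 5, 0)}
```

In a clean-train/noisy-test setup, a classifier would be fitted on features from `clean` recordings only and evaluated on features from each entry of `noisy_by_snr`. Babble noise would replace `white` with a recording of overlapping talkers.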
Pages: 47-60
Page count: 13
Related papers
50 in total
  • [21] ROBUST SPECTRO-TEMPORAL FEATURES BASED ON AUTOREGRESSIVE MODELS OF HILBERT ENVELOPES
    Ganapathy, Sriram
    Thomas, Samuel
    Hermansky, Hynek
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4286 - 4289
  • [22] Spectro-temporal modulation detection in children
    Kirby, Benjamin J.
    Browning, Jenna M.
    Brennan, Marc A.
    Spratford, Meredith
    McCreery, Ryan W.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2015, 138 (05): : EL465 - EL468
  • [23] Robust Dialect Identification System using Spectro-Temporal Gabor Features
    Chittaragi, Nagaratna B.
    Mothukuri, Siva Krishna P.
    Hegde, Pradyoth
    Koolagudi, Shashidhar G.
    PROCEEDINGS OF TENCON 2018 - 2018 IEEE REGION 10 CONFERENCE, 2018, : 1589 - 1594
  • [24] POINT PROCESS MODELS OF SPECTRO-TEMPORAL MODULATION EVENTS FOR SPEECH RECOGNITION
    Jansen, Aren
    Mesgarani, Nima
    Niyogi, Partha
    2010 CONFERENCE RECORD OF THE FORTY FOURTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS (ASILOMAR), 2010, : 104 - 108
  • [25] Phase based spectro-temporal features for building a robust ASR system
    Dutta, Anirban
    Ashishkumar, G.
    Rao, Ch V. Rama
    INTERSPEECH 2020, 2020, : 1668 - 1672
  • [26] Spectro-Temporal Features for Robust Far-Field Speaker Identification
    Falk, Tiago H.
    Chan, Wai-Yip
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 634 - 637
  • [27] Normalization of spectro-temporal Gabor filter bank features for improved robust automatic speech recognition systems
    Schaedler, Marc Rene
    Kollmeier, Birger
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1810 - 1813
  • [28] Development of spectro-temporal features of speech in children
    Gautam S.
    Singh L.
    Gautam, Sumanlata (suman.gautam82@gmail.com), 1600, Springer Science and Business Media, LLC (20): : 543 - 551
  • [29] Feature Adaptation Using Linear Spectro-Temporal Transform for Robust Speech Recognition
    Duc Hoang Ha Nguyen
    Xiao, Xiong
    Chng, Eng Siong
    Li, Haizhou
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (06) : 1006 - 1019
  • [30] Robust Spectro-Temporal Speech Features with Model-Based Distribution Equalization
    Ngouoko, Samuel K. M.
    Heckmann, Martin
    Wrede, Britta
    2013 14TH INTERNATIONAL WORKSHOP ON IMAGE ANALYSIS FOR MULTIMEDIA INTERACTIVE SERVICES (WIAMIS), 2013,