Robust Speech Emotion Recognition under Different Encoding Conditions

Cited by: 4
Authors
Oates, Christopher [1]
Triantafyllopoulos, Andreas [1]
Steiner, Ingmar [1]
Schuller, Bjoern [1, 2, 3]
Affiliations
[1] audEERING GmbH, Gilching, Germany
[2] University of Augsburg, ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, Augsburg, Germany
[3] Imperial College London, GLAM, Group on Language, Audio & Music, London, England
Source
INTERSPEECH 2019
Keywords
speech emotion recognition; speech and audio compression
DOI
10.21437/Interspeech.2019-1658
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
In an era where large speech corpora annotated for emotion are hard to come by, especially corpora in which emotion is expressed spontaneously rather than acted, the importance of using free online sources to collect such data cannot be overstated. Most of those sources, however, contain encoded audio due to storage and bandwidth constraints, often at very low bitrates. In addition, with the growing industry interest in voice-based applications, it is inevitable that speech emotion recognition (SER) algorithms will soon find their way into production environments, where the audio might be encoded at a different bitrate than the one available during training. Our contribution is threefold. First, we show that encoded audio still contains enough relevant information for robust SER. Next, we investigate the effects of mismatched encoding conditions between the training and test sets, both for traditional machine learning algorithms built on hand-crafted features and for modern end-to-end methods. Finally, we investigate the robustness of those algorithms in the multi-condition scenario, where the training set is augmented with encoded audio but still differs from the test set. Our results indicate that end-to-end methods are more robust, even in the more challenging scenario of mismatched conditions.
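The abstract distinguishes three train/test setups: matched encoding, mismatched encoding, and multi-condition training. As a rough illustration only, the sketch below enumerates one possible reading of these setups as (training conditions, test condition) pairs. The condition labels in `CONDITIONS`, the function names, and the leave-one-condition-out rule used for the multi-condition case are hypothetical assumptions; they are not the codecs, bitrates, or protocol reported in the paper.

```python
from itertools import product

# Hypothetical encoding conditions (assumption, not taken from the paper):
# "none" stands for the uncompressed original, the rest are illustrative bitrates.
CONDITIONS = ["none", "64k", "32k", "16k", "8k"]

# A split is (training conditions, test condition).
Split = tuple[tuple[str, ...], str]


def matched(conditions: list[str]) -> list[Split]:
    """Train and test on the same encoding condition."""
    return [((c,), c) for c in conditions]


def mismatched(conditions: list[str]) -> list[Split]:
    """Train on a single condition and test on every other condition."""
    return [((tr,), te) for tr, te in product(conditions, repeat=2) if tr != te]


def multi_condition(conditions: list[str]) -> list[Split]:
    """Train on all conditions except the test one (leave-one-condition-out),
    so the augmented training set still differs from the test set."""
    return [(tuple(c for c in conditions if c != te), te) for te in conditions]


if __name__ == "__main__":
    for name, make in [("matched", matched), ("mismatched", mismatched),
                       ("multi-condition", multi_condition)]:
        splits = make(CONDITIONS)
        print(f"{name}: {len(splits)} splits, e.g. train={splits[0][0]} test={splits[0][1]}")
```

With the five hypothetical conditions this yields 5 matched, 20 mismatched, and 5 multi-condition splits. The leave-one-condition-out rule is simply one way to guarantee that the test encoding is never seen during training; an actual protocol might hold out several conditions at once.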
Pages: 3935-3939
Number of pages: 5
Related papers
50 in total
  • [1] Application of prosody modification for Speech Recognition in different Emotion conditions
    Raju, V. V. Vidyadhara
    Gangamohan, P.
    Gangashetty, Suryakanth V.
    Vuppala, Anil Kumar
    [J]. PROCEEDINGS OF THE 2016 IEEE REGION 10 CONFERENCE (TENCON), 2016: 951 - 954
  • [2] Robust recognition of emotion from speech
    Hoque, Mohammed E.
    Yeasin, Mohammed
    Louwerse, Max M.
    [J]. INTELLIGENT VIRTUAL AGENTS, PROCEEDINGS, 2006, 4133 : 42 - 53
  • [3] Towards Robust Speech-Based Emotion Recognition
    Tabatabaei, Talieh S.
    Krishnan, Sridhar
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2010), 2010,
  • [4] Speech Emotion Recognition under White Noise
    Huang, Chengwei
    Chen, Guoming
    Yu, Hua
    Bao, Yongqiang
    Zhao, Li
    [J]. ARCHIVES OF ACOUSTICS, 2013, 38 (04) : 457 - 463
  • [5] Robust Speech Recognition for Similar Japanese Pronunciation Phrases Under Noisy Conditions
    Mufungulwa, George
    Tsutsui, Hiroshi
    Miyanaga, Yoshikazu
    Abe, Shin-ichi
    Ochi, Mitsuru
    [J]. 2017 INTERNATIONAL SYMPOSIUM ON SIGNALS, CIRCUITS AND SYSTEMS (ISSCS), 2017,
  • [6] Emotion Recognition from Speech under Environmental Noise Conditions using Wavelet Decomposition
    Vasquez-Correa, J. C.
    Garcia, N.
    Orozco-Arroyave, J. R.
    Arias-Londono, J. D.
    Vargas-Bonilla, J. F.
    Noeth, Elmar
    [J]. 49TH ANNUAL IEEE INTERNATIONAL CARNAHAN CONFERENCE ON SECURITY TECHNOLOGY (ICCST), 2015: 247 - 252
  • [7] From Simulated Speech to Natural Speech, What are the Robust Features for Emotion Recognition?
    Li, Ya
    Chao, Linlin
    Liu, Yazhu
    Bao, Wei
    Tao, Jianhua
    [J]. 2015 INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2015: 368 - 373
  • [8] Speech Emotion Recognition Based on Robust Discriminative Sparse Regression
    Song, Peng
    Zheng, Wenming
    Yu, Yanwei
    Ou, Shifeng
    [J]. IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2021, 13 (02) : 343 - 353
  • [9] Building Robust Emotion Recognition System on Heterogeneous Speech Databases
    Yoon, Won-Jung
    Park, Kyu-Sik
    [J]. IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (ICCE 2011), 2011: 825 - 826
  • [10] Robust emotion recognition in noisy speech via sparse representation
    Zhao, Xiaoming
    Zhang, Shiqing
    Lei, Bicheng
    [J]. NEURAL COMPUTING & APPLICATIONS, 2014, 24 (7-8): 1539 - 1553