AESR: Speech Recognition With Speech Emotion Recogniting Learning

Cited by: 0
Authors
Han, RongQi [1]
Liu, Xin [1]
Zhang, Hui [1]
Affiliations
[1] Inner Mongolia Univ, Coll Comp Sci, Hohhot, Peoples R China
Keywords
Automatic Speech Recognition; Speech Emotion Recognition; Multi-task Learning; Character Error Rate; Word Error Rate;
DOI
10.1007/978-981-96-1045-7_8
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Modern Automatic Speech Recognition (ASR) systems aim to accurately convert spoken language into written text. However, they often face challenges when confronted with emotional speech, as traditional systems struggle to interpret the subtleties of emotional inflection. To overcome this challenge, a multi-task learning approach has been proposed that simultaneously addresses ASR and Speech Emotion Recognition (SER). With limited emotional speech resources, this approach has demonstrated improved recognition accuracy for the streaming ASR system when handling emotional utterances. Experiments conducted on both the MELD and SIMS datasets have shown a significant decrease in Word Error Rate (WER) and Character Error Rate (CER) when using the joint learning method compared to the optimized baseline. Specifically, the WER decreased by 1.27 on the MELD dataset and the CER by 0.58 on the SIMS dataset.
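For readers less familiar with the two metrics named in the abstract, the following is a minimal sketch of how WER and CER are conventionally computed: the Levenshtein (edit) distance between reference and hypothesis, divided by the reference length, expressed as a percentage. It is not the authors' scoring code. The function names are illustrative assumptions, and text normalization (casing, punctuation, tokenization) is deliberately left out because the abstract does not describe it.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (illustrative helper)."""
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra token in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens: Sequence[str], hyp_tokens: Sequence[str]) -> float:
    """Edit distance normalized by reference length, as a percentage."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(ref: str, hyp: str) -> float:
    """Word Error Rate: tokens are whitespace-separated words."""
    return error_rate(ref.split(), hyp.split())


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: tokens are characters, with spaces ignored (an assumption)."""
    return error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))
```

Word-level scoring suits whitespace-delimited languages, while character-level scoring is the usual choice for text without word boundaries, which is consistent with the abstract reporting WER on MELD and CER on SIMS.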
Pages: 91-101
Page count: 11