AESR: Speech Recognition With Speech Emotion Recognition Learning

Cited: 0
Authors
Han, RongQi [1 ]
Liu, Xin [1 ]
Zhang, Hui [1 ]
Affiliations
[1] Inner Mongolia Univ, Coll Comp Sci, Hohhot, Peoples R China
Keywords
Automatic Speech Recognition; Speech Emotion Recognition; Multi-task Learning; Character Error Rate; Word Error Rate;
DOI
10.1007/978-981-96-1045-7_8
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject Classification Code
070206; 082403
Abstract
Modern Automatic Speech Recognition (ASR) systems aim to accurately convert spoken language into written text. However, they often struggle with emotional speech, as traditional systems have difficulty interpreting the subtleties of emotional inflection. To address this challenge, a multi-task learning approach is proposed that jointly addresses ASR and Speech Emotion Recognition (SER). Even with limited emotional speech resources, this approach improves the recognition accuracy of a streaming ASR system on emotional utterances. Experiments on the MELD and SIMS datasets show a significant decrease in Word Error Rate (WER) and Character Error Rate (CER) when using the joint learning method compared to the optimized baseline: the WER decreased by 1.27 on the MELD dataset and the CER by 0.58 on the SIMS dataset.
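To make the joint-training idea in the abstract concrete, the following is a minimal PyTorch-style sketch, not the authors' code: a shared speech encoder feeds both a CTC head for ASR and a mean-pooled classification head for SER, and the two losses are combined with a weighting factor. All module names, dimensions, and the weight lambda_ser are illustrative assumptions.

```python
# Illustrative multi-task ASR + SER sketch (assumed architecture, not the AESR paper's code).
import torch
import torch.nn as nn


class JointASRSER(nn.Module):
    def __init__(self, n_mels=80, hidden=256, vocab_size=5000, n_emotions=7):
        super().__init__()
        # Shared encoder over filterbank features.
        self.encoder = nn.LSTM(n_mels, hidden, num_layers=3,
                               batch_first=True, bidirectional=True)
        self.asr_head = nn.Linear(2 * hidden, vocab_size + 1)  # +1 for the CTC blank
        self.ser_head = nn.Linear(2 * hidden, n_emotions)

    def forward(self, feats):
        # feats: (batch, time, n_mels)
        enc, _ = self.encoder(feats)                 # (batch, time, 2*hidden)
        asr_logits = self.asr_head(enc)              # frame-level token logits for CTC
        ser_logits = self.ser_head(enc.mean(dim=1))  # utterance-level emotion logits
        return asr_logits, ser_logits


def joint_loss(asr_logits, ser_logits, tokens, token_lens, feat_lens,
               emotions, lambda_ser=0.3):
    """Weighted sum of the CTC ASR loss and the cross-entropy SER loss."""
    log_probs = asr_logits.log_softmax(dim=-1).transpose(0, 1)  # (time, batch, vocab+1)
    asr_loss = nn.functional.ctc_loss(log_probs, tokens, feat_lens, token_lens,
                                      blank=asr_logits.size(-1) - 1)
    ser_loss = nn.functional.cross_entropy(ser_logits, emotions)
    return asr_loss + lambda_ser * ser_loss
```

In this sketch the SER branch acts as an auxiliary objective that regularizes the shared encoder, which is one common way a multi-task setup can help ASR on emotional speech; the actual architecture and loss weighting used in the paper may differ.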
Pages: 91-101
Number of Pages: 11
Related Papers
(50 records; showing 31-40)
  • [31] Yin, Chunyong; Sun, Ruxia; Luo, Qi. On speech emotion recognition system in E-learning. Proceedings of the 26th Chinese Control Conference, Vol 4, 2007: 472+.
  • [32] Shah, Mohit; Tu, Ming; Berisha, Visar; Chakrabarti, Chaitali; Spanias, Andreas. Articulation constrained learning with application to speech emotion recognition. EURASIP Journal on Audio, Speech, and Music Processing, 2019, 2019(1).
  • [33] Zhu, Ming; Wang, Chunchieh; Huang, Chengwei. Fidgety Speech Emotion Recognition for Learning Process Modeling. Electronics, 2024, 13(1).
  • [34] Li, Runnan; Wu, Zhiyong; Jia, Jia; Bu, Yaohua; Zhao, Sheng; Meng, Helen. Towards Discriminative Representation Learning for Speech Emotion Recognition. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019: 5060-5066.
  • [35] Zong, Yuan; Zheng, Wenming; Cui, Zhen; Li, Qiang. Double sparse learning model for speech emotion recognition. Electronics Letters, 2016, 52(16): 1410-1411.
  • [36] Shah, Mohit; Tu, Ming; Berisha, Visar; Chakrabarti, Chaitali; Spanias, Andreas. Articulation constrained learning with application to speech emotion recognition. EURASIP Journal on Audio, Speech, and Music Processing, 2019.
  • [37] Cai, Xingyu; Yuan, Jiahong; Zheng, Renjie; Huang, Liang; Church, Kenneth. Speech Emotion Recognition with Multi-task Learning. Interspeech 2021, 2021: 4508-4512.
  • [38] Singkul, Sattaya; Woraratpanya, Kuntpong. Vector learning representation for generalized speech emotion recognition. Heliyon, 2022, 8(3).
  • [39] Gangeh, Mehrdad J.; Fewzee, Pouria; Ghodsi, Ali; Kamel, Mohamed S.; Karray, Fakhri. Multiview Supervised Dictionary Learning in Speech Emotion Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(6): 1056-1068.
  • [40] Asiya, U. A.; Kiran, V. K. Speech Emotion Recognition - A Deep Learning Approach. Proceedings of the 2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC 2021), 2021: 867-871.