AESR: Speech Recognition With Speech Emotion Recogniting Learning

Cited by: 0
Authors
Han, RongQi [1]
Liu, Xin [1]
Zhang, Hui [1]
Affiliations
[1] Inner Mongolia Univ, Coll Comp Sci, Hohhot, Peoples R China
Keywords
Automatic Speech Recognition; Speech Emotion Recognition; Multi-task Learning; Character Error Rate; Word Error Rate;
DOI
10.1007/978-981-96-1045-7_8
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Modern Automatic Speech Recognition (ASR) systems aim to accurately convert spoken language into written text. However, they often face challenges when confronted with emotional speech, as traditional systems struggle to interpret the subtleties of emotional inflection. To overcome this challenge, a multi-task learning approach has been proposed that simultaneously addresses ASR and Speech Emotion Recognition (SER). With limited emotional speech resources, this approach has demonstrated improved recognition accuracy for the streaming ASR system when handling emotional utterances. Experiments conducted on both the MELD and SIMS datasets have shown a significant decrease in Word Error Rate (WER) and Character Error Rate (CER) when using the joint learning method compared to the optimized baseline. Specifically, the WER decreased by 1.27 on the MELD dataset and the CER by 0.58 on the SIMS dataset.
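For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how WER and CER are conventionally computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, over words for WER and over characters for CER. It is not taken from the paper. The function names (`edit_distance`, `wer`, `cer`), the whitespace tokenization, and the choice to ignore spaces in CER are assumptions made for illustration, and the authors' scoring setup may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion each cost 1)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference tokens to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance over reference word count.

    Assumption: words are separated by whitespace.
    """
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance over reference length.

    Assumption: spaces are ignored, which suits unsegmented text such as Mandarin.
    """
    ref_chars = [c for c in reference if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # Hypothetical transcripts, not drawn from MELD or SIMS.
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    print(cer("你好世界", "你好世"))
```

Both functions return a fraction; multiplying by 100 gives the percentage form in which such error rates are usually reported.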
Pages: 91-101
Page count: 11
Related papers
50 in total
  • [11] Speech emotion recognition with unsupervised feature learning
    Zheng-wei HUANG
    Wen-tao XUE
    Qi-rong MAO
    Frontiers of Information Technology & Electronic Engineering, 2015, 16 (05) : 358 - 366
  • [12] LEARNING WITH SYNTHESIZED SPEECH FOR AUTOMATIC EMOTION RECOGNITION
    Schuller, Bjoern
    Burkhardt, Felix
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 5150 - 5153
  • [14] Emotion Recognition in Speech with Deep Learning Architectures
    Erdal, Mehmet
    Kaechele, Markus
    Schwenker, Friedhelm
    ARTIFICIAL NEURAL NETWORKS IN PATTERN RECOGNITION, 2016, 9896 : 298 - 311
  • [15] Speech Emotion Recognition with Discriminative Feature Learning
    Zhou, Huan
    Liu, Kai
    INTERSPEECH 2020, 2020, : 4094 - 4097
  • [16] Speech Emotion Recognition Based on Learning Automata in
    Motamed, Sara
    Setayeshi, Saeed
    Farhoudi, Zeinab
    Ahmadi, Ali
    JOURNAL OF MATHEMATICS AND COMPUTER SCIENCE-JMCS, 2014, 12 (03) : 173 - 185
  • [17] CONTRASTIVE UNSUPERVISED LEARNING FOR SPEECH EMOTION RECOGNITION
    Li, Mao
    Yang, Bo
    Levy, Joshua
    Stolcke, Andreas
    Rozgic, Viktor
    Matsoukas, Spyros
    Papayiannis, Constantinos
    Bone, Daniel
    Wang, Chao
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6329 - 6333
  • [18] Learning Transferable Features for Speech Emotion Recognition
    Marczewski, Alison
    Veloso, Adriano
    Ziviani, Nivio
    PROCEEDINGS OF THE THEMATIC WORKSHOPS OF ACM MULTIMEDIA 2017 (THEMATIC WORKSHOPS'17), 2017, : 529 - 536
  • [20] Speech emotion recognition via learning analogies
    Ntalampiras, Stavros
    PATTERN RECOGNITION LETTERS, 2021, 144 : 21 - 26