AESR: Speech Recognition With Speech Emotion Recogniting Learning

Authors
Han, RongQi [1]
Liu, Xin [1]
Zhang, Hui [1]
Affiliations
[1] Inner Mongolia Univ, Coll Comp Sci, Hohhot, Peoples R China
Keywords
Automatic Speech Recognition; Speech Emotion Recognition; Multi-task Learning; Character Error Rate; Word Error Rate
DOI
10.1007/978-981-96-1045-7_8
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Modern Automatic Speech Recognition (ASR) systems aim to convert spoken language into written text accurately. However, they often degrade on emotional speech, because traditional systems struggle to interpret the subtleties of emotional inflection. To overcome this challenge, a multi-task learning approach is proposed that jointly addresses ASR and Speech Emotion Recognition (SER). Even with limited emotional speech resources, this approach improves the recognition accuracy of a streaming ASR system on emotional utterances. Experiments on the MELD and SIMS datasets show a significant decrease in Word Error Rate (WER) and Character Error Rate (CER) for the joint learning method compared with the optimized baseline. Specifically, the WER decreased by 1.27 on the MELD dataset and the CER decreased by 0.58 on the SIMS dataset.
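For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how WER and CER are conventionally computed: the edit distance between reference and hypothesis, divided by the reference length, taken over words for WER and over characters for CER. The function names and the example sentences are illustrative assumptions, not taken from the paper, and the paper's own scoring (text normalization, tokenization) may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (minimum number of substitutions, deletions, and insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length.
    Spaces are dropped here, which is an assumption of this sketch."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# Hypothetical examples: one substitution plus one deletion over six words,
# and one substituted character over six characters.
english_wer = wer("the cat sat on the mat", "the cat sit on mat")
chinese_cer = cer("今天天气很好", "今天天汽很好")
```

WER is the usual choice for space-delimited languages such as English, while CER is typically reported for Chinese, where word boundaries are not marked in the text.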
Pages: 91-101
Number of pages: 11