SPEAKER NORMALIZATION FOR SELF-SUPERVISED SPEECH EMOTION RECOGNITION

Cited by: 17
Authors
Gat, Itai [1 ]
Aronowitz, Hagai [1 ]
Zhu, Weizhong [1 ]
Morais, Edmilson [1 ]
Hoory, Ron [1 ]
Affiliations
[1] IBM Res AI, Albany, NY 12203 USA
Keywords
Speech emotion recognition; speaker normalization; self-supervised learning
DOI
10.1109/ICASSP43922.2022.9747460
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Large speech emotion recognition datasets are hard to obtain, and small datasets may contain biases. Deep-net-based classifiers, in turn, are prone to exploit those biases and find shortcuts such as speaker characteristics. These shortcuts usually harm a model's ability to generalize. To address this challenge, we propose a gradient-based adversary learning framework that learns a speech emotion recognition task while normalizing speaker characteristics from the feature representation. We demonstrate the efficacy of our method on both speaker-independent and speaker-dependent settings and obtain new state-of-the-art results on the challenging IEMOCAP dataset.
Pages: 7342-7346
Page count: 5