SPEAKER NORMALIZATION FOR SELF-SUPERVISED SPEECH EMOTION RECOGNITION

Cited by: 17
Authors
Gat, Itai [1 ]
Aronowitz, Hagai [1 ]
Zhu, Weizhong [1 ]
Morais, Edmilson [1 ]
Hoory, Ron [1 ]
Affiliations
[1] IBM Res AI, Albany, NY 12203 USA
Keywords
Speech emotion recognition; speaker normalization; self-supervised learning
DOI
10.1109/ICASSP43922.2022.9747460
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Large speech emotion recognition datasets are hard to obtain, and small datasets may contain biases. Deep-net-based classifiers, in turn, are prone to exploit those biases and find shortcuts such as speaker characteristics. These shortcuts usually harm a model's ability to generalize. To address this challenge, we propose a gradient-based adversary learning framework that learns a speech emotion recognition task while normalizing speaker characteristics from the feature representation. We demonstrate the efficacy of our method on both speaker-independent and speaker-dependent settings and obtain new state-of-the-art results on the challenging IEMOCAP dataset.
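The abstract does not spell out the adversarial mechanism. As a rough illustration only, the sketch below assumes a gradient reversal scheme, a common way to train an encoder that helps one task (emotion) while hindering another (speaker identification). It is not the authors' implementation; the names `reversed_gradient`, `encoder_update`, `lam`, and `lr` are hypothetical.

```python
# Minimal sketch of a gradient-reversal update for a shared encoder.
# ASSUMPTION: gradient reversal stands in for the paper's "gradient-based
# adversary learning"; the record above does not confirm the exact method.

def reversed_gradient(grad, lam):
    """Backward rule of a gradient reversal layer.

    The forward pass is the identity; the backward pass flips the sign of
    the incoming gradient and scales it by lam.
    """
    return [-lam * g for g in grad]


def encoder_update(weights, grad_emotion, grad_speaker, lam=1.0, lr=0.1):
    """One gradient-descent step on the shared encoder weights.

    grad_emotion: gradient of the emotion loss w.r.t. the encoder weights.
    grad_speaker: gradient of the speaker loss w.r.t. the encoder weights.
    The encoder descends the emotion loss but ascends the speaker loss,
    which discourages speaker-specific information in the features.
    """
    adversarial = reversed_gradient(grad_speaker, lam)
    return [
        w - lr * (g_emo + g_adv)
        for w, g_emo, g_adv in zip(weights, grad_emotion, adversarial)
    ]


if __name__ == "__main__":
    # Toy numbers chosen only for illustration.
    print(encoder_update([1.0, 2.0], [0.5, 0.0], [0.0, 1.0], lam=1.0, lr=0.1))
```

In this toy step, the first weight moves against the emotion gradient as usual, while the second weight moves along the speaker gradient, which is the opposite of what a speaker classifier would want.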
Pages: 7342-7346
Number of pages: 5
Related papers
  • [1] Evaluating Self-Supervised Speech Representations for Speech Emotion Recognition
    Atmaja, Bagus Tris
    Sasou, Akira
    IEEE ACCESS, 2022, 10 : 124396 - 124407
  • [2] SPEECH EMOTION RECOGNITION USING SELF-SUPERVISED FEATURES
    Morais, Edmilson
    Hoory, Ron
    Zhu, Weizhong
    Gat, Itai
    Damasceno, Matheus
    Aronowitz, Hagai
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6922 - 6926
  • [3] Investigation of Ensemble of Self-Supervised Models for Speech Emotion Recognition
    Wu, Yanfeng
    Yue, Pengcheng
    Cheng, Cuiping
    Li, Taihao
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 988 - 995
  • [4] Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
    Chen, Sanyuan
    Wu, Yu
    Wang, Chengyi
    Liu, Shujie
    Chen, Zhuo
    Wang, Peidong
    Liu, Gang
    Li, Jinyu
    Wu, Jian
    Yu, Xiangzhan
    Wei, Furu
    INTERSPEECH 2022, 2022, : 3699 - 3703
  • [5] Self-supervised Representation Fusion for Speech and Wearable Based Emotion Recognition
    Dissanayake, Vipula
    Seneviratne, Sachith
    Suriyaarachchi, Hussel
    Wen, Elliott
    Nanayakkara, Suranga
    INTERSPEECH 2022, 2022, : 3598 - 3602
  • [6] DEEP INVESTIGATION OF INTERMEDIATE REPRESENTATIONS IN SELF-SUPERVISED LEARNING MODELS FOR SPEECH EMOTION RECOGNITION
    Zhu, Zhi
    Sato, Yoshinao
    2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW, 2023,
  • [7] Multi-corpus Affect Recognition with Emotion Embeddings and Self-Supervised Representations of Speech
    Alisamir, Sina
    Ringeval, Fabien
    Portet, Francois
    2022 10TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2022,
  • [8] PHONE AND SPEAKER SPATIAL ORGANIZATION IN SELF-SUPERVISED SPEECH REPRESENTATIONS
    Riera, Pablo
    Cerdeiro, Manuela
    Pepino, Leonardo
    Ferrer, Luciana
    2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW, 2023,
  • [9] Self-Supervised ECG Representation Learning for Emotion Recognition
    Sarkar, Pritam
    Etemad, Ali
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2022, 13 (03) : 1541 - 1554
  • [10] Emotion recognition using semi-supervised feature selection with speaker normalization
    Sun Y.
    Wen G.
    International Journal of Speech Technology, 2015, 18 (3) : 317 - 331