SPEAKER NORMALIZATION FOR SELF-SUPERVISED SPEECH EMOTION RECOGNITION

Cited by: 17
Authors
Gat, Itai [1]
Aronowitz, Hagai [1]
Zhu, Weizhong [1]
Morais, Edmilson [1]
Hoory, Ron [1]
Affiliations
[1] IBM Res AI, Albany, NY 12203 USA
Keywords
Speech emotion recognition; speaker normalization; self-supervised learning
DOI
10.1109/ICASSP43922.2022.9747460
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Large speech emotion recognition datasets are hard to obtain, and small datasets may contain biases. Deep-net-based classifiers, in turn, are prone to exploit those biases and find shortcuts such as speaker characteristics. These shortcuts usually harm a model's ability to generalize. To address this challenge, we propose a gradient-based adversary learning framework that learns a speech emotion recognition task while normalizing speaker characteristics from the feature representation. We demonstrate the efficacy of our method on both speaker-independent and speaker-dependent settings and obtain new state-of-the-art results on the challenging IEMOCAP dataset.
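The abstract does not give the details of the gradient-based adversarial framework, so the following is only a minimal sketch of a related, generic idea (gradient reversal on a shared feature), not the authors' method. It assumes a toy linear encoder, binary emotion and speaker labels, and hypothetical names (`adversarial_step`, `lam`, `lr`) chosen purely for illustration. The encoder is updated to lower the emotion loss while raising the speaker loss, and each head minimizes its own loss.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def adversarial_step(W, u, v, x, y_emo, y_spk, lam=0.5, lr=0.1):
    """One SGD step on a toy linear model with a reversed speaker gradient.

    W: encoder matrix (list of rows); the shared feature is z = W x.
    u: emotion head weights; v: speaker head weights (the adversary).
    All names and the binary-label setup are illustrative assumptions.
    """
    z = [dot(row, x) for row in W]
    # Binary cross-entropy: d(loss)/d(logit) = sigmoid(logit) - label
    d_emo = sigmoid(dot(u, z)) - y_emo
    d_spk = sigmoid(dot(v, z)) - y_spk
    # Gradient of each loss with respect to the shared feature z
    g_emo_z = [d_emo * ui for ui in u]
    g_spk_z = [d_spk * vi for vi in v]
    # Gradient reversal: the encoder descends the emotion loss
    # and ascends the speaker loss, scaled by lam
    g_z = [ge - lam * gs for ge, gs in zip(g_emo_z, g_spk_z)]
    new_W = [[w - lr * gz * xj for w, xj in zip(row, x)]
             for row, gz in zip(W, g_z)]
    # Each head takes an ordinary descent step on its own loss
    new_u = [ui - lr * d_emo * zi for ui, zi in zip(u, z)]
    new_v = [vi - lr * d_spk * zi for vi, zi in zip(v, z)]
    return new_W, new_u, new_v


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    new_W, new_u, new_v = adversarial_step(
        W, [0.0, 0.0], [1.0, 0.0], [1.0, 0.0], y_emo=1, y_spk=1, lam=1.0)
    print(new_W, new_u, new_v)
```

In this sketch `lam` trades off emotion accuracy against how strongly speaker information is pushed out of the feature; setting `lam=0` reduces the encoder update to plain emotion training.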
Pages: 7342-7346
Number of pages: 5