Towards multi-task learning of speech and speaker recognition

被引:0
|
作者
Vaessen, Nik [1 ]
van Leeuwen, David A. [1 ]
机构
[1] Radboud Univ Nijmegen, Inst Comp & Informat Sci, Nijmegen, Netherlands
来源
关键词
multi-task learning; speech recognition; speaker recognition; wav2vec2;
D O I
10.21437/Interspeech.2023-353
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
We study multi-task learning for two orthogonal speech technology tasks: speech and speaker recognition. We use wav2vec2 as a base architecture with two task-specific output heads. We experiment with different architectural decisions to mix speaker and speech information in the output sequence as well as different optimization strategies. Our multi-task learning networks can produce a shared speaker and speech embedding, which on first glance achieve a performance comparable to separate single-task models. However, we show that the multi-task networks have strongly degraded performance on out-of-distribution evaluation data compared to the single-task models. Code and model checkpoints are available at https://github.com/nikvaessen/disjoint-mtl.
引用
收藏
页码:4898 / 4902
页数:5
相关论文
共 50 条
  • [1] Speaker-Aware Multi-Task Learning for Automatic Speech Recognition
    Pironkov, Gueorgui
    Dupont, Stephane
    Dutoit, Thierry
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 2900 - 2905
  • [2] Multi-task Recurrent Model for Speech and Speaker Recognition
    Tang, Zhiyuan
    Li, Lantian
    Wang, Dong
    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [3] Speech Emotion Recognition with Multi-task Learning
    Cai, Xingyu
    Yuan, Jiahong
    Zheng, Renjie
    Huang, Liang
    Church, Kenneth
    INTERSPEECH 2021, 2021, : 4508 - 4512
  • [4] Meta Multi-task Learning for Speech Emotion Recognition
    Cai, Ruichu
    Guo, Kaibin
    Xu, Boyan
    Yang, Xiaoyan
    Zhang, Zhenjie
    INTERSPEECH 2020, 2020, : 3336 - 3340
  • [5] Speech Emotion Recognition based on Multi-Task Learning
    Zhao, Huijuan
    Han Zhijie
    Wang, Ruchuan
    2019 IEEE 5TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD (BIGDATASECURITY) / IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING (HPSC) / IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY (IDS), 2019, : 186 - 188
  • [6] Speaker independent feature selection for speech emotion recognition: A multi-task approach
    Kalhor, Elham
    Bakhtiari, Behzad
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (06) : 8127 - 8146
  • [7] Speaker-Aware Long Short-Term Memory Multi-Task Learning for Speech Recognition
    Pironkov, Gueorgui
    Dupont, Stephane
    Dutoit, Thierry
    2016 24TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2016, : 1911 - 1915
  • [8] Speaker independent feature selection for speech emotion recognition: A multi-task approach
    Elham Kalhor
    Behzad Bakhtiari
    Multimedia Tools and Applications, 2021, 80 : 8127 - 8146
  • [9] Multi-task learning for X-vector based speaker recognition
    Zhang Y.
    Liu L.
    International Journal of Speech Technology, 2023, 26 (04) : 817 - 823
  • [10] MULTI-OBJECTIVE MULTI-TASK LEARNING ON RNNLM FOR SPEECH RECOGNITION
    Song, Minguang
    Zhao, Yunxin
    Wang, Shaojun
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 197 - 203