Towards multi-task learning of speech and speaker recognition

被引:0
|
作者
Vaessen, Nik [1 ]
van Leeuwen, David A. [1 ]
机构
[1] Radboud Univ Nijmegen, Inst Comp & Informat Sci, Nijmegen, Netherlands
来源
关键词
multi-task learning; speech recognition; speaker recognition; wav2vec2;
D O I
10.21437/Interspeech.2023-353
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
We study multi-task learning for two orthogonal speech technology tasks: speech and speaker recognition. We use wav2vec2 as a base architecture with two task-specific output heads. We experiment with different architectural decisions to mix speaker and speech information in the output sequence as well as different optimization strategies. Our multi-task learning networks can produce a shared speaker and speech embedding, which on first glance achieve a performance comparable to separate single-task models. However, we show that the multi-task networks have strongly degraded performance on out-of-distribution evaluation data compared to the single-task models. Code and model checkpoints are available at https://github.com/nikvaessen/disjoint-mtl.
引用
收藏
页码:4898 / 4902
页数:5
相关论文
共 50 条
  • [21] Attention-based LSTM with Multi-task Learning for Distant Speech Recognition
    Zhang, Yu
    Zhang, Pengyuan
    Yan, Yonghong
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3857 - 3861
  • [22] Improved Accented Speech Recognition Using Accent Embeddings and Multi-task Learning
    Jain, Abhinav
    Upreti, Minali
    Jyothi, Preethi
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2454 - 2458
  • [23] Adversarial Multi-task Learning of Deep Neural Networks for Robust Speech Recognition
    Shinohara, Yusuke
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2369 - 2372
  • [24] Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition
    Shao, Qijie
    Guo, Pengcheng
    Yan, Jinghao
    Hu, Pengfei
    Xie, Lei
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 459 - 470
  • [25] Coarse-to-Fine Speech Emotion Recognition Based on Multi-Task Learning
    Zhao, Huijuan
    Ye, Ning
    Wang, Ruchuan
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2021, 93 (2-3): : 299 - 308
  • [26] TO REVERSE THE GRADIENT OR NOT: AN EMPIRICAL COMPARISON OF ADVERSARIAL AND MULTI-TASK LEARNING IN SPEECH RECOGNITION
    Adi, Yossi
    Zeghidour, Neil
    Collobert, Ronan
    Usunier, Nicolas
    Liptchinsky, Vitaliy
    Synnaeve, Gabriel
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 3742 - 3746
  • [27] A multi-task network for speaker and command recognition in industrial environments
    Bini, Stefano
    Percannella, Gennaro
    Saggese, Alessia
    Vento, Mario
    PATTERN RECOGNITION LETTERS, 2023, 176 : 62 - 68
  • [28] A Multi-task Framework of Speaker Recognition with TTS Data Augmentation
    Xie, Xingjia
    Zhi, Yiming
    Ouyang, Beibei
    Hong, Qingyang
    Li, Lin
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 210 - 215
  • [29] MULTI-TASK DISTILLATION: TOWARDS MITIGATING THE NEGATIVE TRANSFER IN MULTI-TASK LEARNING
    Meng, Ze
    Yao, Xin
    Sun, Lifeng
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 389 - 393
  • [30] Multi-task Learning over Mixup Variants for the Speaker Verification Task
    Fathan, Abderrahim
    Alam, Jahangir
    Zhu, Xiaolin
    SPEECH AND COMPUTER, SPECOM 2023, PT II, 2023, 14339 : 446 - 460