Towards multi-task learning of speech and speaker recognition

Times cited: 0

Authors:

Vaessen, Nik [1]

van Leeuwen, David A. [1]

Affiliation:

[1] Radboud Univ Nijmegen, Inst Comp & Informat Sci, Nijmegen, Netherlands

Source:

INTERSPEECH 2023 | 2023

Keywords:

multi-task learning; speech recognition; speaker recognition; wav2vec2

DOI:

10.21437/Interspeech.2023-353

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

We study multi-task learning for two orthogonal speech technology tasks: speech and speaker recognition. We use wav2vec2 as a base architecture with two task-specific output heads. We experiment with different architectural decisions to mix speaker and speech information in the output sequence as well as different optimization strategies. Our multi-task learning networks can produce a shared speaker and speech embedding, which at first glance achieve a performance comparable to separate single-task models. However, we show that the multi-task networks have strongly degraded performance on out-of-distribution evaluation data compared to the single-task models. Code and model checkpoints are available at https://github.com/nikvaessen/disjoint-mtl.
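The sketch below illustrates the general shape of such a setup. It is a minimal, framework-free example and is not the authors' implementation, which is available in the linked repository.

A shared encoder (stood in for here by a toy list of frame vectors, in place of real wav2vec2 output) produces one frame sequence. Two heads read that sequence:

- A speech head turns each frame into token scores. Greedy CTC decoding is an assumption made for illustration.
- A speaker head mean-pools the frames into one utterance embedding. Mean pooling is also an assumption.

The two task losses are combined with a fixed weight `lam`, which is only one of many possible optimization strategies. All names (`speech_head`, `speaker_head`, `cosine_score`, `mtl_loss`, `lam`) and the toy data are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def linear(x: Sequence[float], weights: Sequence[Sequence[float]]) -> Vector:
    """Project one frame; `weights` holds one row per output unit (no bias)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def speech_head(frames: Sequence[Sequence[float]],
                weights: Sequence[Sequence[float]],
                blank: int = 0) -> List[int]:
    """Hypothetical speech head: per-frame token scores followed by greedy
    CTC decoding (arg-max per frame, merge repeats, drop blanks)."""
    tokens: List[int] = []
    prev: Optional[int] = None
    for frame in frames:
        scores = linear(frame, weights)
        best = max(range(len(scores)), key=scores.__getitem__)
        if best != blank and best != prev:
            tokens.append(best)
        prev = best
    return tokens


def speaker_head(frames: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical speaker head: mean-pool the same frames over time and
    L2-normalise, giving one fixed-size embedding per utterance."""
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean)) or 1.0
    return [v / norm for v in mean]


def cosine_score(a: Vector, b: Vector) -> float:
    """Verification score for two unit-length embeddings (their dot product)."""
    return sum(x * y for x, y in zip(a, b))


def mtl_loss(speech_loss: float, speaker_loss: float, lam: float = 0.5) -> float:
    """Static weighted sum of the two task losses (one possible strategy)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * speech_loss + (1.0 - lam) * speaker_loss


# Toy stand-in for the shared encoder output: 4 frames, 2 features each.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
# Toy vocabulary: token 0 is the CTC blank, token 1 is a character.
weights = [[0.0, 1.0], [1.0, 0.0]]

print(speech_head(frames, weights))                  # [1, 1]
embedding = speaker_head(frames)
print(round(cosine_score(embedding, embedding), 6))  # 1.0
print(mtl_loss(2.0, 4.0, lam=0.25))                  # 3.5
```

Both heads consume the same frame sequence. That is why one network can serve both tasks, and also why the two tasks compete for what that sequence encodes. How that information is mixed, and how the losses are balanced, are the kinds of design choices the abstract says were compared.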


Pages: 4898-4902

Number of pages: 5
