Multi-task Recurrent Model for Speech and Speaker Recognition

Cited by: 0

Authors:

Tang, Zhiyuan [1,2]

Li, Lantian [1]

Wang, Dong [1]

Affiliations:

[1] Tsinghua Univ, Ctr Speech & Language Technol, Res Inst Informat Technol, Div Tech Innovat & Dev,Tsinghua Natl Lab Informat, Beijing, Peoples R China

[2] Chinese Acad Sci, Chengdu Inst Comp Applicat, Beijing 100864, Peoples R China

Source:

2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) | 2016

Funding:

National Science Foundation (USA);

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Although highly correlated, speech and speaker recognition have been regarded as two independent tasks and studied by two communities. This is certainly not the way that people behave: we decipher both speech content and speaker traits at the same time. This paper presents a unified model to perform speech and speaker recognition simultaneously and altogether. The model is based on a unified neural network where the output of one task is fed to the input of the other, leading to a multi-task recurrent network. Experiments show that the joint model outperforms the task-specific models on both the two tasks.


Pages: 4

50 items in total

[1] Multi-task Recurrent Model for True Multilingual Speech Recognition
Tang, Zhiyuan
Li, Lantian
Wang, Dong
[J]. 2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
[2] Speaker-Aware Multi-Task Learning for Automatic Speech Recognition
Pironkov, Gueorgui
Dupont, Stephane
Dutoit, Thierry
[J]. 2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 2900 - 2905
[3] Speaker independent feature selection for speech emotion recognition: A multi-task approach
Kalhor, Elham
Bakhtiari, Behzad
[J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (06) : 8127 - 8146
[4] Speaker independent feature selection for speech emotion recognition: A multi-task approach
Elham Kalhor
Behzad Bakhtiari
[J]. Multimedia Tools and Applications, 2021, 80 : 8127 - 8146
[5] Speech Emotion Recognition with Multi-task Learning
Cai, Xingyu
Yuan, Jiahong
Zheng, Renjie
Huang, Liang
Church, Kenneth
[J]. INTERSPEECH 2021, 2021, : 4508 - 4512
[6] Multi-Task Chinese Speech Recognition Method Based on the Squeezeformer Model
Guo, Ying
Wang, Li
[J]. IAENG International Journal of Computer Science, 2025, 52 (01) : 23 - 31
[7] Discretized Continuous Speech Emotion Recognition with Multi-Task Deep Recurrent Neural Network
Duc Le
Aldeneh, Zakaria
Provost, Emily Mower
[J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1108 - 1112
[8] A multi-task network for speaker and command recognition in industrial environments
Bini, Stefano
Percannella, Gennaro
Saggese, Alessia
Vento, Mario
[J]. PATTERN RECOGNITION LETTERS, 2023, 176 : 62 - 68
[9] A Multi-task Framework of Speaker Recognition with TTS Data Augmentation
Xie, Xingjia
Zhi, Yiming
Ouyang, Beibei
Hong, Qingyang
Li, Lin
[J]. PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 210 - 215
[10] Speaker-Aware Long Short-Term Memory Multi-Task Learning for Speech Recognition
Pironkov, Gueorgui
Dupont, Stephane
Dutoit, Thierry
[J]. 2016 24TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2016, : 1911 - 1915
