Multi-task Recurrent Model for Speech and Speaker Recognition

Authors
Tang, Zhiyuan [1 ,2 ]
Li, Lantian [1 ]
Wang, Dong [1 ]
Affiliations
[1] Tsinghua Univ, Ctr Speech & Language Technol, Res Inst Informat Technol, Div Tech Innovat & Dev,Tsinghua Natl Lab Informat, Beijing, Peoples R China
[2] Chinese Acad Sci, Chengdu Inst Comp Applicat, Beijing 100864, Peoples R China
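Funding
US National Science Foundation;
Keywords
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject classification codes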
0808 ; 0809 ;
Abstract
Although highly correlated, speech and speaker recognition have been regarded as two independent tasks and studied by two communities. This is certainly not how people behave: we decipher both speech content and speaker traits at the same time. This paper presents a unified model that performs speech and speaker recognition simultaneously. The model is based on a unified neural network in which the output of one task is fed into the input of the other, leading to a multi-task recurrent network. Experiments show that the joint model outperforms the task-specific models on both tasks.
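The cross-task feedback described in the abstract can be illustrated with a minimal sketch. It assumes simple Elman-style tanh cells and random weights; the class `CrossFedCell`, the function `run_joint`, and all dimensions are hypothetical names chosen for illustration, not the architecture or training setup used in the paper. Each task's cell reads the current acoustic frame, its own previous state, and the other task's previous output.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix as a list of rows (illustrative initialisation)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class CrossFedCell:
    """Elman-style recurrent cell that also reads the other task's previous output.

    Hypothetical helper for illustration only; not the paper's model.
    """

    def __init__(self, input_dim, hidden_dim, other_dim, rng):
        self.w_in = make_matrix(hidden_dim, input_dim, rng)
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)
        self.w_other = make_matrix(hidden_dim, other_dim, rng)

    def step(self, x, h_prev, other_prev):
        a = matvec(self.w_in, x)               # contribution of the current frame
        b = matvec(self.w_rec, h_prev)         # contribution of this task's own past
        c = matvec(self.w_other, other_prev)   # feedback from the other task
        return [math.tanh(ai + bi + ci) for ai, bi, ci in zip(a, b, c)]


def run_joint(frames, speech_dim=4, speaker_dim=3, seed=0):
    """Run both cells over a sequence of frames, exchanging outputs at each step."""
    rng = random.Random(seed)
    input_dim = len(frames[0])
    speech = CrossFedCell(input_dim, speech_dim, speaker_dim, rng)
    speaker = CrossFedCell(input_dim, speaker_dim, speech_dim, rng)
    h_speech = [0.0] * speech_dim
    h_speaker = [0.0] * speaker_dim
    outputs = []
    for x in frames:
        # Both updates use the states from the previous frame, so neither
        # task sees the other's output for the current frame.
        new_speech = speech.step(x, h_speech, h_speaker)
        new_speaker = speaker.step(x, h_speaker, h_speech)
        h_speech, h_speaker = new_speech, new_speaker
        outputs.append((h_speech, h_speaker))
    return outputs
```

Calling `run_joint([[0.1, 0.2], [0.3, -0.1]])` returns one pair of state vectors per frame. In a trained system each state would feed a task-specific output layer (phone or speaker posteriors), which this sketch omits.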
Pages: 4
Related papers
50 in total
  • [1] Multi-task Recurrent Model for True Multilingual Speech Recognition
    Tang, Zhiyuan
    Li, Lantian
    Wang, Dong
    [J]. 2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [2] Speaker-Aware Multi-Task Learning for Automatic Speech Recognition
    Pironkov, Gueorgui
    Dupont, Stephane
    Dutoit, Thierry
    [J]. 2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 2900 - 2905
  • [3] Speaker independent feature selection for speech emotion recognition: A multi-task approach
    Kalhor, Elham
    Bakhtiari, Behzad
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (06) : 8127 - 8146
  • [4] Speaker independent feature selection for speech emotion recognition: A multi-task approach
    Elham Kalhor
    Behzad Bakhtiari
    [J]. Multimedia Tools and Applications, 2021, 80 : 8127 - 8146
  • [5] Speech Emotion Recognition with Multi-task Learning
    Cai, Xingyu
    Yuan, Jiahong
    Zheng, Renjie
    Huang, Liang
    Church, Kenneth
    [J]. INTERSPEECH 2021, 2021, : 4508 - 4512
  • [6] Multi-Task Chinese Speech Recognition Method Based on the Squeezeformer Model
    Guo, Ying
    Wang, Li
    [J]. IAENG International Journal of Computer Science, 2025, 52 (01) : 23 - 31
  • [7] Discretized Continuous Speech Emotion Recognition with Multi-Task Deep Recurrent Neural Network
    Duc Le
    Aldeneh, Zakaria
    Provost, Emily Mower
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1108 - 1112
  • [8] A multi-task network for speaker and command recognition in industrial environments
    Bini, Stefano
    Percannella, Gennaro
    Saggese, Alessia
    Vento, Mario
    [J]. PATTERN RECOGNITION LETTERS, 2023, 176 : 62 - 68
  • [9] A Multi-task Framework of Speaker Recognition with TTS Data Augmentation
    Xie, Xingjia
    Zhi, Yiming
    Ouyang, Beibei
    Hong, Qingyang
    Li, Lin
    [J]. PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 210 - 215
  • [10] Speaker-Aware Long Short-Term Memory Multi-Task Learning for Speech Recognition
    Pironkov, Gueorgui
    Dupont, Stephane
    Dutoit, Thierry
    [J]. 2016 24TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2016, : 1911 - 1915