Speaker-Aware Long Short-Term Memory Multi-Task Learning for Speech Recognition

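Authors
Pironkov, Gueorgui [1]
Dupont, Stephane [1]
Dutoit, Thierry [1]
Affiliation
[1] Univ Mons, TCTS Lab, B-7000 Mons, Belgium
Keywords
CONVOLUTIONAL NEURAL-NETWORKS
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract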
To address the common problem of overfitting in speech recognition, this article investigates Multi-Task Learning in which the auxiliary task is speaker classification. Overfitting occurs when the amount of training data is limited, leading to an overly sensitive acoustic model. Multi-Task Learning is one of many regularization methods; it reduces the impact of overfitting by forcing the acoustic model to train jointly on multiple different, but related, tasks. In this paper, we use speaker classification as an auxiliary task to improve the generalization ability of the acoustic model, training the model to recognize the speaker or to find the closest one in the training set. We investigate this Multi-Task Learning setup on the TIMIT database, with acoustic modeling performed by a Recurrent Neural Network with Long Short-Term Memory cells.
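The joint training described in the abstract can be illustrated with a minimal sketch of a multi-task objective. This is not the authors' implementation: the function names, the per-frame formulation, and the auxiliary weight `aux_weight` (with its default of 0.1) are assumptions made for illustration. The abstract does not state how the two task losses are combined. The sketch assumes a shared acoustic model that produces two sets of logits per frame, one for the main phonetic task and one for the auxiliary speaker classification task.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for a single frame.

    logits: list of unnormalized scores, one per class.
    target: index of the correct class.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def multitask_loss(phone_logits, phone_target,
                   speaker_logits, speaker_target,
                   aux_weight=0.1):
    """Weighted sum of the main and auxiliary losses.

    aux_weight is a hypothetical hyperparameter; the value 0.1 is an
    arbitrary placeholder, not a setting taken from the paper.
    """
    main = cross_entropy(phone_logits, phone_target)
    aux = cross_entropy(speaker_logits, speaker_target)
    return main + aux_weight * aux


# Example frame: 3 phonetic classes, 2 speakers (toy sizes).
loss = multitask_loss([2.0, 0.5, -1.0], 0, [0.0, 0.0], 1)
```

Under this assumed formulation, setting `aux_weight` to 0 reduces the objective to ordinary single-task training. A nonzero weight makes the shared layers also account for speaker identity, which is the regularizing effect the abstract describes.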
Pages: 1911-1915
Number of pages: 5