LEARNING HIDDEN UNIT CONTRIBUTIONS FOR UNSUPERVISED SPEAKER ADAPTATION OF NEURAL NETWORK ACOUSTIC MODELS

被引:0
|
作者
Swietojanski, Pawel [1 ]
Renals, Steve [1 ]
机构
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9AB, Midlothian, Scotland
基金
英国工程与自然科学研究理事会;
关键词
Speaker Adaptation; Deep Neural Networks; TED; IWSLT; LHUC; SPEECH; TRANSFORMATIONS; FEATURES;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper proposes a simple yet effective model-based neural network speaker adaptation technique that learns speaker-specific hidden unit contributions given adaptation data, without requiring any form of speaker-adaptive training, or labelled adaptation data. An additional amplitude parameter is defined for each hidden unit; the amplitude parameters are tied for each speaker, and are learned using unsupervised adaptation. We conducted experiments on the TED talks data, as used in the International Workshop on Spoken Language Translation (IWSLT) evaluations. Our results indicate that the approach can reduce word error rates on standard IWSLT test sets by about 8-15% relative compared to unadapted systems, with a further reduction of 4-6% relative when combined with feature-space maximum likelihood linear regression (fMLLR). The approach can be employed in most existing feed-forward neural network architectures, and we report results using various hidden unit activation functions: sigmoid, maxout, and rectifying linear units (ReLU).
引用
收藏
页码:171 / 176
页数:6
相关论文
共 50 条
  • [1] Learning Hidden Unit Contributions for Unsupervised Acoustic Model Adaptation
    Swietojanski, Pawel
    Li, Jinyu
    Renals, Steve
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (08) : 1450 - 1463
  • [2] BLHUC: BAYESIAN LEARNING OF HIDDEN UNIT CONTRIBUTIONS FOR DEEP NEURAL NETWORK SPEAKER ADAPTATION
    Xie, Xurong
    Liu, Xunying
    Lee, Tan
    Hui, Shoukang
    Wang, Lan
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5711 - 5715
  • [3] Fast DNN Acoustic Model Speaker Adaptation by Learning Hidden Unit Contribution Features
    Xie, Xurong
    Liu, Xunying
    Lee, Tan
    Wang, Lan
    [J]. INTERSPEECH 2019, 2019, : 759 - 763
  • [4] Batch Normalization based Unsupervised Speaker Adaptation for Acoustic Models
    Yi, Jiangyan
    Tao, Jianhua
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 176 - 180
  • [5] Speaker Adaptation of Neural Network Acoustic Models Using I-Vectors
    Saon, George
    Soltau, Hagen
    Nahamoo, David
    Picheny, Michael
    [J]. 2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 55 - 59
  • [6] UNSUPERVISED SPEAKER ADAPTATION OF BATCH NORMALIZED ACOUSTIC MODELS FOR ROBUST ASR
    Wang, Zhong-Qiu
    Wang, DeLiang
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4890 - 4894
  • [7] Improved Bayesian learning of hidden Markov models for speaker adaptation
    Chien, JT
    Lee, CH
    Wang, HC
    [J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1027 - 1030
  • [8] DISCRIMINATIVELY TRAINED JOINT SPEAKER AND ENVIRONMENT REPRESENTATIONS FOR ADAPTATION OF DEEP NEURAL NETWORK ACOUSTIC MODELS
    Yin, Maofan
    Sivadas, Sunil
    Yu, Kai
    Ma, Bin
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5065 - 5069
  • [9] UNSUPERVISED SPEAKER ADAPTATION USING ALL-PHONEME ERGODIC HIDDEN MARKOV NETWORK
    MIYAZAWA, Y
    TAKAMI, J
    SAGAYAMA, S
    MATSUNAGA, S
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1995, E78D (08) : 1044 - 1050
  • [10] Unsupervised Adaptation of Recurrent Neural Network Language Models
    Gangireddy, Siva Reddy
    Swietojanski, Pawel
    Bell, Peter
    Renals, Steve
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2333 - 2337