UNSUPERVISED SPEAKER ADAPTATION OF BATCH NORMALIZED ACOUSTIC MODELS FOR ROBUST ASR

Cited by: 0
Authors
Wang, Zhong-Qiu [1]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
robust ASR; deep neural networks; batch normalization; unsupervised speaker adaptation; CHiME-3; speech recognition; separation
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Batch normalization is a standard technique for training deep neural networks. In batch normalization, the input of each hidden layer is first mean-variance normalized and then linearly transformed before the non-linear activation function is applied. We propose a novel unsupervised speaker adaptation technique for batch normalized acoustic models. The key idea is to adjust, for every hidden layer, the linear transformation previously learned by batch normalization according to the first-pass decoding results of the speaker-independent model. With the adjusted linear transformations for each test speaker, the test distribution of the input of each hidden layer better matches the training distribution. Experiments on the CHiME-3 dataset demonstrate the effectiveness of the proposed layer-wise adaptation approach. Our overall system obtains 4.24% WER on the real subset of the test data, the best result reported on this dataset to date and a 27.3% relative error reduction over the previous best result.
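The two ingredients described in the abstract, batch normalization as mean-variance normalization followed by a learned linear transform (scale `gamma`, shift `beta`), and speaker adaptation that re-estimates only that transform from first-pass results, can be illustrated with a toy model. The following is a minimal NumPy sketch under assumptions not taken from the paper: a single batch-normalized hidden layer instead of a deep acoustic model, plain gradient descent with a hypothetical learning rate and step count, synthetic data, and pseudo-labels taken as the argmax of the speaker-independent posteriors as a stand-in for labels from a real first-pass decoder. All function names are hypothetical.

```python
import numpy as np

def bn_forward(x, mean, var, gamma, beta, eps=1e-5):
    """Inference-time batch norm: mean-variance normalize with the stored
    training statistics, then apply the learned linear transform."""
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta, x_hat

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x, p):
    """Toy model (assumption): affine -> BN -> ReLU -> affine -> softmax."""
    h = x @ p["W1"]
    a, h_hat = bn_forward(h, p["mean"], p["var"], p["gamma"], p["beta"])
    r = np.maximum(a, 0.0)
    return softmax(r @ p["W2"]), (h_hat, a)

def xent(probs, labels):
    """Mean cross-entropy of the given labels."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def adapt_bn(x, labels, p, lr=0.02, steps=200):
    """Hypothetical per-speaker adaptation: gradient descent on gamma and
    beta only; weight matrices and BN statistics stay frozen."""
    q = dict(p, gamma=p["gamma"].copy(), beta=p["beta"].copy())
    onehot = np.eye(p["W2"].shape[1])[labels]
    for _ in range(steps):
        probs, (h_hat, a) = forward(x, q)
        d_logits = (probs - onehot) / len(x)    # grad of mean cross-entropy
        d_a = (d_logits @ q["W2"].T) * (a > 0)  # back through W2 and ReLU
        q["gamma"] -= lr * (d_a * h_hat).sum(axis=0)
        q["beta"] -= lr * d_a.sum(axis=0)
    return q

# Synthetic speaker-independent (SI) model; BN statistics come from "training" data.
rng = np.random.default_rng(0)
D, H, C = 8, 16, 4
W1 = rng.normal(size=(D, H)) / np.sqrt(D)
h_train = rng.normal(size=(1000, D)) @ W1
params = {"W1": W1, "W2": rng.normal(size=(H, C)) * 0.5,
          "mean": h_train.mean(axis=0), "var": h_train.var(axis=0),
          "gamma": np.ones(H), "beta": np.zeros(H)}

x_spk = rng.normal(size=(200, D)) * 1.5 + 0.5    # mismatched test speaker
si_probs, _ = forward(x_spk, params)
pseudo = si_probs.argmax(axis=1)                  # stand-in for first-pass labels
adapted = adapt_bn(x_spk, pseudo, params)
loss_before = xent(si_probs, pseudo)
loss_after = xent(forward(x_spk, adapted)[0], pseudo)
print(f"pseudo-label cross-entropy: {loss_before:.3f} -> {loss_after:.3f}")
```

In this sketch only `gamma` and `beta` change per speaker, so each batch-normalized layer contributes just 2×H speaker-specific parameters while the normalization statistics and weight matrices are shared across speakers.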
Pages: 4890–4894
Page count: 5