DOMAIN AND SPEAKER ADAPTATION FOR CORTANA SPEECH RECOGNITION

Cited by: 0
Authors
Zhao, Yong [1]
Li, Jinyu [1]
Zhang, Shixiong [1]
Chen, Liping [1]
Gong, Yifan [1]
Affiliations
[1] Microsoft Corp, One Microsoft Way, Redmond, WA 98052 USA
Keywords
deep neural network; domain adaptation; speaker adaptation; anchor embedding
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Voice assistants represent one of the most popular and important scenarios for speech recognition. In this paper, we propose two adaptation approaches to customize a well-trained multi-style acoustic model for one of its subdomains, the Cortana assistant. First, we present anchor-based speaker adaptation, which extracts speaker information, as i-vector or d-vector embeddings, from the anchor segments of 'Hey Cortana'. The anchor embeddings are mapped to layer-wise parameters that control the transformations of both the weight matrices and the biases of multiple layers. Second, we directly update the existing model parameters for domain adaptation. We demonstrate that the prior distribution should be updated along with the network adaptation to compensate for the label bias from the development data. Updating the priors may have a significant impact when the target domain features a high occurrence of anchor words. Experiments on the Hey Cortana desktop test set show that both approaches improve the recognition accuracy significantly. The anchor-based adaptation using the anchor d-vector and the prior interpolation achieves a 32% relative reduction in WER over the generic model.
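The abstract mentions prior interpolation without giving its form. The following is a minimal sketch of one plausible reading, assuming a hybrid DNN-HMM setup in which state posteriors are divided by state priors to obtain scaled likelihoods, and assuming the adapted prior is a simple linear mix of a generic prior and a target-domain prior. The function names, the `weight` parameter, and the toy numbers are all hypothetical and are not taken from the paper.

```python
import math


def interpolate_priors(generic_prior, domain_prior, weight):
    """Linearly mix two state prior distributions (assumed form, not the paper's).

    weight = 0.0 keeps the generic prior; weight = 1.0 uses the domain prior.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    if len(generic_prior) != len(domain_prior):
        raise ValueError("priors must cover the same set of states")
    mixed = [(1.0 - weight) * g + weight * d
             for g, d in zip(generic_prior, domain_prior)]
    total = sum(mixed)
    # Renormalize to guard against rounding drift in the inputs.
    return [p / total for p in mixed]


def scaled_log_likelihoods(posteriors, prior, floor=1e-10):
    """Convert state posteriors to scaled log-likelihoods: log p(s|x) - log p(s)."""
    return [math.log(max(p, floor)) - math.log(max(q, floor))
            for p, q in zip(posteriors, prior)]


# Toy three-state example; the third state stands in for an anchor-word state
# that is far more frequent in the target domain than in the generic data.
generic_prior = [0.5, 0.3, 0.2]
domain_prior = [0.2, 0.2, 0.6]
prior = interpolate_priors(generic_prior, domain_prior, weight=0.5)

posteriors = [0.1, 0.2, 0.7]
scores = scaled_log_likelihoods(posteriors, prior)
```

In this toy setup, raising the prior of the frequent state lowers its scaled likelihood relative to the generic prior, which is the kind of label-bias compensation the abstract describes for domains with many anchor words.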
Pages: 5984-5988
Page count: 5