DOMAIN AND SPEAKER ADAPTATION FOR CORTANA SPEECH RECOGNITION

被引:0
|
作者
Zhao, Yong [1 ]
Li, Jinyu [1 ]
Zhang, Shixiong [1 ]
Chen, Liping [1 ]
Gong, Yifan [1 ]
机构
[1] Microsoft Corp, One Microsoft Way, Redmond, WA 98052 USA
关键词
deep neural network; domain adaptation; speaker adaptation; anchor embedding;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Voice assistant represents one of the most popular and important scenarios for speech recognition. In this paper, we propose two adaptation approaches to customize a multi-style well-trained acoustic model towards its subsidiary domain of Cortana assistant. First, we present anchor-based speaker adaptation by extracting the speaker information, i-vector or d-vector embeddings, from the anchor segments of 'Hey Cortana'. The anchor embeddings are mapped to layer-wise parameters to control the transformations of both weight matrices and biases of multiple layers. Second, we directly update the existing model parameters for domain adaptation. We demonstrate that prior distribution should be updated along with the network adaptation to compensate the label bias from the development data. Updating the priors may have a significant impact when the target domain features high occurrence of anchor words. Experiments on Hey Cortana desktop test set show that both approaches improve the recognition accuracy significantly. The anchor-based adaptation using the anchor d-vector and the prior interpolation achieves 32% relative reduction in WER over the generic model.
引用
收藏
页码:5984 / 5988
页数:5
相关论文
共 50 条
  • [1] Speaker to Emotion: Domain Adaptation for Speech Emotion Recognition with Residual Adapters
    Xi, Yuxuan
    Li, Pengcheng
    Song, Yan
    Jiang, Yiheng
    Dai, Lirong
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 513 - 518
  • [2] PREDICTIVE SPEAKER ADAPTATION IN SPEECH RECOGNITION
    COX, S
    [J]. COMPUTER SPEECH AND LANGUAGE, 1995, 9 (01): : 1 - 17
  • [3] Speaker clustering and transformation for speaker adaptation in speech recognition systems
    Padmanabhan, M
    Bahl, LR
    Nahamoo, D
    Picheny, MA
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1998, 6 (01): : 71 - 77
  • [4] SPEAKER ADAPTATION IN A LIMITED SPEECH RECOGNITION SYSTEM
    MAKHOUL, J
    [J]. IEEE TRANSACTIONS ON COMPUTERS, 1971, C 20 (09) : 1057 - &
  • [5] Quick fMLLR for speaker adaptation in speech recognition
    Varadarajan, Balakrishnan
    Povey, Daniel
    Chu, Stephen M.
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4297 - +
  • [6] Speaker Adaptation on Myanmar Spontaneous Speech Recognition
    Naing, Hay Mar Soe
    Pa, Win Pa
    [J]. COMPUTATIONAL LINGUISTICS, PACLING 2017, 2018, 781 : 303 - 313
  • [7] XMLLR for Improved Speaker Adaptation in Speech Recognition
    Povey, Daniel
    Kuo, Hong-Kwang J.
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1705 - +
  • [8] On robustness of unsupervised domain adaptation for speaker recognition
    Bousquet, Pierre-Michel
    Rouvier, Mickael
    [J]. INTERSPEECH 2019, 2019, : 2958 - 2962
  • [9] A speaker clustering algorithm for fast speaker adaptation in continuous speech recognition
    Rodríguez, LJ
    Torres, MI
    [J]. TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2004, 3206 : 433 - 440
  • [10] Speaker adaptation by modeling the speaker variation in a continuous speech recognition system
    Strom, N
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 989 - 992