Online Speaker Adaptation Using Memory-Aware Networks for Speech Recognition

Times cited: 1
Authors
Pan, Jia [1]
Wan, Genshun [1]
Du, Jun [1]
Ye, Zhongfu [1]
Affiliation
[1] University of Science and Technology of China, Hefei 230052, People's Republic of China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Adaptation models; Acoustics; Hidden Markov models; Task analysis; Training; Data models; Speech recognition; Speaker adaptation; neural network; memory-aware networks
DOI
10.1109/TASLP.2020.2980372
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In our previous work, we introduced an attention-based speaker adaptation method, which was shown to be an efficient online speaker adaptation method for real-time speech recognition. In this paper, we present a more complete framework for this method, named memory-aware networks, which consists of a main network, a memory module, an attention module, and a connection module. A gate mechanism and a multiple-connections strategy are proposed for connecting the memory to the main network so that the memory is fully exploited. An auxiliary speaker classification task is introduced to improve the accuracy of the attention module. The fixed-size ordinally forgetting encoding (FOFE) method is used together with average pooling to capture both short-term and long-term information. Furthermore, instead of using only traditional speaker embeddings such as i-vectors or d-vectors as the memory, we design a new form of memory called residual vectors, which can represent different pronunciation habits. Experiments on both the Switchboard and AISHELL-2 tasks show that our method performs online speaker adaptation effectively, requiring no additional adaptation data and only a 3% relative increase in decoding computational complexity. Under the cross-entropy criterion, our method achieves relative word error rate (WER) reductions of 9.4% and 8.3% over the speaker-independent model on the Switchboard and AISHELL-2 tasks, respectively, and of approximately 7.0% over the traditional d-vector-based speaker adaptation method.
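The abstract lists several building blocks: a short-term summary from FOFE, a long-term summary from average pooling, attention over a memory of speaker vectors, and a gate that connects the attended memory to the main network. The following is a minimal pure-Python sketch of how these pieces could fit together for one frame, under stated assumptions; it is not the authors' implementation. Only the recursion z_t = alpha * z_{t-1} + x_t is the standard FOFE definition. Concatenating it with the average-pooled features, scoring memory slots with scaled dot-product attention, and using a single scalar sigmoid gate are illustrative assumptions. All function names and parameters (`alpha`, `gate_w`, `gate_b`) are hypothetical, and the toy memory slots merely stand in for i-vectors, d-vectors, or residual vectors, whose construction is not shown.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def fofe(frames: Sequence[Vector], alpha: float) -> Vector:
    """Fixed-size ordinally forgetting encoding: z_t = alpha * z_{t-1} + x_t.

    With 0 < alpha < 1, recent frames dominate (short-term context).
    """
    z = [0.0] * len(frames[0])
    for x in frames:
        z = [alpha * zi + xi for zi, xi in zip(z, x)]
    return z


def average_pool(frames: Sequence[Vector]) -> Vector:
    """Mean over all frames seen so far (long-term context)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def summarize(frames: Sequence[Vector], alpha: float = 0.5) -> Vector:
    """Assumption: short- and long-term summaries are simply concatenated."""
    return fofe(frames, alpha) + average_pool(frames)


def softmax(scores: Sequence[float]) -> Vector:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, memory: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Scaled dot-product attention over memory slots (assumed scoring function).

    Slots must have the same dimension as the query in this sketch.
    Returns the attention weights and the weighted sum of slots (context).
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * m for q, m in zip(query, slot)) for slot in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    context = [sum(w * slot[i] for w, slot in zip(weights, memory)) for i in range(dim)]
    return weights, context


def sigmoid(x: float) -> float:
    # Branch on the sign so math.exp never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_connect(hidden: Vector, context: Vector, gate_w: Vector, gate_b: float) -> Vector:
    """Hypothetical scalar gate on [hidden; context] deciding how much memory to add."""
    g = sigmoid(sum(w * v for w, v in zip(gate_w, hidden + context)) + gate_b)
    return [h + g * c for h, c in zip(hidden, context)]


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-dim acoustic features
    query = summarize(frames)                      # 4-dim: FOFE + average pooling
    memory = [[1.0, 0.0, 0.0, 0.0],                # toy stand-ins for speaker vectors
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]]
    weights, context = attend(query, memory)
    hidden = [0.1, 0.2, 0.3, 0.4]                  # toy main-network activation
    out = gated_connect(hidden, context, gate_w=[0.0] * 8, gate_b=0.0)
    print("attention weights:", weights)
    print("adapted activation:", out)
```

A scalar gate keeps the sketch small; the paper's gate mechanism and multiple-connections strategy may be parameterized differently, for example by gating per dimension or by connecting the memory to several layers of the main network.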
Pages: 1025-1037
Number of pages: 13