The self-adaptation of acoustic encoder in end-to-end automatic speech recognition under diverse acoustic scenes

Cited by: 0
Authors
Liu Y. [1 ,2 ]
Zheng L. [1 ,2 ]
Li T. [1 ,2 ]
Zhang P. [1 ,2 ]
Affiliations
[1] Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing
[2] University of Chinese Academy of Sciences, Beijing
Source
Shengxue Xuebao/Acta Acustica | 2023, Vol. 48, No. 6
Keywords
Acoustic encoder; Automatic speech recognition; Neural architecture search; Self-adaptation
DOI
10.12395/0371-0025.2022114
Abstract
In this paper, a scene-adaptive acoustic encoder (SAE) is proposed for different speech scenes. The method adaptively designs an appropriate acoustic encoder for end-to-end speech recognition tasks by learning the differences among acoustic features in different acoustic scenes. By applying neural architecture search, both the effectiveness of the encoder design and the performance of the downstream recognition task are improved. Experiments on three commonly used Chinese and English datasets, Aishell-1, HKUST and SWBD, show that the proposed SAE achieves an average relative character error rate reduction of 5% over the best human-designed encoders. The results show that the proposed method is effective for analyzing the acoustic features of specific scenes and for the targeted design of high-performance acoustic encoders. © 2023 Science Press. All rights reserved.
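To make the search idea concrete, below is a minimal, hypothetical sketch of a differentiable architecture search (DARTS-style) layer that mixes candidate encoder operations under learnable architecture weights, so that scene-specific data can favor one operation over another. The class name, the three candidate operations, and all hyperparameters are illustrative assumptions for exposition; this is not the paper's actual SAE implementation.

```python
# Sketch of one searchable encoder layer: a softmax-weighted mixture of
# candidate operations, with architecture weights (alpha) trained on
# scene-specific data. Hypothetical illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedEncoderLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Candidate operations a scene-adaptive search might choose among.
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())        # feed-forward
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)      # local convolution
        self.attn = nn.MultiheadAttention(dim, num_heads=4,
                                          batch_first=True)            # self-attention
        # Architecture parameters, one logit per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(3))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim)
        w = F.softmax(self.alpha, dim=0)
        ff_out = self.ff(x)
        conv_out = self.conv(x.transpose(1, 2)).transpose(1, 2)
        attn_out, _ = self.attn(x, x, x)
        # Weighted mixture; after search, the op with the largest alpha
        # would be kept in the final, discretized encoder.
        return w[0] * ff_out + w[1] * conv_out + w[2] * attn_out

layer = MixedEncoderLayer(dim=80)
feats = torch.randn(4, 100, 80)   # (batch, frames, feature dim)
out = layer(feats)
print(out.shape, F.softmax(layer.alpha, dim=0))
```

In a full search, alpha and the operation weights would be optimized jointly (typically in a bilevel fashion on split training data), and the mixture would then be discretized to a single operation per layer to obtain the final scene-specific encoder.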
Pages: 1260-1268
Number of pages: 8