Universal and accent-discriminative encoders for conformer-based accent-invariant speech recognition

Cited by: 1
Authors
Wang X. [1]
Long Y. [1]
Xu D. [2]
Affiliations
[1] Key Innovation Group of Digital Humanities Resource and Research, and Shanghai Engineering Research Center of Intelligent Education and Bigdata, Shanghai Normal University, Shanghai
[2] Unisound AI Technology Co., Ltd., Beijing
Funding
National Natural Science Foundation of China
Keywords
Accent-discriminative encoder; Accent-invariant; Conformer; Speech recognition;
DOI
10.1007/s10772-022-10010-z
Abstract
Accent variation is a challenging issue for both traditional hybrid and current end-to-end (E2E) automatic speech recognition (ASR). Building an accent-invariant, high-quality ASR system is very important for most real applications. In this study, we propose a Conformer-based architecture with accent-discriminative encoders that leverages the accent attributes of the input speech to enhance an accent-invariant E2E ASR system. In this architecture, the encoders comprise one universal encoder and two dominant accent-specific encoders. These encoders are first pre-trained and then jointly adapted with a single attention-based decoder in an end-to-end manner. Furthermore, different weighting methods and a multi-encoder-decoder architecture are also investigated and compared. Our experiments are performed on the public Common Voice corpus with five different English accents. The results show that our proposed architecture outperforms the strong baseline in both in-domain and out-of-domain accented-ASR tasks, with a relative word error rate reduction of 2.9–3.8%. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
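The abstract mentions combining one universal encoder and two accent-specific encoders under "different weighting methods" but does not say what those methods are. As a rough illustration only, the sketch below assumes the simplest possibility: a softmax-weighted sum of the three encoder outputs before they reach the shared decoder. The function names, the `scores` parameter, and the toy feature values are all hypothetical and are not taken from the paper; real Conformer encoder outputs would be tensors rather than nested lists.

```python
import math
from typing import List, Sequence

# One encoder output: a list of frames, each frame a list of features.
Frames = List[List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_encoder_outputs(outputs: Sequence[Frames],
                         scores: Sequence[float]) -> Frames:
    """Weighted sum of same-shaped encoder outputs (assumed fusion rule)."""
    if len(outputs) != len(scores):
        raise ValueError("need exactly one score per encoder output")
    weights = softmax(scores)
    n_frames = len(outputs[0])
    dim = len(outputs[0][0])
    fused = [[0.0] * dim for _ in range(n_frames)]
    for weight, output in zip(weights, outputs):
        for t in range(n_frames):
            for d in range(dim):
                fused[t][d] += weight * output[t][d]
    return fused


# Toy outputs (2 frames, 2 features) standing in for the universal encoder
# and the two accent-specific encoders. The values are made up.
universal = [[1.0, 0.0], [0.0, 1.0]]
accent_a = [[2.0, 2.0], [2.0, 2.0]]
accent_b = [[0.0, 0.0], [4.0, 4.0]]

# Equal scores give each encoder the same weight of 1/3.
fused = fuse_encoder_outputs([universal, accent_a, accent_b],
                             scores=[0.0, 0.0, 0.0])
```

In a setup like this, the scores could plausibly come from an accent classifier or be learned jointly with the decoder; which option the paper uses is not stated in the abstract.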
Pages: 987-995
Page count: 8
Related papers
21 in total
  • [21] CTC Regularized Model Adaptation for Improving LSTM RNN Based Multi-Accent Mandarin Speech Recognition
    Jiangyan Yi
    Zhengqi Wen
    Jianhua Tao
    Hao Ni
    Bin Liu
    Journal of Signal Processing Systems, 2018, 90 : 985 - 997