Universal and accent-discriminative encoders for conformer-based accent-invariant speech recognition

Cited by: 1
Authors
Wang X. [1 ]
Long Y. [1 ]
Xu D. [2 ]
Affiliations
[1] Key Innovation Group of Digital Humanities Resource and Research, and Shanghai Engineering Research Center of Intelligent Education and Bigdata, Shanghai Normal University, Shanghai
[2] Unisound AI Technology Co., Ltd., Beijing
Funding
National Natural Science Foundation of China
Keywords
Accent-discriminative encoder; Accent-invariant; Conformer; Speech recognition;
DOI
10.1007/s10772-022-10010-z
Abstract
Accent variation is a challenging issue for both traditional hybrid and current end-to-end (E2E) automatic speech recognition (ASR). Building an accent-invariant, high-quality ASR system is important for most real applications. In this study, we propose a Conformer-based architecture with accent-discriminative encoders that leverages the accent attributes of input speech to enhance an accent-invariant E2E ASR system. The architecture comprises one universal encoder and two dominant accent-specific encoders. These encoders are first pre-trained and then jointly adapted with a single attention-based decoder in an end-to-end manner. Furthermore, different weighting methods and a multi-encoder-decoder architecture are also investigated and compared. Our experiments are performed on the public Common Voice corpus with five different English accents. Results show that the proposed architecture outperforms the strong baseline in both in-domain and out-of-domain accented ASR tasks, with a relative word error rate reduction of 2.9–3.8%. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
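The abstract mentions weighting the outputs of one universal and two accent-specific encoders before a single decoder, but does not specify the weighting methods. As a rough illustration only, the sketch below assumes a softmax-normalized weighted sum over frame-aligned encoder outputs; the function names, the per-encoder scores, and the fusion rule itself are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_encoder_outputs(encoder_outputs, scores):
    """Weighted sum of frame-aligned encoder outputs (assumed fusion rule).

    encoder_outputs: one entry per encoder; each entry is a list of frames,
                     and each frame is a list of floats of equal dimension.
    scores:          one raw score per encoder (hypothetical, e.g. produced
                     by an accent classifier); converted to weights by softmax.
    """
    weights = softmax(scores)
    num_frames = len(encoder_outputs[0])
    dim = len(encoder_outputs[0][0])
    fused = [[0.0] * dim for _ in range(num_frames)]
    for weight, output in zip(weights, encoder_outputs):
        for t in range(num_frames):
            for d in range(dim):
                fused[t][d] += weight * output[t][d]
    return fused


# Toy example: one frame of dimension 2 from each of three encoders.
universal = [[3.0, 0.0]]
accent_a = [[0.0, 3.0]]
accent_b = [[3.0, 3.0]]

# Equal scores give equal weights, so the result is the element-wise mean.
fused = fuse_encoder_outputs([universal, accent_a, accent_b], [0.0, 0.0, 0.0])
```

With a strongly dominant score for one encoder, the fused output approaches that encoder's output, which is one simple way a single decoder could be steered toward accent-specific features.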
Pages: 987-995
Page count: 8
Related papers
21 in total
  • [1] AIPNET: GENERATIVE ADVERSARIAL PRE-TRAINING OF ACCENT-INVARIANT NETWORKS FOR END-TO-END SPEECH RECOGNITION
    Chen, Yi-Chen
    Yang, Zhaojun
    Yeh, Ching-Feng
    Jain, Mahaveer
    Seltzer, Michael L.
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6979 - 6983
  • [2] Efficient conformer-based speech recognition with linear attention
    Li, Shengqiang
    Xu, Menglong
    Zhang, Xiao-Lei
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 448 - 453
  • [3] Efficient Conformer-Based CTC Model for Intelligent Cockpit Speech Recognition
    Guo, Hanzhi
    Chen, Yunshu
    Xie, Xukang
    Xu, Gaopeng
    Guo, Wei
    2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 522 - 526
  • [4] (Speech recognition based on Spanish accent acoustic model)
    Plaza, Johanna
    Sanchez-Zhunio, Cristina
    Acosta-Uriguen, Maria-Ines
    Orellana, Marcos
    Cedillo, Priscila
    Zambrano-Martinez, Jorge Luis
    ENFOQUE UTE, 2022, 13 (03): : 45 - 57
  • [5] A Robust Conformer-Based Speech Recognition Model for Mandarin Air Traffic Control
    Jiang, Peiyuan
    Pan, Weijun
    Zhang, Jian
    Wang, Teng
    Huang, Junxiang
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 77 (01): : 911 - 940
  • [6] Reliable Accent-Specific Unit Generation With Discriminative Dynamic Gaussian Mixture Selection for Multi-Accent Chinese Speech Recognition
    Zhang, Chao
    Liu, Yi
    Xia, Yunqing
    Wang, Xuan
    Lee, Chin-Hui
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (10): : 2073 - 2084
  • [7] CONFORMER-BASED SPEECH RECOGNITION WITH LINEAR NYSTROM ATTENTION AND ROTARY POSITION EMBEDDING
    Samarakoon, Lahiru
    Leung, Tsun-Yat
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8012 - 8016
  • [8] Conformer-based End-to-end Speech Recognition With Rotary Position Embedding
    Li, Shengqiang
    Xu, Menglong
    Zhang, Xiao-Lei
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 443 - 447
  • [9] Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition
    Audhkhasi, Kartik
    Huang, Yinghui
    Ramabhadran, Bhuvana
    Moreno, Pedro J.
    INTERSPEECH 2022, 2022, : 1026 - 1030
  • [10] MULTI-ACCENT SPEECH RECOGNITION WITH HIERARCHICAL GRAPHEME BASED MODELS
    Rao, Kanishka
    Sak, Hasim
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4815 - 4819