A transformer-based network for speech recognition

Cited by: 1
Authors
Tang L. [1]
Affiliation
[1] College of Computer and Control Engineering, Northeast Forestry University, Heilongjiang Province, Harbin City
Keywords
Homophone; Noisy audio; Speech recognition; Transformer
DOI
10.1007/s10772-023-10034-z
Abstract
In automatic speech recognition (ASR), noisy audio data and ambiguity in recognizing homophones degrade model performance. To address these problems, this study proposes DMRS-Transformer, a Transformer-based network. In addition to the standard Transformer network, DMRS-Transformer has two main components: a denoising module and a Mandarin recognition supplementary (MRS) module. The denoising module prunes the trivial features caused by noisy input audio. The MRS module addresses the difficulty of recognizing Mandarin speech signals that contain homophones. Empirical evaluations were conducted on two widely used datasets, Aishell-1 and HKUST, and the experimental results validate the effectiveness of the proposed network. Compared with the Transformer baseline, DMRS-Transformer improves the character error rate (CER) by 0.8% on Aishell-1 and by 1.5% on HKUST. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
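For readers unfamiliar with the metric reported in the abstract, CER is conventionally the character-level edit distance between the recognized text and the reference transcript, divided by the reference length. The sketch below illustrates that standard definition only; it is not the authors' evaluation code, and the function names `edit_distance` and `cer` and the example sentence are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical homophone confusion: 气 and 汽 are both pronounced "qi".
reference = "今天天气很好"
hypothesis = "今天天汽很好"
print(f"CER = {cer(reference, hypothesis):.3f}")
```

In this illustration, one substituted character in a six-character reference gives a CER of 1/6, which shows how a single homophone confusion counts against the score.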
Pages: 531-539
Page count: 8