A transformer-based network for speech recognition

Cited by: 1
Author
Tang L. [1]
Affiliation
[1] College of Computer and Control Engineering, Northeast Forestry University, Heilongjiang Province, Harbin City
Keywords
Homophone; Noisy audio; Speech recognition; Transformer
DOI
10.1007/s10772-023-10034-z
Abstract
In automatic speech recognition (ASR), noisy audio data and ambiguity in recognizing homophones degrade model performance. To address these problems, this study proposes the DMRS-Transformer, a Transformer-based network. In addition to the conventional Transformer network, the DMRS-Transformer has two main components: a denoising module and a Mandarin recognition supplementary (MRS) module. The denoising module prunes trivial features caused by noisy input audio. The MRS module targets the problem of recognizing Mandarin speech signals that contain several homophones. Empirical evaluations were conducted on two widely used datasets, Aishell-1 and HKUST. The experimental results validate the effectiveness of the proposed DMRS-Transformer: compared with the Transformer baseline, it achieves a CER improvement of 0.8% on Aishell-1 and 1.5% on HKUST. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
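The abstract reports its results as character error rate (CER). As a reader's aid, the sketch below shows how CER is conventionally computed: the character-level edit distance between a reference transcript and a hypothesis, divided by the reference length. It is not the paper's evaluation code. The function names `edit_distance` and `cer` and the sample sentences are illustrative assumptions, and the abstract does not say whether the reported improvements are absolute or relative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] is the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: a single homophone substitution
# (both characters are read "qi") in a six-character sentence.
reference = "今天天气很好"
hypothesis = "今天天汽很好"
print(f"CER = {cer(reference, hypothesis):.4f}")
```

A single homophone error in a short utterance already moves CER noticeably, which is why homophone ambiguity matters for Mandarin ASR.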
Pages: 531-539
Number of pages: 8