A Transformer-Based End-to-End Automatic Speech Recognition Algorithm

Cited by: 0
Authors
Dong, Fang [1 ]
Qian, Yiyang [2 ]
Wang, Tianlei [2 ]
Liu, Peng [3 ]
Cao, Jiuwen [2 ]
Affiliations
[1] Hangzhou City Univ, Sch Informat & Elect Engn, Hangzhou 310015, Peoples R China
[2] Hangzhou Dianzi Univ, Machine Learning & I Hlth Int Cooperat Base Zhejia, Hangzhou 310018, Peoples R China
[3] Zhejiang Baiying Technol Ltd Co, Zhejiang 311100, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Automatic speech recognition; soft beam pruning; prefix module; transformer; professional terminology;
DOI
10.1109/LSP.2023.3328238
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Classification Codes
0808; 0809;
Abstract
End-to-end (E2E) automatic speech recognition (ASR) has become popular in recent years and is widely used in many applications. However, current ASR algorithms are often less effective in domains with specialized terminology, such as the medical and economic fields. To address this issue, we propose a Transformer-based ASR decoding method for beam search, called the soft beam pruning algorithm (SBPA). SBPA dynamically adjusts the beam-search width. Meanwhile, a prefix module (PM) is added to exploit contextual information and to prevent professional terms from being pruned during beam search. Combining SBPA and PM, the proposed ASR achieves promising recognition performance on professional terminology. To verify its effectiveness, experiments are conducted on real-world conversation data containing medical terminology. The results show that the proposed ASR performs well on both professional and regular words.
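The abstract describes SBPA and the prefix module only at a high level. The following minimal Python sketch illustrates how such a decoding step might look: soft pruning keeps hypotheses whose scores fall within a margin of the current best (so the effective beam width shrinks when the decoder is confident and widens otherwise), and a prefix set built from terminology token sequences adds a small bonus so professional words are not discarded before they are fully emitted. All function names, thresholds, and the bonus formula are illustrative assumptions, not the authors' implementation.

from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    tokens: list = field(default_factory=list)   # decoded token ids so far
    score: float = 0.0                            # accumulated log-probability

def build_prefix_set(term_token_seqs):
    # Collect every prefix of every terminology token sequence,
    # so partially decoded professional words can be recognized.
    prefixes = set()
    for term in term_token_seqs:
        for i in range(1, len(term) + 1):
            prefixes.add(tuple(term[:i]))
    return prefixes

def soft_prune(hyps, max_beam, margin):
    # Keep hypotheses whose score is within `margin` of the best one,
    # capped at `max_beam`: the effective beam width narrows when the
    # decoder is confident and widens when scores are close together.
    best = max(h.score for h in hyps)
    kept = [h for h in hyps if h.score >= best - margin]
    kept.sort(key=lambda h: h.score, reverse=True)
    return kept[:max_beam]

def decode_step(hyps, topk_fn, prefix_set, max_beam=10, margin=5.0, bonus=1.0):
    # One beam-search expansion. `topk_fn(tokens)` is an assumed callback
    # returning (token_id, log_prob) pairs from the Transformer decoder.
    candidates = []
    for h in hyps:
        for tok, logp in topk_fn(h.tokens):
            new_tokens = h.tokens + [tok]
            score = h.score + logp
            # Reward hypotheses ending in a terminology prefix so that
            # professional words survive pruning until fully emitted.
            for n in range(1, min(len(new_tokens), 8) + 1):
                if tuple(new_tokens[-n:]) in prefix_set:
                    score += bonus
                    break
            candidates.append(Hypothesis(new_tokens, score))
    return soft_prune(candidates, max_beam, margin)

A full decoder would iterate decode_step over output positions, starting from a single empty hypothesis and stopping at an end-of-sentence token. In this sketch the margin plays the role of the "soft" beam width, and the prefix bonus is a stand-in for whatever scoring the paper's prefix module actually applies.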
Pages: 1592 - 1596
Page count: 5