A Transformer-Based End-to-End Automatic Speech Recognition Algorithm

Cited by: 0
Authors
Dong, Fang [1]
Qian, Yiyang [2]
Wang, Tianlei [2]
Liu, Peng [3]
Cao, Jiuwen [2]
Affiliations
[1] Hangzhou City Univ, Sch Informat & Elect Engn, Hangzhou 310015, Peoples R China
[2] Hangzhou Dianzi Univ, Machine Learning & I Hlth Int Cooperat Base Zhejia, Hangzhou 310018, Peoples R China
[3] Zhejiang Baiying Technol Ltd Co, Zhejiang 311100, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Automatic speech recognition; soft beam pruning; prefix module; transformer; professional terminology;
DOI
10.1109/LSP.2023.3328238
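Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract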
End-to-end (E2E) automatic speech recognition (ASR) has become popular in recent years and is widely used in many applications. However, current ASR algorithms are usually less effective in domains with specialized terminology, such as the medical and economic fields. To address this issue, we propose a Transformer-based ASR decoding method for beam search, called the soft beam pruning algorithm (SBPA). SBPA dynamically adjusts the width of the beam search. In addition, a prefix module (PM) is added to access contextual information and to avoid removing professional words during the beam search. By combining SBPA and PM, the proposed ASR system achieves promising recognition performance on professional terminology. To verify its effectiveness, experiments are conducted on real-world conversation data containing medical terminology. The results show that the proposed ASR system performs well on both professional and regular words.
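The abstract does not give the details of SBPA or PM, so the following is only a minimal illustrative sketch of the two general ideas it names: a beam whose width varies with hypothesis scores, and a prefix check that keeps hypotheses ending in the start of a known professional term. The function names, the `margin` and `max_width` parameters, and the rescue rule are all assumptions made for illustration, not the authors' algorithm.

```python
def ends_with_term_prefix(tokens, terms):
    """Return True if the tail of `tokens` matches the first k tokens of any term."""
    for term in terms:
        for k in range(1, len(term) + 1):
            if tuple(tokens[-k:]) == tuple(term[:k]):
                return True
    return False


def prune_beam(hyps, max_width=5, margin=2.0, protected=()):
    """Prune one beam-search step (hypothetical rule, not the paper's SBPA).

    hyps: list of (tokens, log_prob) pairs.
    Keeps hypotheses within `margin` log-probability of the best one,
    capped at `max_width`, so the effective width varies per step.
    Hypotheses whose tail starts a protected term are kept regardless.
    """
    if not hyps:
        return []
    ranked = sorted(hyps, key=lambda h: h[1], reverse=True)
    best = ranked[0][1]
    kept = [h for h in ranked if best - h[1] <= margin][:max_width]
    # Assumed "prefix" rescue: do not drop partial professional terms.
    for h in ranked:
        if h not in kept and ends_with_term_prefix(h[0], protected):
            kept.append(h)
    return kept


if __name__ == "__main__":
    hyps = [
        (("the", "patient"), -1.0),
        (("the", "patent"), -1.5),
        (("a", "myo"), -6.0),
        (("a", "b"), -7.0),
    ]
    terms = [("myo", "cardial")]
    for tokens, score in prune_beam(hyps, protected=terms):
        print(tokens, score)
```

In this sketch, a low-scoring hypothesis ending in "myo" survives pruning because it could still complete the hypothetical term "myo cardial", while an equally low-scoring hypothesis with no such prefix is dropped.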
Pages: 1592-1596
Number of pages: 5