AN ASYNCHRONOUS WFST-BASED DECODER FOR AUTOMATIC SPEECH RECOGNITION

Cited by: 1
Authors
Lv, Hang [1, 2]
Chen, Zhehuai [2, 5]
Xu, Hainan [2]
Povey, Daniel [4]
Xie, Lei [1]
Khudanpur, Sanjeev [2, 3]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Lab ASLP NPU, Xian, Peoples R China
[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21205 USA
[3] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21205 USA
[4] Xiaomi Corp, Beijing, Peoples R China
[5] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, SpeechLab, Shanghai, Peoples R China
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
Automatic speech recognition; decoder; lattice generation; lattice pruning
DOI
10.1109/ICASSP39728.2021.9414509
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We introduce an asynchronous dynamic decoder, which adopts an efficient A* algorithm to incorporate big language models into one-pass decoding for large vocabulary continuous speech recognition. Unlike standard one-pass decoding with an on-the-fly composition decoder, which can incur significant computational overhead, the asynchronous dynamic decoder has a novel design with two fronts: one performs "exploration" and the other performs "backfill". The computation of the two fronts alternates during decoding, which yields more effective pruning than standard one-pass decoding with an on-the-fly composition decoder. Experiments show that the proposed decoder runs notably faster than standard one-pass decoding with an on-the-fly composition decoder, and that the speedup becomes more pronounced as data complexity increases.
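The abstract names two generic ingredients, A* search and on-the-fly composition with a language model, without giving the algorithm itself. The following is only a minimal, hypothetical Python sketch of those two generic ideas on a toy word lattice; it is not the authors' asynchronous decoder and does not model the separate "exploration" and "backfill" fronts. The lattice, the bigram costs, `BACKOFF`, and the function names `lm_cost`, `acoustic_heuristic`, and `astar_decode` are all invented for illustration. The heuristic (best acoustic-only cost to the final state) is an assumption chosen because it is admissible whenever LM costs, as negative log probabilities, are non-negative.

```python
import heapq
import math

# Hypothetical toy word lattice: arcs are (src, dst, word, acoustic_cost).
# States are numbered in topological order (every arc goes to a larger state id).
ARCS = [
    (0, 1, "the", 1.0),
    (0, 1, "a", 0.8),
    (1, 2, "cat", 1.5),
    (1, 2, "cap", 1.2),
    (2, 3, "sat", 0.5),
]
START, FINAL = 0, 3

# Hypothetical bigram LM costs (negative log probabilities); unseen pairs get BACKOFF.
BIGRAM = {
    ("<s>", "the"): 0.5, ("<s>", "a"): 1.5,
    ("the", "cat"): 0.7, ("a", "cat"): 1.0,
    ("the", "cap"): 2.5, ("a", "cap"): 2.0,
    ("cat", "sat"): 0.3, ("cap", "sat"): 3.0,
}
BACKOFF = 5.0


def lm_cost(prev, word):
    """Bigram LM cost of `word` given the previous word."""
    return BIGRAM.get((prev, word), BACKOFF)


def acoustic_heuristic(arcs, final):
    """Best acoustic-only cost from each state to `final`.

    LM costs are non-negative, so this never overestimates the true
    remaining cost: the heuristic is admissible (and consistent)."""
    h = {final: 0.0}
    # Visiting arcs by decreasing source state settles h[dst] before it is used.
    for src, dst, _, ac in sorted(arcs, key=lambda a: a[0], reverse=True):
        if dst in h:
            h[src] = min(h.get(src, math.inf), ac + h[dst])
    return h


def astar_decode(arcs, start, final):
    """A* search over (lattice state, LM history) pairs, expanded lazily,
    i.e. the lattice is composed with the LM on the fly."""
    h = acoustic_heuristic(arcs, final)
    if start not in h:
        return math.inf, []  # no path from start to final
    out = {}
    for src, dst, word, ac in arcs:
        out.setdefault(src, []).append((dst, word, ac))
    # Heap entries: (f = g + h, g = cost so far, state, LM history, words).
    frontier = [(h[start], 0.0, start, "<s>", ())]
    closed = set()
    while frontier:
        _, g, state, prev, words = heapq.heappop(frontier)
        if state == final:
            return g, list(words)  # first final pop is optimal (consistent h)
        if (state, prev) in closed:
            continue
        closed.add((state, prev))
        for dst, word, ac in out.get(state, []):
            if dst not in h:
                continue  # dst cannot reach the final state
            g2 = g + ac + lm_cost(prev, word)
            heapq.heappush(frontier, (g2 + h[dst], g2, dst, word, words + (word,)))
    return math.inf, []


if __name__ == "__main__":
    cost, words = astar_decode(ARCS, START, FINAL)
    print(words, round(cost, 2))
```

On this toy lattice the search returns `['the', 'cat', 'sat']` with a total cost of about 4.5, and it never expands the "cap" branches because their estimated total cost already exceeds that of the best path. Only the (state, LM history) pairs that the search actually reaches are built, which is the sense in which the composition happens on the fly; the paper's decoder may use a different heuristic and search organization.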
Pages: 6019-6023
Number of pages: 5
Related papers
50 in total
  • [31] REGARDING TOPOLOGY AND ADAPTABILITY IN DIFFERENTIABLE WFST-BASED E2E ASR
    Zhao, Zeyu
    Chen, Pinzhen
    Bell, Peter
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, pp. 843-847
  • [32] Language model adaptation using WFST-based speaking-style translation
    Hori, T
    Willett, D
    Minami, Y
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, pp. 228-231
  • [33] Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study
    Chen, Peikun
    Sun, Sining
    Shan, Changhao
    Yang, Qing
    Xie, Lei
    INTERSPEECH 2024, 2024, pp. 4468-4472
  • [34] WFST-based Ground Truth Alignment for Difficult Historical Documents with Text Modification and Layout Variations
    Al Azawi, Mayce
    Liwicki, Marcus
    Breuel, Thomas M.
    DOCUMENT RECOGNITION AND RETRIEVAL XX, 2013, 8658
  • [35] Robust Automatic Speech Recognition with Decoder Oriented Ideal Binary Mask Estimation
    Kim, Lae-Hoon
    Kim, Kyung-Tae
    Hasegawa-Johnson, Mark
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, pp. 2066-2069
  • [36] Segmental Encoder-Decoder Models for Large Vocabulary Automatic Speech Recognition
    Beck, Eugen
    Hannemann, Mirko
    Doetsch, Patrick
    Schlueter, Ralf
    Ney, Hermann
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, pp. 766-770
  • [37] NON-AUTOREGRESSIVE TRANSFORMER WITH UNIFIED BIDIRECTIONAL DECODER FOR AUTOMATIC SPEECH RECOGNITION
    Zhang, Chuan-Fei
    Liu, Yan
    Zhang, Tian-Hao
    Chen, Song-Lu
    Chen, Feng
    Yin, Xu-Cheng
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, pp. 6527-6531
  • [38] A GPU-based WFST Decoder with Exact Lattice Generation
    Chen, Zhehuai
    Luitjens, Justin
    Xu, Hainan
    Wang, Yiming
    Povey, Daniel
    Khudanpur, Sanjeev
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, pp. 2212-2216
  • [39] Automatic speech recognition based on diphones
    Basztura, C
    Lisiak, P
    Staroniewicz, P
    MELECON '98 - 9TH MEDITERRANEAN ELECTROTECHNICAL CONFERENCE, VOLS 1 AND 2, 1998, pp. 6-10
  • [40] Automatic Optimization of Speech Decoder Parameters
    El Hannani, Asmaa
    Hain, Thomas
    IEEE SIGNAL PROCESSING LETTERS, 2010, 17(1), pp. 95-98