AN ASYNCHRONOUS WFST-BASED DECODER FOR AUTOMATIC SPEECH RECOGNITION

Cited by: 1
Authors
Lv, Hang [1 ,2 ]
Chen, Zhehuai [2 ,5 ]
Xu, Hainan [2 ]
Povey, Daniel [4 ]
Xie, Lei [1 ]
Khudanpur, Sanjeev [2 ,3 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Lab ASLP NPU, Xian, Peoples R China
[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21205 USA
[3] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21205 USA
[4] Xiaomi Corp, Beijing, Peoples R China
[5] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, SpeechLab, Shanghai, Peoples R China
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
Automatic speech recognition; decoder; lattice generation; lattice pruning
DOI
10.1109/ICASSP39728.2021.9414509
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We introduce an asynchronous dynamic decoder, which adopts an efficient A* algorithm to incorporate big language models into one-pass decoding for large-vocabulary continuous speech recognition. Standard one-pass decoding with an on-the-fly composition decoder can incur significant computational overhead. In contrast, the asynchronous dynamic decoder has a novel design with two fronts, one performing "exploration" and the other "backfill". Computation alternates between the two fronts during decoding, resulting in more effective pruning than standard one-pass decoding with an on-the-fly composition decoder. Experiments show that the proposed decoder runs notably faster than standard one-pass decoding with an on-the-fly composition decoder, and that the speedup becomes more pronounced as data complexity increases.
Pages: 6019-6023
Page count: 5
Related papers
50 in total
  • [1] Improved subword modeling for WFST-based speech recognition
    Smit, Peter
    Virpioja, Sami
    Kurimo, Mikko
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2551 - 2555
  • [2] Dynamic Grammars with Lookahead Composition for WFST-based Speech Recognition
    Novak, Josef R.
    Minematsu, Nobuaki
    Hirose, Keikichi
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1078 - 1081
  • [3] Tied-State Mixture Language Model for WFST-based Speech Recognition
    Yamamoto, Hitoshi
    Dixon, Paul R.
    Matsuda, Shigeki
    Hori, Chiori
    Kashioka, Hideki
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 174 - 177
  • [4] WFST Compression for Automatic Speech Recognition
    Caseiro, Diamantino
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 1493 - 1496
  • [5] A Comparative Study on Selecting Acoustic Modeling Units for WFST-based Mongolian Speech Recognition
    Wang, Yonghe
    Bao, Feilong
    Gao, Guanglai
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (10)
  • [6] Compact and Efficient WFST-based Decoders for Handwriting Recognition
    Cai, Meng
    Huo, Qiang
    2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 143 - 148
  • [7] Large Vocabulary Continuous Speech Recognition Using WFST-based Linear Classifier for Structured Data
    Watanabe, Shinji
    Hori, Takaaki
    Nakamura, Atsushi
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 346 - 349
  • [8] WFST-BASED STRUCTURAL CLASSIFICATION INTEGRATING DNN ACOUSTIC FEATURES AND RNN LANGUAGE FEATURES FOR SPEECH RECOGNITION
    Quoc Truong Do
    Nakamura, Satoshi
    Delcroix, Marc
    Hori, Takaaki
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4959 - 4963
  • [9] EESEN: END-TO-END SPEECH RECOGNITION USING DEEP RNN MODELS AND WFST-BASED DECODING
    Miao, Yajie
    Gowayyed, Mohammad
    Metze, Florian
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 167 - 174
  • [10] A Fully Data Parallel WFST-based Large Vocabulary Continuous Speech Recognition on a Graphics Processing Unit
    Chong, Jike
    Gonina, Ekaterina
    Yi, Youngmin
    Keutzer, Kurt
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1187 - 1190