AN ASYNCHRONOUS WFST-BASED DECODER FOR AUTOMATIC SPEECH RECOGNITION

Times cited: 1

Authors:

Lv, Hang [1, 2]

Chen, Zhehuai [2, 5]

Xu, Hainan [2]

Povey, Daniel [4]

Xie, Lei [1]

Khudanpur, Sanjeev [2, 3]

Affiliations:

[1] Northwestern Polytechnical University, School of Computer Science, Audio, Speech and Language Processing Lab (ASLP@NPU), Xi'an, China

[2] Johns Hopkins University, Center for Language and Speech Processing, Baltimore, MD 21205, USA

[3] Johns Hopkins University, Human Language Technology Center of Excellence, Baltimore, MD 21205, USA

[4] Xiaomi Corporation, Beijing, China

[5] Shanghai Jiao Tong University, Department of Computer Science and Engineering, SpeechLab, Shanghai, China

Source:

2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021

Keywords:

Automatic speech recognition; decoder; lattice generation; lattice pruning

DOI:

10.1109/ICASSP39728.2021.9414509

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

We introduce an asynchronous dynamic decoder, which adopts an efficient A* algorithm to incorporate big language models into one-pass decoding for large-vocabulary continuous speech recognition. Unlike a standard one-pass decoder with on-the-fly composition, which can incur significant computational overhead, the asynchronous dynamic decoder has a novel two-front design: one front performs "exploration" and the other performs "backfill". The two fronts alternate during decoding, which yields more effective pruning than standard one-pass decoding with on-the-fly composition. Experiments show that the proposed decoder runs notably faster than the standard on-the-fly composition decoder, and that the speedup grows as data complexity increases.
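The abstract does not spell out the decoder's internals. As a generic illustration of the two ingredients it names, on-the-fly composition and A* search, the following is a minimal Python sketch, not the authors' asynchronous decoder. It assumes epsilon-free weighted acceptors with costs in the tropical semiring, a "small" graph whose exact distance-to-final serves as the A* heuristic, and non-negative costs in the "big" graph (which keeps that heuristic admissible and consistent). The dictionary format and the names `distance_to_final`, `astar_lazy_intersection`, `small` and `big` are hypothetical, and the sketch does not model the paper's exploration and backfill fronts.

```python
import heapq
import itertools
from collections import defaultdict

# Toy weighted acceptor format (hypothetical, for illustration only):
#   {"start": s, "arcs": {state: [(label, cost, next_state), ...]}, "final": {state: final_cost}}
# Costs live in the tropical semiring: add along a path, keep the minimum across paths.


def distance_to_final(fsa):
    """Backward Dijkstra: cheapest cost from each state to a final state, final cost included."""
    reverse = defaultdict(list)
    for src, arcs in fsa["arcs"].items():
        for _label, cost, dst in arcs:
            reverse[dst].append((cost, src))
    dist = dict(fsa["final"])
    heap = [(d, s) for s, d in dist.items()]
    heapq.heapify(heap)
    while heap:
        d, s = heapq.heappop(heap)
        if d > dist[s]:
            continue  # stale heap entry
        for cost, prev in reverse[s]:
            if d + cost < dist.get(prev, float("inf")):
                dist[prev] = d + cost
                heapq.heappush(heap, (d + cost, prev))
    return dist


def astar_lazy_intersection(small, big):
    """Best path through the intersection of two epsilon-free acceptors.

    Composite states (a, b) are created only when reached (on the fly); the full
    product graph is never built. h(a, b) = distance_to_final(small)[a] is a lower
    bound on the remaining cost as long as every cost in `big` is non-negative.
    """
    h = distance_to_final(small)
    if small["start"] not in h:
        return None  # the small graph cannot reach a final state
    start = (small["start"], big["start"])
    tie = itertools.count()  # tie-breaker so the heap never compares states or paths
    heap = [(h[start[0]], next(tie), 0.0, start, ())]
    closed = set()
    while heap:
        _f, _, g, state, labels = heapq.heappop(heap)
        if state is None:  # goal sentinel: both final costs already included in g
            return g, list(labels)
        if state in closed:
            continue  # already expanded with a cost no worse than this one
        closed.add(state)
        a, b = state
        if a in small["final"] and b in big["final"]:
            g_end = g + small["final"][a] + big["final"][b]
            heapq.heappush(heap, (g_end, next(tie), g_end, None, labels))
        big_arcs = defaultdict(list)  # index the big graph's arcs by label
        for label, cost_b, nb in big["arcs"].get(b, []):
            big_arcs[label].append((cost_b, nb))
        for label, cost_a, na in small["arcs"].get(a, []):
            if na not in h:
                continue  # dead end in the small graph
            for cost_b, nb in big_arcs.get(label, []):
                nxt = (na, nb)
                if nxt not in closed:
                    ng = g + cost_a + cost_b
                    heapq.heappush(heap, (ng + h[na], next(tie), ng, nxt, labels + (label,)))
    return None


small = {  # cheap first-pass graph (e.g. a small LM); hypothetical costs
    "start": 0,
    "arcs": {0: [("the", 1.0, 1), ("a", 1.5, 1)], 1: [("cat", 2.0, 2), ("hat", 2.5, 2)]},
    "final": {2: 0.0},
}
big = {  # extra costs contributed by a larger LM; must be non-negative
    "start": 0,
    "arcs": {0: [("the", 0.5, 1), ("a", 0.1, 1)], 1: [("cat", 1.0, 2), ("hat", 0.2, 2)]},
    "final": {2: 0.0},
}
best = astar_lazy_intersection(small, big)
print(best)
```

In this toy example the small graph alone prefers "the cat" (3.0 vs 3.5 for "the hat"), but once the big graph's costs are added, "the hat" wins with 1.0 + 0.5 + 2.5 + 0.2 = 4.2. Using the small graph's exact distances as the heuristic is one simple way to let a cheap graph guide search in a larger composed space; the paper's own scheduling of work between fronts is not reproduced here.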


Pages: 6019-6023

Number of pages: 5
