AN ASYNCHRONOUS WFST-BASED DECODER FOR AUTOMATIC SPEECH RECOGNITION

Cited by: 1

Authors:

Lv, Hang ^{[1,2]}

Chen, Zhehuai ^{[2,5]}

Xu, Hainan ^{[2]}

Povey, Daniel ^{[4]}

Xie, Lei ^{[1]}

Khudanpur, Sanjeev ^{[2,3]}

Affiliations:

[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Lab ASLP NPU, Xian, Peoples R China

[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21205 USA

[3] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21205 USA

[4] Xiaomi Corp, Beijing, Peoples R China

[5] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, SpeechLab, Shanghai, Peoples R China

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

Automatic speech recognition; decoder; lattice generation; lattice pruning

DOI:

10.1109/ICASSP39728.2021.9414509

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

We introduce an asynchronous dynamic decoder, which adopts an efficient A* algorithm to incorporate big language models into one-pass decoding for large-vocabulary continuous speech recognition. Unlike standard one-pass decoding with an on-the-fly composition decoder, which can incur significant computational overhead, the asynchronous dynamic decoder has a novel design with two fronts, one performing "exploration" and the other "backfill". Computation alternates between the two fronts during decoding, resulting in more effective pruning than standard one-pass decoding with an on-the-fly composition decoder. Experiments show that the proposed decoder is notably faster than standard one-pass decoding with an on-the-fly composition decoder, and that the speedup becomes more pronounced as data complexity increases.

Citation

Pages: 6019-6023

Number of pages: 5
