AN ASYNCHRONOUS WFST-BASED DECODER FOR AUTOMATIC SPEECH RECOGNITION

Cited by: 1

Authors:

Lv, Hang [1,2]

Chen, Zhehuai [2,5]

Xu, Hainan [2]

Povey, Daniel [4]

Xie, Lei [1]

Khudanpur, Sanjeev [2,3]

Affiliations:

[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Lab ASLP NPU, Xian, Peoples R China

[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21205 USA

[3] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21205 USA

[4] Xiaomi Corp, Beijing, Peoples R China

[5] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, SpeechLab, Shanghai, Peoples R China

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

Automatic speech recognition; decoder; lattice generation; lattice pruning

DOI:

10.1109/ICASSP39728.2021.9414509

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

We introduce an asynchronous dynamic decoder, which adopts an efficient A* algorithm to incorporate big language models into one-pass decoding for large-vocabulary continuous speech recognition. Unlike standard one-pass decoding with an on-the-fly composition decoder, which can incur significant computational overhead, the asynchronous dynamic decoder has a novel design with two fronts, one performing "exploration" and the other "backfill". Computation alternates between the two fronts during decoding, resulting in more effective pruning than standard one-pass decoding with an on-the-fly composition decoder. Experiments show that the proposed decoder is notably faster than standard one-pass decoding with an on-the-fly composition decoder, and that the speedup becomes more pronounced as data complexity increases.


Pages: 6019-6023

Page count: 5
