Memory-efficient LVCSR search using a one-pass stack decoder

Cited by: 2

Authors:

Schuster, M [1]

Affiliations:

[1] ATR Interpreting Telecommun Res Labs, Kyoto 61902, Japan

Source:

COMPUTER SPEECH AND LANGUAGE | 2000 / Vol. 14 / Issue 01

DOI:

10.1006/csla.1999.0135

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper describes the details of a fast, memory-efficient one-pass stack decoder for efficient evaluation of the search space for large vocabulary continuous speech recognition. A modern efficient search engine is not based on a single idea, but is a rather complex collection of separate algorithms and practical implementation details, which only in combination make the search efficient in time and memory requirements. Because the decoder is the core of a speech recognition system, its software design phase is often crucial for its later performance and flexibility. This paper tries to emphasize this point: after defining the requirements for a modern decoder, it describes the details of an implementation that is based on a stack decoder framework. It is shown how it is possible to handle arbitrary order N-grams, how to generate N-best lists or lattices next to the first-best hypothesis at little computational overhead, how to handle efficiently cross-word acoustic models of any context order, how to efficiently constrain the search with word graphs or word-pair grammars, and how to use a fast-match with delay to speed up the search, all in a single left-to-right search pass. The details of a disk-based representation of an N-gram language model are given, which make it possible to use language models (LMs) of arbitrary (file) size in only a few hundred kB of memory. On-demand N-gram smearing, an efficient improvement over the regular unigram smearing used as an approximation to the LM scores in a tree lexicon, is introduced. It is also shown how lattice rescoring, the generation of forced alignments and detailed phone-/state-alignments can efficiently be integrated into a single stack decoder. The decoder, named "Nozomi", was tested on a Japanese newspaper dictation task using a 5000-word vocabulary. Using computationally cheap models it is possible to achieve real-time performance with 89% word recognition accuracy at about 1% search error using only 4 MB of total memory on a 300 MHz Pentium II. With computationally more expensive acoustic models, which also cover cross-word effects that are essential for the Japanese language, more than 95% recognition accuracy is reached. © 2000 Academic Press.

Citation

Pages: 47-77

Page count: 31
