Memory-efficient LVCSR search using a one-pass stack decoder

Cited by: 2

Authors:

Schuster, M [1]

Affiliation:

[1] ATR Interpreting Telecommun Res Labs, Kyoto 61902, Japan

Source:

COMPUTER SPEECH AND LANGUAGE | 2000 / Vol. 14 / No. 1

DOI:

10.1006/csla.1999.0135

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper describes the details of a fast, memory-efficient one-pass stack decoder for efficient evaluation of the search space for large vocabulary continuous speech recognition. A modern efficient search engine is not based on a single idea, but is a rather complex collection of separate algorithms and practical implementation details, which only in combination make the search efficient in time and memory requirements. Because the decoder is the core of a speech recognition system, the software design phase for a new decoder is often crucial for its later performance and flexibility. This paper tries to emphasize this point: after defining the requirements for a modern decoder, it describes the details of an implementation that is based on a stack decoder framework. It is shown how it is possible to handle arbitrary order N-grams, how to generate N-best lists or lattices next to the first-best hypothesis at little computational overhead, how to efficiently handle cross-word acoustic models of any context order, how to efficiently constrain the search with word graphs or word-pair grammars, and how to use a fast-match with delay to speed up the search, all in a single left-to-right search pass. The details of a disk-based representation of an N-gram language model are given, which make it possible to use language models (LMs) of arbitrary (file) size in only a few hundred kB of memory. On-demand N-gram smearing, an efficient improvement over the regular unigram smearing used as an approximation to the LM scores in a tree lexicon, is introduced. It is also shown how lattice rescoring, the generation of forced alignments and detailed phone-/state-alignments can efficiently be integrated into a single stack decoder. The decoder, named "Nozomi", was tested on a Japanese newspaper dictation task using a 5000 word vocabulary. Using computationally cheap models it is possible to achieve real-time performance with 89% word recognition accuracy at about 1% search error using only 4 MB of total memory on a 300 MHz Pentium II. With computationally more expensive acoustic models, which also cover the cross-word effects that are essential for the Japanese language, more than 95% recognition accuracy is reached. © 2000 Academic Press.

Pages: 47-77

Page count: 31

50 items in total

[1] A one-pass real-time decoder using memory-efficient state network
Shao, Jian
Li, Ta
Zhang, Qingqing
Zhao, Qingwei
Yan, Yonghong
[J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2008, E91D (03) : 529 - 537
[2] Memory-Efficient Batch Normalization by One-Pass Computation for On-Device Training
Dai, He
Wang, Hang
Zhang, Xuchong
Sun, Hongbin
[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (06) : 3186 - 3190
[3] Efficient evaluation of the LVCSR search space using the NOWAY decoder
Renals, S
Hochberg, M
[J]. 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 149 - 152
[4] AN EFFICIENT ONE-PASS SEARCH ALGORITHM FOR PARSING SPOKEN LANGUAGE
OKADA, M
[J]. IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 1992, E75A (07) : 944 - 953
[5] One-pass LVCSR algorithm using linear lexicon search and 1-best approximation tree-structured lexicon search
Kitaoka, Norihide
Liang, Ying
Takahashi, Nobutoshi
Nakagawa, Seiichi
[J]. 2007 9TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOLS 1-3, 2007, : 600 - +
[6] A memory-efficient progressive JPEG decoder
Lee, Kun-Bin
Ju, Chi-Cheng
[J]. 2007 INTERNATIONAL SYMPOSIUM ON VLSI DESIGN, AUTOMATION AND TEST (VLSI-DAT), PROCEEDINGS OF TECHNICAL PAPERS, 2007, : 8 - +
[7] Efficient One-Pass Chase Soft-Decision BCH Decoder for Multi-Level Cell NAND Flash Memory
Zhang, Xinmiao
Zhu, Jiangli
Wu, Yingquan
[J]. 2011 IEEE 54TH INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2011,
[8] A one-pass decoder based on polymorphic linguistic context assignment
Soltau, H
Metze, F
Fügen, C
Waibel, A
[J]. ASRU 2001: IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, CONFERENCE PROCEEDINGS, 2001, : 214 - 217
[9] Memory-efficient accelerating schedule for LDPC decoder
Shimizu, Kazunori
Togawa, Nozomu
Ikenaga, Takeshi
Goto, Satoshi
[J]. 2006 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS, 2006, : 1317 - +
[10] A concurrent memory-efficient VLC decoder for MPEG applications
Hsieh, CT
Kim, SP
[J]. IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 1996, 42 (03) : 439 - 446
