Memory-efficient LVCSR search using a one-pass stack decoder

Cited by: 2
Authors
Schuster, M [1]
Affiliation
[1] ATR Interpreting Telecommun Res Labs, Kyoto 61902, Japan
Source
COMPUTER SPEECH AND LANGUAGE | 2000 / Vol. 14 / No. 01
DOI
10.1006/csla.1999.0135
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes the details of a fast, memory-efficient one-pass stack decoder for efficient evaluation of the search space for large vocabulary continuous speech recognition. A modern efficient search engine is not based on a single idea, but is a rather complex collection of separate algorithms and practical implementation details, which only in combination make the search efficient in time and memory requirements. Because the decoder is the core of a speech recognition system, the software design phase for a new decoder is often crucial for its later performance and flexibility. This paper tries to emphasize this point: after defining the requirements for a modern decoder, it describes the details of an implementation that is based on a stack decoder framework. It is shown how it is possible to handle arbitrary order N-grams, how to generate N-best lists or lattices next to the first-best hypothesis at little computational overhead, how to handle efficiently cross-word acoustic models of any context order, how to efficiently constrain the search with word graphs or word-pair grammars, and how to use a fast-match with delay to speed up the search, all in a single left-to-right search pass. The details of a disk-based representation of an N-gram language model are given, which makes it possible to use language models (LMs) of arbitrary (file) size in only a few hundred kB of memory. On-demand N-gram smearing, an efficient improvement over the regular unigram smearing used as an approximation to the LM scores in a tree lexicon, is introduced. It is also shown how lattice rescoring, the generation of forced alignments and detailed phone-/state-alignments can be efficiently integrated into a single stack decoder. The decoder, named "Nozomi", was tested on a Japanese newspaper dictation task using a 5000 word vocabulary. Using computationally cheap models it is possible to achieve real-time performance with 89% word recognition accuracy at about 1% search error using only 4 MB of total memory on a 300 MHz Pentium II. With computationally more expensive acoustic models, which also cover the cross-word effects essential for the Japanese language, more than 95% recognition accuracy is reached. (C) 2000 Academic Press.
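The abstract mentions regular unigram smearing as an approximation to LM scores in a tree lexicon. As a rough illustration of that general idea only (not the paper's implementation, and not its on-demand N-gram variant), the sketch below stores at each node of a phone-prefix tree the best unigram log-probability of any word reachable through that node. The names `Node`, `build_tree`, `smear_unigrams`, and the toy lexicon and probabilities are all hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One node of a phone-prefix tree (hypothetical structure)."""
    children: Dict[str, "Node"] = field(default_factory=dict)  # phone -> child
    words: List[str] = field(default_factory=list)  # words ending at this node
    smear: float = -math.inf  # best unigram log-prob reachable from here


def build_tree(lexicon: Dict[str, List[str]]) -> Node:
    """Build a prefix tree from a word -> phone-sequence lexicon."""
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, Node())
        node.words.append(word)
    return root


def smear_unigrams(node: Node, unigram_logprob: Dict[str, float]) -> float:
    """Set node.smear to the max unigram log-prob over all words at or below node."""
    best = max((unigram_logprob[w] for w in node.words), default=-math.inf)
    for child in node.children.values():
        best = max(best, smear_unigrams(child, unigram_logprob))
    node.smear = best
    return best


# Toy data, invented for this example.
lexicon = {
    "cat": ["k", "ae", "t"],
    "cab": ["k", "ae", "b"],
    "dog": ["d", "ao", "g"],
}
unigram = {"cat": math.log(0.5), "cab": math.log(0.1), "dog": math.log(0.4)}

root = build_tree(lexicon)
smear_unigrams(root, unigram)
```

During search, a decoder could add `node.smear` to a partial hypothesis score before the word identity is known, then replace it with the actual LM score once a word end is reached. How the paper applies and refines this is described in the full text, not here.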
Pages: 47-77
Page count: 31
Related papers
  • [1] A one-pass real-time decoder using memory-efficient state network
    Shao, Jian
    Li, Ta
    Zhang, Qingqing
    Zhao, Qingwei
    Yan, Yonghong
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2008, E91D (03) : 529 - 537
  • [2] Memory-Efficient Batch Normalization by One-Pass Computation for On-Device Training
    Dai, He
    Wang, Hang
    Zhang, Xuchong
    Sun, Hongbin
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (06) : 3186 - 3190
  • [3] Efficient evaluation of the LVCSR search space using the NOWAY decoder
    Renals, S
    Hochberg, M
    [J]. 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 149 - 152
  • [5] One-pass LVCSR algorithm using linear lexicon search and 1-best approximation tree-structured lexicon search
    Kitaoka, Norihide
    Liang, Ying
    Takahashi, Nobutoshi
    Nakagawa, Seiichi
    [J]. 2007 9TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOLS 1-3, 2007, : 600 - +
  • [6] A memory-efficient progressive JPEG decoder
    Lee, Kun-Bin
    Ju, Chi-Cheng
    [J]. 2007 INTERNATIONAL SYMPOSIUM ON VLSI DESIGN, AUTOMATION AND TEST (VLSI-DAT), PROCEEDINGS OF TECHNICAL PAPERS, 2007, : 8 - +
  • [7] Efficient One-Pass Chase Soft-Decision BCH Decoder for Multi-Level Cell NAND Flash Memory
    Zhang, Xinmiao
    Zhu, Jiangli
    Wu, Yingquan
    [J]. 2011 IEEE 54TH INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2011,
  • [8] A one-pass decoder based on polymorphic linguistic context assignment
    Soltau, H
    Metze, F
    Fügen, C
    Waibel, A
    [J]. ASRU 2001: IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, CONFERENCE PROCEEDINGS, 2001, : 214 - 217
  • [9] Memory-efficient accelerating schedule for LDPC decoder
    Shimizu, Kazunori
    Togawa, Nozomu
    Ikenaga, Takeshi
    Goto, Satoshi
    [J]. 2006 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS, 2006, : 1317 - +
  • [10] A concurrent memory-efficient VLC decoder for MPEG applications
    Hsieh, CT
    Kim, SP
    [J]. IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 1996, 42 (03) : 439 - 446