Memory-efficient LVCSR search using a one-pass stack decoder

Cited by: 2
Authors
Schuster, M [1]
Affiliations
[1] ATR Interpreting Telecommun Res Labs, Kyoto 61902, Japan
Source
COMPUTER SPEECH AND LANGUAGE | 2000 / Vol. 14 / Issue 01
Keywords
DOI
10.1006/csla.1999.0135
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes the details of a fast, memory-efficient one-pass stack decoder for efficient evaluation of the search space for large vocabulary continuous speech recognition. A modern efficient search engine is not based on a single idea, but is a rather complex collection of separate algorithms and practical implementation details, which only in combination make the search efficient in time and memory requirements. Because the decoder is the core of a speech recognition system, the software design phase for a new decoder is often crucial for its later performance and flexibility. This paper tries to emphasize this point: after defining the requirements for a modern decoder, it describes the details of an implementation that is based on a stack decoder framework. It is shown how it is possible to handle arbitrary order N-grams, how to generate N-best lists or lattices next to the first-best hypothesis at little computational overhead, how to handle efficiently cross-word acoustic models of any context order, how to efficiently constrain the search with word graphs or word-pair grammars, and how to use a fast-match with delay to speed up the search, all in a single left-to-right search pass. The details of a disk-based representation of an N-gram language model are given, which make it possible to use language models (LMs) of arbitrary (file) size in only a few hundred kB of memory. On-demand N-gram smearing, an efficient improvement over the regular unigram smearing used as an approximation to the LM scores in a tree lexicon, is introduced. It is also shown how lattice rescoring, the generation of forced alignments and detailed phone-/state-alignments can efficiently be integrated into a single stack decoder. The decoder, named "Nozomi", was tested on a Japanese newspaper dictation task using a 5000 word vocabulary. Using computationally cheap models it is possible to achieve real-time performance with 89% word recognition accuracy at about 1% search error using only 4 MB of total memory on a 300 MHz Pentium II. With computationally more expensive acoustic models, which also cover the cross-word effects that are essential for the Japanese language, more than 95% recognition accuracy is reached. © 2000 Academic Press.
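The abstract names regular unigram smearing in a tree lexicon as the baseline that on-demand N-gram smearing improves on. As a rough illustration of that baseline only, the sketch below stores at every node of a toy prefix tree the best unigram log-probability of any word reachable from that node, so a partial path can be scored before the word identity is known. This is not the paper's implementation: the names `Node`, `build_tree` and `smear_unigrams`, the three-word lexicon and the probabilities are all hypothetical, and homophones are ignored.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One node of a toy tree lexicon (hypothetical structure, for illustration)."""
    children: Dict[str, "Node"] = field(default_factory=dict)
    word: Optional[str] = None   # set where a word's phone sequence ends
    smear: float = -math.inf     # best unigram log-prob at or below this node


def build_tree(lexicon: Dict[str, List[str]]) -> Node:
    """Build a prefix tree over phone sequences, sharing common prefixes."""
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, Node())
        node.word = word
    return root


def smear_unigrams(node: Node, unigram_logp: Dict[str, float]) -> float:
    """Store at each node the maximum unigram log-prob over reachable words."""
    best = unigram_logp[node.word] if node.word is not None else -math.inf
    for child in node.children.values():
        best = max(best, smear_unigrams(child, unigram_logp))
    node.smear = best
    return best


# Assumed toy data, not taken from the paper.
lexicon = {
    "kyo": ["k", "y", "o"],
    "kyoto": ["k", "y", "o", "t", "o"],
    "tokyo": ["t", "o", "k", "y", "o"],
}
unigram = {"kyo": math.log(0.1), "kyoto": math.log(0.2), "tokyo": math.log(0.7)}

root = build_tree(lexicon)
smear_unigrams(root, unigram)
```

After this pass, a hypothesis that has entered the `k` branch carries the score of the best word still reachable there (`kyoto`) and is corrected to the exact LM score once the word end is reached.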
Pages: 47-77
Page count: 31