DATA DRIVEN SEARCH ORGANIZATION FOR CONTINUOUS SPEECH RECOGNITION

Cited by: 38
Authors
NEY, H [1]
MERGEL, D [1]
NOLL, A [1]
PAESELER, A [1]
Affiliation
[1] ASPECT GMBH, W-2000 NORDERSTEDT, GERMANY
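DOI
10.1109/78.124938
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808; 0809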
Abstract
This paper describes an architecture and search organization for continuous speech recognition. The recognition module is part of the SPICOS system for understanding database queries spoken in natural language. Recognition is based on statistical decision theory and thus amounts to an integrated approach that combines all available knowledge sources, such as the inventory of subword units, the pronunciation lexicon, and the language model, and that attempts to avoid local decisions during acoustic recognition. The recognition decision amounts to a time-synchronous, left-to-right search through a large state space with delayed decisions. The recognized word sequence is then the best interpretation of the observed acoustic data within the constraints given by the knowledge sources. The organization of the search can be viewed as an extension of the one-pass dynamic programming algorithm for connected word recognition. In continuous speech recognition, however, the search space is much larger, and an efficient organization of the search process is needed to keep the organizational overhead as small as possible. In this paper, we present such an efficient search organization with the following characteristics. Its computational cost is proportional only to the number of hypotheses actually generated and is independent of the overall size of the potential search space. There is no limit on the number of word hypotheses; the only limit is on the overall number of hypotheses, owing to storage constraints. The implementation of the search has been tested on a continuous speech database comprising up to 4000 words for each of several speakers. In particular, the efficiency and robustness of the search organization have been checked and evaluated along many dimensions, such as different speakers, phoneme models, and language models.
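The abstract describes a time-synchronous, left-to-right one-pass dynamic programming search in which only hypotheses that survive pruning are stored and expanded, and the final word sequence is decided only at the end via delayed decisions. The Python sketch below illustrates that general idea under purely illustrative assumptions: the toy lexicon `LEXICON`, the bigram table `BIGRAM`, the symbolic distance `local_cost`, the beam width, and the helper names are all hypothetical and are not taken from the paper or from the SPICOS system. It is a sketch of beam-pruned one-pass DP with an active-hypothesis list, not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: each word is a left-to-right chain of one-symbol states.
LEXICON: Dict[str, str] = {"show": "So", "me": "mi", "ships": "Sips"}
# Hypothetical bigram costs (negative log-probabilities); "<s>" marks the sentence start.
BIGRAM: Dict[Tuple[str, str], float] = {
    ("<s>", "show"): 0.5,
    ("show", "me"): 0.5,
    ("me", "ships"): 0.5,
}
DEFAULT_LM_COST = 5.0  # hypothetical cost for any word pair not listed in BIGRAM
MISMATCH_COST = 4.0    # hypothetical acoustic distance when frame and state labels differ

Hyp = Tuple[float, int]  # (accumulated cost, traceback index of the predecessor word end)


def local_cost(frame: str, label: str) -> float:
    """Toy acoustic distance between an observed frame label and a state label."""
    return 0.0 if frame == label else MISMATCH_COST


def relax(table: Dict[Tuple[str, int], Hyp], key: Tuple[str, int], hyp: Hyp) -> None:
    """DP recombination: keep only the cheapest hypothesis per (word, state)."""
    if key not in table or hyp[0] < table[key][0]:
        table[key] = hyp


def recognize(frames: List[str], beam: float = 6.0) -> List[str]:
    # Traceback entries: (word, end frame, predecessor entry index); -1 = sentence start.
    traceback: List[Tuple[str, int, int]] = []
    active: Dict[Tuple[str, int], Hyp] = {}         # only hypotheses that survived pruning
    word_ends: Dict[str, Hyp] = {"<s>": (0.0, -1)}  # word ends of the previous frame

    for t, frame in enumerate(frames):
        new: Dict[Tuple[str, int], Hyp] = {}
        # Within-word transitions (loop or advance one state), driven by the active list only.
        for (w, s), (cost, bp) in active.items():
            for s2 in (s, s + 1):
                if s2 < len(LEXICON[w]):
                    relax(new, (w, s2), (cost + local_cost(frame, LEXICON[w][s2]), bp))
        # Word-boundary transitions: each word end of frame t-1 may start any word at frame t.
        for v, (cost, bp) in word_ends.items():
            for w, labels in LEXICON.items():
                lm = BIGRAM.get((v, w), DEFAULT_LM_COST)
                relax(new, (w, 0), (cost + lm + local_cost(frame, labels[0]), bp))
        # Beam pruning relative to the best hypothesis of this frame.
        best = min(c for c, _ in new.values())
        active = {k: h for k, h in new.items() if h[0] <= best + beam}
        # Word ends are recorded in the traceback array; no decision is made yet.
        word_ends = {}
        for (w, s), (cost, bp) in active.items():
            if s == len(LEXICON[w]) - 1:  # each word has exactly one final state
                traceback.append((w, t, bp))
                word_ends[w] = (cost, len(traceback) - 1)

    if not word_ends:
        return []
    # Delayed decision: trace back from the best word end at the last frame.
    _, idx = min(word_ends.values())
    words: List[str] = []
    while idx != -1:
        word, _, idx = traceback[idx]
        words.append(word)
    return words[::-1]


print(recognize(list("SoomiSSipps")))  # ['show', 'me', 'ships']
```

Within-word expansion here touches only the active list, so its per-frame work tracks the number of surviving hypotheses rather than the full state space. In this simplified sketch, word starts cost (number of word ends) times (vocabulary size) per frame; that simplification is not meant to reflect how the paper handles word starts, and the sketch does not model the paper's storage limit on the overall number of hypotheses.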
Pages: 272-281
Number of pages: 10