Large vocabulary speech recognition with multispan statistical language models

Cited by: 42
Author
Bellegarda, JR [1]
Affiliation
[1] Apple Computer, Inc., Spoken Language Group, Cupertino, CA 95014, USA
Keywords
latent semantic analysis; multispan integration; n-grams; speech recognition; statistical language modeling
DOI
10.1109/89.817455
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Multispan language modeling refers to the integration of the various constraints, both local and global, present in the language. It was recently proposed to capture global constraints through latent semantic analysis (LSA), while taking local constraints into account via the usual n-gram approach. This has led to several families of data-driven, multispan language models for large vocabulary speech recognition. Because the two types of constraints are inherently complementary, multispan performance, as measured by perplexity, has been shown to compare favorably with the corresponding n-gram performance. The objective of this work is to characterize the behavior of such multispan modeling in actual recognition. Major implementation issues are addressed, including search integration and context scope selection. Experiments are conducted on a subset of the Wall Street Journal (WSJ) speaker-independent, 20000-word vocabulary, continuous speech task. Results show that, compared to a standard n-gram model, the multispan framework can lead to a reduction in average word error rate of over 20%. The paper concludes with a discussion of intrinsic multispan tradeoffs, such as the influence of training data selection on the resulting performance.
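The abstract does not spell out how the local and global predictions are combined. A minimal sketch, assuming a Bayes-style product rule: each word's n-gram probability is rescaled by the ratio of its LSA-based probability (given a pseudo-document built from the longer history) to its unigram prior, and the scores are renormalized over the vocabulary. Perplexity, the measure the abstract cites, is included for comparison. The LSA probabilities are taken as given inputs (deriving them requires an SVD of a word-document matrix, omitted here). The function names `multispan_distribution` and `perplexity` and all numbers are hypothetical toy values, not data or code from the paper.

```python
import math
from typing import Dict, List


def multispan_distribution(
    ngram: Dict[str, float],
    lsa: Dict[str, float],
    unigram: Dict[str, float],
) -> Dict[str, float]:
    """Combine local (n-gram) and global (LSA) predictions (assumed rule).

    ngram[w]   ~ P(w | previous n-1 words)          local constraint
    lsa[w]     ~ P(w | pseudo-document of history)  global constraint
    unigram[w] ~ P(w), the background word prior

    Each n-gram score is scaled by lsa[w] / unigram[w], i.e. by how much the
    global context raises or lowers w relative to its prior, then the scores
    are renormalized so they sum to 1 over the vocabulary.
    """
    scores = {w: ngram[w] * lsa[w] / unigram[w] for w in ngram}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


def perplexity(word_probs: List[float]) -> float:
    """Perplexity: exp of the average negative log-probability per word."""
    avg_nll = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(avg_nll)


# Hypothetical toy distributions over a 4-word vocabulary after some history.
ngram = {"market": 0.40, "stock": 0.30, "rose": 0.10, "game": 0.20}
lsa = {"market": 0.35, "stock": 0.35, "rose": 0.20, "game": 0.10}  # finance-like topic
unigram = {"market": 0.25, "stock": 0.25, "rose": 0.25, "game": 0.25}

combined = multispan_distribution(ngram, lsa, unigram)
for w, p in sorted(combined.items(), key=lambda kv: -kv[1]):
    print(f"{w:>6s}  n-gram={ngram[w]:.3f}  multispan={p:.3f}")

# Toy one-word "test set" where the observed word is "stock".
print("n-gram perplexity:   ", round(perplexity([ngram["stock"]]), 3))
print("multispan perplexity:", round(perplexity([combined["stock"]]), 3))
```

When the global context is uninformative (LSA probabilities equal to the unigram prior), the rescaling factor is 1 for every word and the combined model reduces exactly to the n-gram model; topic-consistent words gain probability mass only when the LSA component favors them.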
Pages: 76-84
Page count: 9