A multispan language modeling framework for large vocabulary speech recognition

Cited by: 0
Author
Bellegarda, JR [1]
Affiliation
[1] Apple Comp Inc, Spoken Language Grp, Cupertino, CA 95014 USA
Source
Keywords
latent semantic analysis; n-gram adaptation; perplexity reduction; statistical language modeling
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
A new framework is proposed to construct multispan language models for large vocabulary speech recognition, by exploiting both local and global constraints present in the language. While statistical n-gram modeling can readily take local constraints into account, global constraints have been more difficult to handle within a data-driven formalism. In this work, they are captured via a paradigm first formulated in the context of information retrieval, called latent semantic analysis (LSA). This paradigm seeks to automatically uncover the salient semantic relationships between words and documents in a given corpus. Such discovery relies on a parsimonious vector representation of each word and each document in a suitable, common vector space. Since familiar clustering techniques can be applied in this space, it becomes possible to derive several families of large-span language models, with various smoothing properties. Because of their semantic nature, the new language models are well suited to complement conventional, more syntactically oriented n-grams, and the combination of the two paradigms naturally yields the benefit of a multispan context. An integrative formulation is proposed for this purpose, in which the latent semantic information is used to adjust the standard n-gram probability. The performance of the resulting multispan language models, as measured by perplexity, compares favorably with the corresponding n-gram performance.
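The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: the toy corpus, the choice of k, the cosine-to-score mapping, and the multiply-and-renormalize combination rule are all assumptions made for illustration, and the function names (`history_vector`, `lsa_distribution`, `multispan`) are hypothetical. The sketch builds a word-by-document count matrix, places words and a pseudo-document for the history in a common vector space via a truncated SVD, and uses the resulting similarity to adjust an n-gram distribution.

```python
import numpy as np

# Toy corpus (hypothetical): each document is a list of words.
docs = [
    ["stock", "market", "shares", "trading"],
    ["market", "shares", "investors", "stock"],
    ["rain", "weather", "forecast", "storm"],
    ["storm", "weather", "rain", "wind"],
]
vocab = sorted({w for d in docs for w in d})
index = {w: i for i, w in enumerate(vocab)}

# Word-by-document count matrix W (rows: words, columns: documents).
W = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d:
        W[index[w], j] += 1.0

# Truncated SVD, W ~ U_k diag(s_k) V_k^T, with k chosen by hand (assumption).
k = 2
U, s, Vt = np.linalg.svd(W, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]

# Word vectors in the k-dimensional space, scaled by sqrt of singular values.
word_vecs = U_k * np.sqrt(s_k)

def history_vector(history):
    """Fold the history into the space as a pseudo-document."""
    d = np.zeros(len(vocab))
    for w in history:
        if w in index:
            d[index[w]] += 1.0
    return (d @ U_k) / np.sqrt(s_k)

def lsa_distribution(history):
    """Turn word/history cosine similarities into a distribution over words."""
    h = history_vector(history)
    norms = np.linalg.norm(word_vecs, axis=1) * np.linalg.norm(h) + 1e-12
    sims = (word_vecs @ h) / norms
    scores = (1.0 + sims) / 2.0  # map cosine from [-1, 1] to [0, 1] (assumption)
    return scores / scores.sum()

def multispan(ngram_probs, history):
    """Adjust an n-gram distribution by the LSA score and renormalize."""
    adjusted = ngram_probs * lsa_distribution(history)
    return adjusted / adjusted.sum()

# Usage: a uniform stand-in for the n-gram distribution (assumption).
uniform_ngram = np.full(len(vocab), 1.0 / len(vocab))
p = multispan(uniform_ngram, ["stock", "market"])
for w in ("shares", "rain"):
    print(w, p[index[w]])
```

With a finance-flavored history, words that co-occur with that topic in the toy corpus receive more probability mass than weather words, even though the stand-in n-gram distribution is uniform. A real system would replace the uniform vector with probabilities from a trained n-gram model.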
Pages: 456-467
Page count: 12
Related papers
50 in total
  • [1] Large vocabulary speech recognition with multispan statistical language models
    Bellegarda, JR
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (01): 76 - 84
  • [2] Connectionist language modeling for large vocabulary continuous speech recognition
    Schwenk, H
    Gauvain, JL
    [J]. 2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 765 - 768
  • [3] Using Morphological Data in Language Modeling for Serbian Large Vocabulary Speech Recognition
    Pakoci, Edvin
    Popovic, Branislav
    Pekar, Darko
    [J]. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2019, 2019
  • [4] Statistical language modeling with semantic classes for large vocabulary speech recognition in embedded systems
    Oria, Daniela
    Olsen, Jesper
    [J]. PROCEEDINGS OF THE SECOND IASTED INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE, 2006, : 496 - +
  • [5] Large vocabulary Russian speech recognition using syntactico-statistical language modeling
    Karpov, Alexey
    Markov, Konstantin
    Kipyatkova, Irina
    Vazhenina, Dania
    Ronzhin, Andrey
    [J]. SPEECH COMMUNICATION, 2014, 56 : 213 - 228
  • [6] Subspace Gaussian mixture based language modeling for large vocabulary continuous speech recognition
    Sun, Ri Hyon
    Chol, Ri Jong
    [J]. SPEECH COMMUNICATION, 2020, 117 : 21 - 27
  • [7] A tutorial on pronunciation modeling for large vocabulary speech recognition
    Fosler-Lussier, E
    [J]. TEXT- AND SPEECH-TRIGGERED INFORMATION ACCESS, 2003, 2705 : 38 - 77
  • [8] Prosodic Modeling in Large Vocabulary Mandarin Speech Recognition
    Huang, Jui-Ting
    Lee, Lin-shan
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1241 - 1244
  • [9] Language identification through large vocabulary continuous speech recognition
    Lim, BP
    Li, HZ
    Chen, Y
    [J]. 2004 INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2004, : 49 - 52
  • [10] A large vocabulary continuous speech recognition system for Persian language
    Sameti, Hossein
    Veisi, Hadi
    Bahrani, Mohammad
    Babaali, Bagher
    Hosseinzadeh, Khosro
    [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2011, : 1 - 12