Joint acoustic and language modeling for speech recognition

Cited by: 24
Authors
Chien, Jen-Tzung [1 ]
Chueh, Chuang-Hua [1 ]
Affiliations
[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 70101, Taiwan
Keywords
Hidden Markov model; n-gram; Conditional random field; Maximum entropy; Discriminative training; Speech recognition; Maximum-entropy approach
DOI
10.1016/j.specom.2009.10.003
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
In a traditional model of speech recognition, the acoustic and linguistic information sources are assumed to be independent of each other: the parameters of the hidden Markov model (HMM) and the n-gram are estimated separately for maximum a posteriori classification. However, speech features and lexical words are inherently correlated in natural language, and the lack of a combined model leads to inefficiencies. This paper reports on joint acoustic and linguistic modeling for speech recognition, in which the acoustic evidence is used in estimating the linguistic model parameters, and vice versa, according to the maximum entropy (ME) principle. Discriminative ME (DME) models are exploited by using features from competing sentences. Moreover, a mutual ME (MME) model is built for the sentence posterior probability, which is maximized to estimate the model parameters by characterizing the dependence between acoustic and linguistic features. An N-best Viterbi approximation is presented for implementing the DME and MME models. Additionally, the new models incorporate high-order feature statistics and word regularities. In the experiments, the proposed methods increase the sentence posterior probability or model separation. Compared with separately estimated HMM and n-gram models, recognition errors are significantly reduced: from 32.2% to 27.4% on the MATBN corpus and from 5.4% to 4.8% on the WSJ corpus (5K condition). (C) 2009 Elsevier B.V. All rights reserved.
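The core of the ME formulation described above is a log-linear combination of acoustic and linguistic evidence, normalized over an N-best list of competing hypotheses. The following is a minimal sketch of that idea, not the paper's actual implementation: the feature names, weights, and scores below are hypothetical, and the paper's models additionally learn the feature weights discriminatively rather than fixing them by hand.

```python
import math

def me_posteriors(nbest, weights):
    """Log-linear (maximum-entropy) sentence posteriors over an N-best list.

    nbest:   one feature dict per hypothesis, e.g.
             {"acoustic": HMM log-likelihood, "lm": n-gram log-probability}
    weights: feature weights (the lambda parameters of the ME model)
    """
    scores = [sum(weights[k] * feats[k] for k in weights) for feats in nbest]
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)                         # partition function over the N-best list
    return [e / z for e in exps]

# Three hypothetical hypotheses with (acoustic, language-model) log-scores.
nbest = [
    {"acoustic": -120.0, "lm": -18.0},
    {"acoustic": -121.5, "lm": -15.5},
    {"acoustic": -124.0, "lm": -14.0},
]
posteriors = me_posteriors(nbest, {"acoustic": 1.0, "lm": 1.0})
best = max(range(len(nbest)), key=posteriors.__getitem__)
```

With equal weights, the second hypothesis wins here because its combined log-score is highest, even though the first has the better acoustic score alone; this is the kind of trade-off that joint, rather than separate, estimation of the two models is meant to optimize.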
Pages: 223 - 235
Page count: 13
Related Papers
50 in total
  • [1] DISCRIMINATIVELY ESTIMATED JOINT ACOUSTIC, DURATION, AND LANGUAGE MODEL FOR SPEECH RECOGNITION
    Lehr, Maider
    Shafran, Izhak
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 5542 - 5545
  • [2] Language-independent and language-adaptive acoustic modeling for speech recognition
    Schultz, T
    Waibel, A
    [J]. SPEECH COMMUNICATION, 2001, 35 (1-2) : 31 - 51
  • [3] JOINT MODELING OF ARTICULATORY AND ACOUSTIC SPACES FOR CONTINUOUS SPEECH RECOGNITION TASKS
    Mitra, Vikramjit
    Sivaraman, Ganesh
    Bartels, Chris
    Nam, Hosung
    Wang, Wen
    Espy-Wilson, Carol
    Vergyri, Dimitra
    Franco, Horacio
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5205 - 5209
  • [4] ACOUSTIC AND LANGUAGE PROCESSING TECHNOLOGY FOR SPEECH RECOGNITION
    MATSUOKA, T
    MINAMI, Y
    [J]. NTT REVIEW, 1995, 7 (02): : 30 - 39
  • [5] RELEVANCE LANGUAGE MODELING FOR SPEECH RECOGNITION
    Chen, Kuan-Yu
    Chen, Berlin
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5568 - 5571
  • [6] ASYMMETRIC ACOUSTIC MODELING OF MIXED LANGUAGE SPEECH
    Li, Ying
    Fung, Pascale
    Xu, Ping
    Liu, Yi
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5004 - 5007
  • [7] Multidialectal Spanish acoustic modeling for speech recognition
    Caballero, Monica
    Moreno, Asuncion
    Nogueiras, Albino
    [J]. SPEECH COMMUNICATION, 2009, 51 (03) : 217 - 229
  • [8] Acoustic Modeling in Speech Recognition: A Systematic Review
    Bhatt, Shobha
    Jain, Anurag
    Dev, Amita
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (04) : 397 - 412
  • [9] FEDERATED ACOUSTIC MODELING FOR AUTOMATIC SPEECH RECOGNITION
    Cui, Xiaodong
    Lu, Songtao
    Kingsbury, Brian
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6748 - 6752
  • [10] Speech recognition based on unified model of acoustic and language aspects of speech
    [J]. 1600, Nippon Telegraph and Telephone Corp. (11):