Joint acoustic and language modeling for speech recognition

Cited by: 24
Authors
Chien, Jen-Tzung [1 ]
Chueh, Chuang-Hua [1 ]
Affiliations
[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 70101, Taiwan
Keywords
Hidden Markov model; n-Gram; Conditional random field; Maximum entropy; Discriminative training; Speech recognition; MAXIMUM-ENTROPY APPROACH;
DOI
10.1016/j.specom.2009.10.003
CLC classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
In traditional speech recognition, the acoustic and linguistic information sources are assumed to be independent of each other: the parameters of the hidden Markov model (HMM) and the n-gram language model are estimated separately for maximum a posteriori classification. However, speech features and lexical words are inherently correlated in natural language, so estimating the two models in isolation is suboptimal. This paper reports on joint acoustic and linguistic modeling for speech recognition, in which the acoustic evidence is used in estimating the linguistic model parameters, and vice versa, according to the maximum entropy (ME) principle. Discriminative ME (DME) models are built by exploiting features from competing sentences. Moreover, a mutual ME (MME) model is built for the sentence posterior probability, which is maximized to estimate the model parameters by characterizing the dependence between acoustic and linguistic features. An N-best Viterbi approximation is presented for implementing the DME and MME models. Additionally, the new models incorporate high-order feature statistics and word regularities. In the experiments, the proposed methods increase the sentence posterior probability or the model separation. Compared with separate HMM and n-gram estimation, recognition errors are significantly reduced, from 32.2% to 27.4% on the MATBN corpus and from 5.4% to 4.8% on the WSJ corpus (5K condition). (C) 2009 Elsevier B.V. All rights reserved.
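To make the ME formulation in the abstract concrete, the following is a minimal sketch (not the paper's implementation) of a log-linear maximum-entropy model that combines acoustic and language-model scores into sentence posteriors over an N-best list; the hypothesis scores and the feature weights (lambdas) are hypothetical illustrations, whereas the paper estimates the weights discriminatively and jointly.

```python
import math

def me_posteriors(feature_scores, weights):
    """Log-linear (maximum-entropy) sentence posteriors over competing hypotheses.

    feature_scores: one feature vector per N-best hypothesis, e.g.
        (acoustic log-likelihood, language-model log-probability).
    weights: one lambda per feature; the paper estimates these jointly,
        here they are fixed illustrative values.
    """
    logits = [sum(w * f for w, f in zip(weights, feats))
              for feats in feature_scores]
    m = max(logits)  # log-sum-exp shift for numerical stability
    z = sum(math.exp(l - m) for l in logits)
    return [math.exp(l - m) / z for l in logits]

# Hypothetical 3-best list: (acoustic log-score, LM log-score) per sentence.
nbest = [(-120.0, -15.2), (-118.5, -18.0), (-121.3, -14.1)]
weights = [1.0, 8.0]  # illustrative lambdas, not values from the paper
post = me_posteriors(nbest, weights)
best = max(range(len(nbest)), key=lambda i: post[i])
```

Rescoring then simply selects the hypothesis with the highest posterior; the DME/MME training described in the abstract adjusts the weights so that the correct sentence's posterior is maximized against its N-best competitors.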
Pages: 223-235
Page count: 13