Context dependent initial/final acoustic modeling for continuous Chinese speech recognition

Cited by: 0
Authors
Li, Jing [1]
Zheng, Fang [1, 2]
Zhang, Jiyong [1]
Wu, Wenhu [1]
Affiliations
[1] Lab. of Intelligent Technol., Dept. of Comp. Sci. and Technol., Tsinghua Univ., Beijing 100084, China
[2] Beijing d-Ear Technol. Co. Ltd., Beijing 100085, China
Keywords
Decision support systems - Mathematical models;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Acoustic modeling is important for continuous Chinese speech recognition. The extended initial/final (XIF) set, a basic speech recognition unit set designed around the characteristics of the Chinese language, outperformed the standard IF set. Decision tree-based state tying was used to construct the context-dependent initial/final acoustic model (Tri-XIF model), with a question set designed from Chinese linguistic knowledge. Several methods were developed to optimize Tri-XIF modeling, including transcription refinement, question set extension, and model size reduction. Tests show that Tri-XIF modeling performs much better than either Tri-phone modeling or syllable modeling: the syllable error rate was reduced by 24.53% compared with the Tri-phone model and by 41.65% compared with the syllable model. With these methods, the Tri-XIF model size was reduced by more than 20% with little performance deterioration.
Pages: 61-64
Related papers
50 in total
  • [1] Context Dependent Syllable Acoustic Model for Continuous Chinese Speech Recognition
    Wu, Hao
    Wu, Xihong
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1961 - 1964
  • [2] Context-dependent acoustic models for Chinese speech recognition
    Ma, B
    Huang, TY
    Xu, B
    Zhang, XJ
    Qu, F
    [J]. 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 455 - 458
  • [3] Modeling context-dependent phonetic units in a continuous speech recognition system for Mandarin Chinese
    Wu, JJX
    Deng, L
    Chan, J
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 2281 - 2284
  • [4] A frame-based context-dependent acoustic modeling for speech recognition
    Terashima R.
    Zen H.
    Nankaku Y.
    Tokuda K.
    [J]. IEEJ Transactions on Electronics, Information and Systems, 2010, 130 (10) : 1856 - 1864+24
  • [5] Context modeling and clustering in continuous speech recognition
    Junqua, JC
    Vassallo, L
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 2262 - 2265
  • [6] Automatic Initial/Final Generation for Dialectal Chinese Speech Recognition
    Liu, Linquan
    Zheng, Thomas Fang
    Wu, Wenhu
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 141 - 144
  • [7] Context-dependent acoustic modeling using graphemes for large vocabulary speech recognition
    Kanthak, S
    Ney, H
    [J]. 2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 845 - 848
  • [9] Phone-context specific gender-dependent acoustic-models for continuous speech recognition
    Neti, C
    Roukos, S
    [J]. 1997 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, PROCEEDINGS, 1997, : 192 - 198
  • [10] Prosody-dependent Acoustic Modeling for Mandarin Speech Recognition
    Chiu, Tzu-Hsuan
    Chiang, Chen-Yu
    Liao, Yuan-Fu
    Yang, Jyh-Her
    Wang, Yih-Ru
    Chen, Sin-Horng
    [J]. PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON SPEECH PROSODY, VOLS I AND II, 2012, : 139 - 142