Context dependent initial/final acoustic modeling for continuous Chinese speech recognition

Authors
Li, Jing [1]
Zheng, Fang [1, 2]
Zhang, Jiyong [1]
Wu, Wenhu [1]
Affiliations
[1] Lab. of Intelligent Technol., Dept. of Comp. Sci. and Technol., Tsinghua Univ., Beijing 100084, China
[2] Beijing d-Ear Technol. Co. Ltd., Beijing 100085, China
Keywords
Decision support systems - Mathematical models;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Acoustic modeling is important for continuous Chinese speech recognition. The extended initial/final (XIF) set, a basic speech recognition unit set designed around the characteristics of the Chinese language, outperformed the standard IF set. Decision tree-based state tying was used to construct the context-dependent initial/final acoustic model (Tri-XIF model), with a question set designed from Chinese linguistic knowledge. Methods were developed to optimize Tri-XIF modeling, including transcription refinement, question set extension, and model size reduction. Tests show that Tri-XIF modeling performs much better than either Tri-phone modeling or syllable modeling: the syllable error rate was reduced by 24.53% compared with the Tri-phone model and by 41.65% compared with the syllable model. With these methods, the size of the Tri-XIF model was reduced by more than 20% with little performance deterioration.
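As a rough illustration of what "context-dependent" means for initial/final units, the sketch below expands a unit sequence into labels that name each unit together with its left and right neighbors. This is not taken from the paper: the function name `to_context_dependent`, the `left-center+right` label notation, the `sil` boundary symbol, and the example segmentation of "zhong guo" are all assumptions made for illustration.

```python
from typing import List


def to_context_dependent(units: List[str], boundary: str = "sil") -> List[str]:
    """Expand a unit sequence into left-center+right labels.

    The label format and the boundary symbol are illustrative assumptions,
    not the notation used by the authors.
    """
    # Pad both ends so the first and last units also have a neighbor.
    padded = [boundary] + list(units) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


# "zhong guo" split into initial/final units (hypothetical segmentation).
units = ["zh", "ong", "g", "uo"]
labels = to_context_dependent(units)
print(labels)
```

Because the number of distinct labels grows quickly with the size of the unit set, many of them are seen rarely in training data, which is the usual motivation for tying their states with a decision tree.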
Pages: 61-64