Context dependent initial/final acoustic modeling for continuous Chinese speech recognition

Cited by: 0

Authors:

Li, Jing [1]

Zheng, Fang [1,2]

Zhang, Jiyong [1]

Wu, Wenhu [1]

Affiliations:

[1] Lab. of Intelligent Technol., Dept. of Comp. Sci. and Technol., Tsinghua Univ., Beijing 100084, China

[2] Beijing d-Ear Technol. Co. Ltd., Beijing 100085, China

Source:

Qinghua Daxue Xuebao/Journal of Tsinghua University | 2004, Vol. 44, No. 1

Keywords:

Decision support systems - Mathematical models;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Acoustic modeling is important for continuous Chinese speech recognition. The extended initial/final (XIF) set, designed as the basic speech recognition unit set from an analysis of the characteristics of the Chinese language, outperformed the standard IF set. Decision tree-based state tying was used to construct the context dependent initial/final acoustic model (Tri-XIF model), with a question set designed from Chinese linguistic knowledge. Methods were developed to optimize Tri-XIF modeling, including transcription refinement, question set extension, and model size reduction. Tests show that Tri-XIF modeling performs much better than either Tri-phone modeling or syllable modeling: the syllable error rate was reduced by 24.53% compared with the Tri-phone model and by 41.65% compared with the syllable model. With these methods, the Tri-XIF model size was reduced by more than 20% with little performance deterioration.


Pages: 61-64
