New features in the CU-HTK system for transcription of conversational telephone speech

Cited by: 0
Authors
Hain, T [1]
Woodland, PC [1]
Evermann, G [1]
Povey, D [1]
Affiliation
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
Keywords
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper discusses new features integrated into the Cambridge University HTK (CU-HTK) system for the transcription of conversational telephone speech. Major improvements have been achieved by the use of maximum mutual information estimation in training as well as maximum likelihood estimation; the use of a full variance transform for adaptation; the inclusion of unigram pronunciation probabilities; and word-level posterior probability estimation using confusion networks for use in minimum word error rate decoding, confidence score estimation and system combination. Improvements are demonstrated via performance on the NIST March 2000 evaluation of English conversational telephone speech transcription (Hub5E). In this evaluation the CU-HTK system gave an overall word error rate of 25.4%, which was the best performance by a statistically significant margin.
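The abstract mentions word-level posterior probabilities from confusion networks, used for minimum word error rate decoding and confidence scoring. The general idea can be sketched as below. This is an illustrative toy, not the CU-HTK implementation: the `ConfusionNetwork` representation, the `decode` function, and the posterior values are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a confusion network is a list of slots, and
# each slot maps candidate words to posterior probabilities.
# The empty string "" stands for a null arc (no word in this slot).
ConfusionNetwork = List[Dict[str, float]]


def decode(cn: ConfusionNetwork) -> List[Tuple[str, float]]:
    """Pick the highest-posterior entry in each slot.

    Returns (word, posterior) pairs; the posterior doubles as a simple
    confidence score. Slots won by the null arc emit no word.
    """
    result = []
    for slot in cn:
        word, posterior = max(slot.items(), key=lambda kv: kv[1])
        if word:  # skip slots where the null arc wins
            result.append((word, posterior))
    return result


# Toy posteriors, invented for illustration only.
toy_network = [
    {"i": 0.9, "a": 0.1},
    {"think": 0.6, "thank": 0.4},
    {"": 0.7, "so": 0.3},
]

hypothesis = decode(toy_network)
for word, confidence in hypothesis:
    print(f"{word}\t{confidence:.2f}")
```

Choosing the top word per slot minimizes the expected number of word errors under the network's posteriors, which is why the same structure can serve decoding and confidence estimation at once.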
Pages: 57-60
Page count: 4