The IBM Mandarin broadcast speech transcription system

Authors
Chu, Stephen M. [1]
Kuo, Hong-kwang [1]
Liu, Yi Y. [2]
Qin, Yong [2]
Shi, Qin [2]
Zweig, Geoffrey [1]
Affiliations
[1] IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA
[2] IBM China Research Laboratory, Beijing, People's Republic of China
Keywords
speech recognition; speech processing
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper describes the technical and system-building advances in the automatic transcription of Mandarin broadcast speech made at IBM in the first year of the DARPA GALE program. In particular, we discuss the application of minimum phone error (MPE) discriminative training and a new topic-adaptive language modeling technique. We present results on the RT04 evaluation data and on two larger community-defined test sets designed to cover both the broadcast news and the broadcast conversation domains. With the described advances, the new transcription system achieves a 26.3% relative reduction in character error rate (CER) over our previous best-performing system and is competitive with published results on these datasets.
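The abstract reports its headline result as a relative reduction in character error rate (CER), the usual metric for Mandarin, which is scored per character rather than per word. The sketch below shows how these two quantities are conventionally computed: CER as the Levenshtein distance between the reference and hypothesis character sequences divided by the number of reference characters, and relative reduction as (baseline - new) / baseline. It is an illustrative sketch, not the scoring tool used for the RT04 or GALE evaluations; the function names are hypothetical, and it assumes whitespace is ignored and every substitution, deletion, and insertion costs 1.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, deletions and
    insertions needed to turn ref into hyp (all edits cost 1)."""
    # At the start of iteration i, prev[j] is the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def character_error_rate(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level CER: total character edits / total reference characters.
    Assumption: whitespace is dropped before scoring."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        r = [c for c in ref if not c.isspace()]
        h = [c for c in hyp if not c.isspace()]
        errors += edit_distance(r, h)
        total += len(r)
    if total == 0:
        raise ValueError("reference contains no characters")
    return errors / total


def relative_reduction(baseline: float, new: float) -> float:
    """Fraction of the baseline error rate removed by the new system."""
    return (baseline - new) / baseline


# One substitution (气 -> 汽) and one deletion (很) over 6 reference characters.
print(character_error_rate(["今天天气很好"], ["今天天汽好"]))
```

Under this definition, a 26.3% relative reduction means the new system's CER is about 73.7% of the baseline system's CER, whatever the baseline value was.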
Pages: 345 (end page not listed)
Number of pages: 2