Improved Tone Modeling for Mandarin Broadcast News Speech Recognition

Cited by: 0
Authors
Lei, Xin [1]
Siu, Manhung [2]
Hwang, Mei-Yuh [1]
Ostendorf, Mari [1]
Lee, Tan [3]
Affiliations
[1] Univ Washington, Dept Elect Engn, Seattle, WA 98195 USA
[2] Hong Kong Univ Sci & Technol, EEE Dept, Clear Water Bay, Hong Kong, Peoples R China
[3] Chinese Univ Hong Kong, Dept Elect Engn, Shatin, Hong Kong, Peoples R China
Keywords
speech recognition; Mandarin; tone modeling
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Tone plays a crucial role in Mandarin speech in distinguishing otherwise ambiguous words. Most state-of-the-art Mandarin automatic speech recognition systems adopt embedded tone modeling, where tonal acoustic units are used and F0 features are appended to the spectral feature vector. In this paper, we combine the embedded approach (using improved F0 smoothing) with explicit tone modeling in rescoring the output lattices. Oracle experiments indicate that a 32% relative improvement can be achieved by rescoring with perfect tone information. Recognition experiments on Mandarin broadcast news show that, even with an accuracy of only 70%, the explicit tone classifier offers complementary knowledge and improves performance significantly. Through the combination of tone modeling techniques, the character error rate on the CTV test set is reduced from 13.0% to 11.5%.
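The abstract reports one gain in relative terms (32% in the oracle experiment) and another as two absolute character error rates (13.0% and 11.5%). As a reading aid, the short sketch below puts the second result on the same relative scale. The function name `relative_reduction` is a hypothetical helper introduced here for illustration, not something taken from the paper; only the two error rates come from the abstract.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the error-rate reduction as a fraction of the baseline rate."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


# Character error rates on the CTV test set, as quoted in the abstract (percent).
cer_baseline = 13.0
cer_combined = 11.5

absolute = cer_baseline - cer_combined                    # percentage points
relative = relative_reduction(cer_baseline, cer_combined)  # fraction of baseline

print(f"absolute: {absolute:.1f} points, relative: {relative:.1%}")
# prints "absolute: 1.5 points, relative: 11.5%"
```

Under this reading, the reported gain is 1.5 absolute points, or roughly 11.5% relative, which is well below the 32% relative improvement the oracle experiment gives as the ceiling for perfect tone information.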
Pages: 1237 / +
Page count: 2
Related papers
50 in total
  • [1] A study on Mandarin broadcast news speech recognition
    Chen, CL
    Wang, YR
    Chen, SH
    [J]. 2004 International Symposium on Chinese Spoken Language Processing, Proceedings, 2004, : 257 - 260
  • [2] Investigation on Mandarin Broadcast News Speech Recognition
    Hwang, Mei-Yuh
    Lei, Xin
    Wang, Wen
    Shinozaki, Takahiro
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1233 - +
  • [3] IMPROVED TONE MODELING BY EXPLOITING ARTICULATORY FEATURES FOR MANDARIN SPEECH RECOGNITION
    Chao, Hao
    Yang, Zhanlei
    Liu, Wenju
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4741 - 4744
  • [4] Tone Modeling for Continuous Mandarin Speech Recognition
    Cao, Yang
    Zhang, Shuwu
    Huang, Taiyi
    Xu, Bo
    [J]. International Journal of Speech Technology, 2004, 7 (2-3) : 115 - 128
  • [5] Multifactor Adaptation for Mandarin Broadcast News and Conversation Speech Recognition
    Wang, Wen
    Mandal, Arindam
    Lei, Xin
    Stolcke, Andreas
    Zheng, Jing
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2099 - 2102
  • [6] Tone articulation modeling for mandarin spontaneous speech recognition
    Zhou, JL
    Tian, Y
    Shi, Y
    Huang, C
    Chang, E
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 997 - 1000
  • [7] MAXIMUM ENTROPY BASED TONE MODELING FOR MANDARIN SPEECH RECOGNITION
    Wang, Xinhao
    Yu, Yansuo
    Wu, Xihong
    Chi, Huisheng
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4850 - 4853
  • [8] WORD-LEVEL TONE MODELING FOR MANDARIN SPEECH RECOGNITION
    Lei, Xin
    Ostendorf, Mari
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 665 - +
  • [9] Advances in Mandarin Broadcast Speech Recognition
    Hwang, Mei-Yuh
    Wang, Wen
    Lei, Xin
    Zheng, Jing
    Cetin, Ozgur
    Peng, Gang
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2876 - +
  • [10] Voice retrieval of Mandarin broadcast news speech
    Chen, B
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2006, 20 (01) : 91 - 109