An Innovative Prosody Modeling Method for Chinese Speech Recognition

Authors
Gang Peng
William S.-Y. Wang
Affiliation
[1] City University of Hong Kong, Language Engineering Laboratory, Department of Electronic Engineering
Keywords
Chinese dialects; speech recognition; prosody modeling; context-dependent
DOI
10.1023/B:IJST.0000017013.70486.51
Abstract
This paper presents an innovative method for prosody modeling in Chinese speech recognition. The method first evaluates the reliability of the prosodic information; the recognition system then uses this reliability to dynamically tune the balance between spectral scores and prosodic scores. The basic idea is to use prosodic knowledge according to its reliability: the higher the reliability, the more the prosodic information contributes to recognition. Thus, the method does not introduce extra errors but incorporates more knowledge into the recognition system. Experimental results showed relative word error rate reductions of as much as 52.9% and 46.0% for Mandarin and Cantonese digit string recognition tasks, respectively. When tone information was incorporated into Cantonese Large Vocabulary Continuous Speech Recognition (LVCSR) via the proposed method, a 20.16% relative character error rate reduction was obtained.
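The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea of reliability-weighted score combination, not the authors' actual method. It assumes log-domain scores, a reliability value in [0, 1], and a simple additive combination; the names `Hypothesis`, `combined_score`, `best_hypothesis`, and `max_weight` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Hypothesis:
    """One recognition candidate (hypothetical structure for illustration)."""
    label: str
    spectral_score: float  # log-score from the spectral (acoustic) model
    prosodic_score: float  # log-score from the prosody (e.g. tone) model


def combined_score(hyp: Hypothesis, reliability: float, max_weight: float = 1.0) -> float:
    """Add the prosodic score to the spectral score, scaled by its reliability.

    Assumption: reliability lies in [0, 1]. At 0 the prosodic score is ignored;
    at 1 it contributes with the full weight `max_weight`.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be in [0, 1]")
    return hyp.spectral_score + max_weight * reliability * hyp.prosodic_score


def best_hypothesis(hyps: Sequence[Hypothesis], reliability: float,
                    max_weight: float = 1.0) -> Hypothesis:
    """Return the candidate with the highest combined score."""
    return max(hyps, key=lambda h: combined_score(h, reliability, max_weight))


if __name__ == "__main__":
    candidates = [
        Hypothesis("A", spectral_score=-10.0, prosodic_score=-5.0),
        Hypothesis("B", spectral_score=-10.5, prosodic_score=-1.0),
    ]
    # Unreliable prosody: the decision rests on the spectral scores alone.
    print(best_hypothesis(candidates, reliability=0.0).label)
    # Reliable prosody: the prosodic evidence can change the decision.
    print(best_hypothesis(candidates, reliability=1.0).label)
```

In this sketch, an unreliable prosodic estimate leaves the spectral decision unchanged, which is one way to read the abstract's claim that prosodic information contributes more only as its reliability increases.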
Pages: 129-140
Page count: 11