Hierarchical prosody modeling for Mandarin spontaneous speech

Cited by: 4
Authors
Lin, Cheng-Hsien [1]
You, Chung-Long [1]
Chiang, Chen-Yu [2]
Wang, Yih-Ru [1]
Chen, Sin-Horng [1]
Affiliations
[1] Natl Chiao Tung Univ, Dept Elect & Comp Engn, Hsinchu 30010, Taiwan
[2] Natl Taipei Univ, Dept Commun Engn, New Taipei 23741, Taiwan
Source
Keywords
AUTOMATIC DETECTION; INFORMATION; ADAPTATION; FRAMEWORK; FEATURES;
DOI
10.1121/1.5099263
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper, a hierarchical prosody model (HPM)-based method for Mandarin spontaneous speech is proposed. First, an HPM is designed for describing relations among acoustic features of utterances, linguistic features of texts, and prosodic tags representing the underlying hierarchical prosodic structures of utterances. Subsequently, a sequential optimization algorithm is employed to train the HPM based on a large conversational speech corpus, the Mandarin Conversational Dialogue Corpus (MCDC), which features orthographic transcriptions and prosodic event annotations. In this unsupervised training method, all utterances of the MCDC are labeled with two types of prosodic tags, namely, break and prosodic states, automatically and simultaneously. After training, the HPM parameters are examined to identify critical prosodic properties of Mandarin spontaneous speech, which are then compared with their counterparts in the read-speech HPM. The prosodic tags on the studied utterances enable mapping of various prosodic events onto the hierarchical prosodic structures of the utterances. Prosodic analyses of some disfluent events are conducted using the prosodic tags affixed to the MCDC. Finally, an application of the HPM to assist in Mandarin spontaneous-speech recognition is discussed. Significant relative error rate reductions of 9.0%, 9.2%, 15.6%, and 7.3% are obtained for base-syllable, character, tone, and word recognition, respectively. © 2019 Acoustical Society of America.
Pages: 2576-2596
Number of pages: 21