Hierarchical prosody modeling for Mandarin spontaneous speech

Cited by: 4

Authors:

Lin, Cheng-Hsien [1]

You, Chung-Long [1]

Chiang, Chen-Yu [2]

Wang, Yih-Ru [1]

Chen, Sin-Horng [1]

Affiliations:

[1] Natl Chiao Tung Univ, Dept Elect & Comp Engn, Hsinchu 30010, Taiwan

[2] Natl Taipei Univ, Dept Commun Engn, New Taipei 23741, Taiwan

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2019 / Vol. 145 / Issue 04

Keywords:

AUTOMATIC DETECTION; INFORMATION; ADAPTATION; FRAMEWORK; FEATURES;

DOI:

10.1121/1.5099263

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In this paper, a hierarchical prosody model (HPM)-based method for Mandarin spontaneous speech is proposed. First, an HPM is designed for describing relations among acoustic features of utterances, linguistic features of texts, and prosodic tags representing the underlying hierarchical prosodic structures of utterances. Subsequently, a sequential optimization algorithm is employed to train the HPM based on a large conversational speech corpus, the Mandarin Conversational Dialogue Corpus (MCDC), which features orthographic transcriptions and prosodic event annotations. In this unsupervised training method, all utterances of the MCDC are labeled with two types of prosodic tags, namely, break and prosodic states, automatically and simultaneously. After training, the HPM parameters are examined to identify critical prosodic properties of Mandarin spontaneous speech, which are then compared with their counterparts in the read-speech HPM. The prosodic tags on the studied utterances enable mapping of various prosodic events onto the hierarchical prosodic structures of the utterances. Prosodic analyses of some disfluent events are conducted using the prosodic tags affixed to the MCDC. Finally, an application of the HPM to assist in Mandarin spontaneous-speech recognition is discussed. Significant relative error rate reductions of 9.0%, 9.2%, 15.6%, and 7.3% are obtained for base-syllable, character, tone, and word recognition, respectively. (C) 2019 Acoustical Society of America.


Pages: 2576 - 2596

Page count: 21

50 items in total

[1] PROSODY MODELING FOR MANDARIN EXCLAMATORY SPEECH
Jia, Huibin
Tao, Jianhua
[J]. ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, 2009, : 890 - 893
[2] Prosody for Mandarin Speech Recognition: a Comparative Study of Read and Spontaneous Speech
Yeung, Yu Ting
Qian, Yao
Lee, Tan
Soong, Frank K.
[J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1133 - +
[3] ENRICHING MANDARIN SPEECH RECOGNITION BY INCORPORATING A HIERARCHICAL PROSODY MODEL
Yang, Jyh-Her
Liu, Ming-Chieh
Chang, Hao-Hsiang
Chiang, Chen-Yu
Wang, Yih-Ru
Chen, Sin-Horng
[J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5052 - 5055
[4] Unsupervised joint prosody labeling and modeling for Mandarin speech
Chiang, Chen-Yu
Chen, Sin-Horng
Yu, Hsiu-Min
Wang, Yih-Ru
[J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2009, 125 (02): : 1164 - 1183
[5] Prosody-dependent Acoustic Modeling for Mandarin Speech Recognition
Chiu, Tzu-Hsuan
Chiang, Chen-Yu
Liao, Yuan-Fu
Yang, Jyh-Her
Wang, Yih-Ru
Chen, Sin-Horng
[J]. PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON SPEECH PROSODY, VOLS I AND II, 2012, : 139 - 142
[6] A New Approach of Speaking Rate Modeling for Mandarin Speech Prosody
Hsieh, Chiao-Hua
Chiang, Chen-Yu
Wang, Yih-Ru
Yu, Hsiu-Min
Chen, Sin-Horng
[J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 654 - 657
[7] A parametric prosody coding approach for Mandarin speech using a hierarchical prosodic model
Chiang, Chen-Yu
[J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2018,
[8] Prosody model in a Mandarin Text-to-Speech System based on a hierarchical approach
Pan, NH
Jen, WT
Yu, SS
Yu, MS
Huang, SY
Wu, MJ
[J]. 2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III, 2000, : 448 - 451
[9] Pronunciation Modeling for Spontaneous Mandarin Speech Recognition
Yi Liu
Pascale Fung
[J]. International Journal of Speech Technology, 2004, 7 (2-3) : 155 - 172
[10] A parametric prosody coding approach for Mandarin speech using a hierarchical prosodic model
Chen-Yu Chiang
[J]. EURASIP Journal on Audio, Speech, and Music Processing, 2018
