Using prosody to improve Mandarin automatic speech recognition

Cited by: 0

Authors:

Ni, Chong-Jia [1]

Liu, Wen-Ju [1]

Xu, Bo [1]

Affiliations:

[1] Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100190, Peoples R China

Source:

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 | 2010

Keywords:

automatic speech recognition; prosody; MSD-HSMM; Maximum Entropy; CORPUS;

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

This paper addresses two problems: how to model and train prosody-dependent acoustic models for Mandarin, and how to decode input speech with a prosody-dependent speech recognition system. We use automatic prosody labeling methods to annotate the prosodic break type and stress type of each syllable in a continuous speech corpus, and we apply our proposed methods to train prosody-dependent tonal syllable models, targeting the data sparsity that arises after prosody labeling. We also use MSD-HSMM to model prosodic factors such as pitch and duration. For decoding, we combine the MSD-HSMM model, a GMM-based prosody-dependent tonal syllable duration model, and a Maximum Entropy-based syntactic prosody model. Compared with the baseline system, our prosody-dependent speech recognition systems significantly improve the tonal syllable correct rate.


Pages: 2698-2701

Page count: 4
